Abstract:
Current approaches to classifying security patches for open source software (OSS) perform poorly because labeled samples are scarce and feature representations are weak. To alleviate these problems, we propose a security patch classification method based on semi-supervised clustering. Features are extracted from commit messages and encoded with contrastive learning to sharpen the differences between sample types. Semi-supervised clustering is then performed on feature similarity to form clusters that cover the data distribution more fully. The method classifies each sample by measuring its similarity to each cluster. Experimental results show that the proposed method classifies security patches with higher accuracy than existing state-of-the-art methods.
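
The clustering and classification steps can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the commit-message embeddings have already been produced by some encoder, uses a seeded k-means variant as a stand-in for the semi-supervised clustering step, and uses cosine similarity to cluster centroids as the similarity measure. The function names `normalize`, `seeded_centroids`, and `classify` are hypothetical.

```python
import numpy as np


def normalize(x):
    """Scale each row to unit length so dot products equal cosine similarity."""
    return x / np.linalg.norm(x, axis=1, keepdims=True)


def seeded_centroids(labeled_x, labeled_y, unlabeled_x, n_iter=10):
    """Seeded k-means (assumed stand-in for the paper's clustering step).

    Centroids start from the labeled class means. Unlabeled points are then
    repeatedly assigned to their most similar centroid and the centroids are
    recomputed. Labeled points always keep their given labels.
    """
    classes = np.unique(labeled_y)
    labeled_x = normalize(labeled_x)
    unlabeled_x = normalize(unlabeled_x)
    centroids = normalize(
        np.stack([labeled_x[labeled_y == c].mean(axis=0) for c in classes])
    )
    all_x = np.vstack([labeled_x, unlabeled_x])
    for _ in range(n_iter):
        # Assign each unlabeled point to the centroid with highest cosine similarity.
        assigned = classes[np.argmax(unlabeled_x @ centroids.T, axis=1)]
        all_y = np.concatenate([labeled_y, assigned])
        # Recompute centroids from labeled and newly assigned points together.
        centroids = normalize(
            np.stack([all_x[all_y == c].mean(axis=0) for c in classes])
        )
    return classes, centroids


def classify(x, classes, centroids):
    """Label each row of x with the class of its most similar centroid."""
    return classes[np.argmax(normalize(x) @ centroids.T, axis=1)]
```

In this sketch, a call such as `classes, centroids = seeded_centroids(labeled_x, labeled_y, unlabeled_x)` followed by `classify(new_x, classes, centroids)` returns one predicted label per row of `new_x`. The unlabeled samples only influence where the centroids end up, which is the sense in which the clustering is semi-supervised here.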