SECURITY PATCH CLASSIFICATION BASED ON SEMI-SUPERVISED CLUSTERING

Abstract: Existing methods classify security patches for open source software (OSS) with low accuracy because labeled samples are scarce and feature representations are poor. To address these problems, we propose a security patch classification method based on semi-supervised clustering. Features are extracted from patch descriptions (commit messages) and encoded with contrastive learning to widen the separation between sample types. Semi-supervised clustering is then performed on feature similarity to form clusters with richer data distributions. Finally, patches are classified by measuring the similarity between each sample and each cluster. Experimental results show that the proposed method classifies security patches effectively, with higher accuracy than existing state-of-the-art patch classification methods.
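The three stages named in the abstract (encode commit messages, cluster with a few labels, classify by similarity to clusters) can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the encoder or the clustering algorithm, so a normalised bag-of-words vector stands in for the contrastively trained encoder, and a seeded k-means variant with cosine similarity is assumed for the semi-supervised clustering step. All function names and sample messages are hypothetical.

```python
import math
from collections import Counter


def embed(message):
    """Toy stand-in for the learned encoder: unit-length bag-of-words vector."""
    counts = Counter(message.lower().split())
    norm = math.sqrt(sum(c * c for c in counts.values()))
    return {tok: c / norm for tok, c in counts.items()} if norm else {}


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(v * b.get(k, 0.0) for k, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def centroid(vectors):
    """Component-wise mean of a non-empty list of sparse vectors."""
    total = Counter()
    for vec in vectors:
        for k, x in vec.items():
            total[k] += x
    return {k: x / len(vectors) for k, x in total.items()}


def seeded_clusters(labeled, unlabeled, iterations=5):
    """Assumed clustering step: seeded k-means.

    labeled   -- list of (message, label) pairs; they seed one cluster per
                 label and always stay in that cluster
    unlabeled -- list of messages, reassigned to the most similar centroid
                 on every iteration
    Returns a dict mapping each label to its cluster centroid.
    """
    lab_vecs = [(embed(m), y) for m, y in labeled]
    unl_vecs = [embed(m) for m in unlabeled]
    labels = sorted({y for _, y in lab_vecs})
    cents = {y: centroid([v for v, l in lab_vecs if l == y]) for y in labels}
    for _ in range(iterations):
        members = {y: [v for v, l in lab_vecs if l == y] for y in labels}
        for vec in unl_vecs:
            best = max(labels, key=lambda y: cosine(vec, cents[y]))
            members[best].append(vec)
        cents = {y: centroid(members[y]) for y in labels}
    return cents


def classify(message, cents):
    """Assign a patch to the cluster whose centroid is most similar."""
    vec = embed(message)
    return max(cents, key=lambda y: cosine(vec, cents[y]))


# Hypothetical data: two labeled seeds plus a few unlabeled commit messages.
labeled = [
    ("fix buffer overflow in parser", "security"),
    ("update readme and docs", "non-security"),
]
unlabeled = [
    "fix heap overflow in decoder",
    "prevent out-of-bounds read in parser",
    "update docs for new release",
    "refactor readme layout",
]
centroids = seeded_clusters(labeled, unlabeled)
print(classify("fix overflow in image decoder", centroids))
print(classify("update docs typo", centroids))
```

In this sketch the unlabeled messages pull extra vocabulary into each cluster, which is the sense in which the clusters come to cover a richer data distribution than the labeled seeds alone.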

     
