A CRAWLING METHOD FOR EFFICIENT WEB VULNERABILITIES SCANNING

Abstract: As Web applications grow in scale, the time cost of scanning a website for vulnerabilities keeps increasing. To address this, this paper proposes a crawling method for efficient Web vulnerability scanning. Building on a traditional crawler for Web vulnerability scanning, the method clusters website pages in phases using an incremental closed frequent itemset mining algorithm, and builds a page classification model from the page clusters and crawler logs to filter out redundant pages generated by the same service handler. Experiments show that the proposed method effectively reduces the time the vulnerability scanning system spends on website directory traversal and page clustering, thereby improving the efficiency of Web vulnerability scanning.
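The idea of filtering pages produced by the same service handler can be illustrated with a minimal sketch. It is not the method proposed in this paper: in place of incremental closed frequent itemset mining and a trained classification model, it uses a much simpler stand-in that groups URLs by an exact signature (the path plus the set of query parameter names) and keeps one representative per group. The function names `page_signature` and `filter_redundant`, the `per_cluster` parameter, and the `example.com` URLs are all hypothetical.

```python
from urllib.parse import urlsplit, parse_qsl


def page_signature(url: str) -> frozenset:
    """Map a URL to a set of items: its path and its query parameter names.

    Assumption: two URLs with the same path and the same parameter names
    are served by the same handler. Parameter values are ignored.
    """
    parts = urlsplit(url)
    items = {f"path:{parts.path}"}
    for name, _value in parse_qsl(parts.query, keep_blank_values=True):
        items.add(f"param:{name}")
    return frozenset(items)


def filter_redundant(urls, per_cluster=1):
    """Keep at most `per_cluster` URLs for each distinct signature."""
    seen = {}   # signature -> number of URLs encountered so far
    kept = []
    for url in urls:
        sig = page_signature(url)
        count = seen.get(sig, 0)
        if count < per_cluster:
            kept.append(url)
        seen[sig] = count + 1
    return kept


# Hypothetical crawl frontier: five URLs, three distinct signatures.
EXAMPLE_URLS = [
    "http://example.com/item.php?id=1",
    "http://example.com/item.php?id=2",
    "http://example.com/list.php?page=1&sort=asc",
    "http://example.com/list.php?sort=desc&page=2",
    "http://example.com/about",
]

if __name__ == "__main__":
    for url in filter_redundant(EXAMPLE_URLS):
        print(url)
```

Exact signature matching treats pages as redundant only when their item sets are identical. A mining-based clustering such as the one named in the abstract could also group pages that share a frequent subset of items, which this sketch does not attempt.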