CBI_AT: Malicious URL Detection Based on Character-Level and Word-Level Features

Abstract: Detecting malicious URLs efficiently remains a difficult problem. Existing blacklist-based methods respond slowly to new threats and adapt poorly, and methods based on traditional machine learning have low efficiency and accuracy. This paper takes both the semantic meaning and the temporal (sequential) characteristics of URLs into account and designs a hybrid neural network model, CBI_AT. The model processes each URL at the character level and the word level simultaneously, which lets it capture the semantics and sequential features of the URL string effectively. It also introduces multiple groups of attention mechanisms to extract correlations and dependencies within the URL data. Experimental results show that the hybrid model detects malicious URLs efficiently, reaching an accuracy of 99.86% and an F1 score of 99.85%.
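The abstract does not say how URLs are segmented into words or how the attention groups are formulated. The Python sketch below is therefore only an illustration under stated assumptions. It shows the two input views (character-level and word-level tokens), how each view can be mapped to padded integer id sequences, and a single-head dot-product attention pooling step. The delimiter set `URL_DELIMITERS`, the length limits (200 characters, 30 words), the vocabulary scheme and all function names are hypothetical choices. They are not CBI_AT's actual configuration, and the multi-group attention is not reproduced.

```python
import math
import re

# Hypothetical delimiter set for word-level splitting; the abstract does not
# say which characters CBI_AT treats as word boundaries.
URL_DELIMITERS = r"[/\.\?=&\-_:~%@#+]"


def char_tokens(url, max_len=200):
    """Character-level view: each character is one token, truncated to max_len."""
    return list(url[:max_len])


def word_tokens(url, max_len=30):
    """Word-level view: split on URL delimiters, drop empty pieces, truncate."""
    return [w for w in re.split(URL_DELIMITERS, url) if w][:max_len]


def build_vocab(token_lists):
    """Map tokens to integer ids; 0 is reserved for padding, 1 for unknown tokens."""
    vocab = {"<pad>": 0, "<unk>": 1}
    for tokens in token_lists:
        for tok in tokens:
            if tok not in vocab:
                vocab[tok] = len(vocab)
    return vocab


def encode(tokens, vocab, max_len):
    """Convert tokens to ids (unknown -> 1), truncate to max_len, right-pad with 0."""
    ids = [vocab.get(tok, 1) for tok in tokens[:max_len]]
    return ids + [0] * (max_len - len(ids))


def attention_pool(states, context):
    """Single-head dot-product attention pooling over per-token hidden states.

    Scores each state against a context vector, normalises the scores with a
    numerically stable softmax, and returns (weights, weighted sum of states).
    """
    scores = [sum(h * c for h, c in zip(state, context)) for state in states]
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(states[0])
    pooled = [sum(w * state[d] for w, state in zip(weights, states)) for d in range(dim)]
    return weights, pooled
```

For example, `word_tokens("http://www.example.com/login.php?id=1")` returns `['http', 'www', 'example', 'com', 'login', 'php', 'id', '1']`, while `char_tokens` on the same URL keeps every character, including the delimiters. In a hybrid model of this kind, the two id sequences would typically feed separate embedding and sequence encoders whose attention-pooled outputs are combined before classification. The exact encoders and fusion used by CBI_AT are described in the body of the paper, not here.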