Natural Language Requirement Dataset Evaluation for the Requirement Defect Detection Task

Abstract: Natural language is one of the main forms in which software requirements are written: it is easy to understand but prone to defects. Methods that use natural language processing and related techniques to address requirement defects have attracted wide attention in both academia and industry. However, unlike other fields, where many public datasets are available, software engineering still lacks both suitable datasets and methods for evaluating whether a dataset is suitable for tasks such as natural language requirement defect detection. For the requirement defect detection task, we propose a dataset evaluation method and a quantitative metric model, design a rule-based dataset evaluation framework, run experiments on existing public requirement datasets, and report statistics based on the quantitative metrics.

     
