KEYWORDS EXTRACTION AND SEMANTIC TREE RESEARCH BASED ON MIXED FEATURES AND LINK INFLUENCE IN LARGE-SCALE DATA

Abstract: Traditional keyword recognition methods cannot effectively combine the semantic and structural information of words. To address this shortcoming, this paper proposes a keyword recognition method based on joint feature mining and analysis of a word semantic network and a word co-occurrence structural network. Combining the semantic network and the structural network of a text yields a word influence network that accounts for both the semantics and the structure of the vocabulary. A link influence index is then proposed to identify keywords. Finally, a large-scale semantic tree of English words is constructed and analyzed by association mining. Experimental results show that the proposed method recognizes keywords well on large-scale corpora, and that the mined semantic tree reflects both the contextual structural relationships and the semantic information of words.
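The abstract does not give the formulas for the influence network or the link influence index. As a rough illustration only, the sketch below builds a window-based word co-occurrence network, blends it with a caller-supplied word-similarity table (standing in for the semantic network) through a hypothetical mixing weight `alpha`, and ranks words with weighted PageRank. PageRank is a swapped-in stand-in, not the paper's link influence index. All function names, parameters, and the linear blending rule are assumptions for illustration.

```python
from collections import defaultdict


def cooccurrence_network(sentences, window=2):
    """Count how often two distinct words appear within `window` tokens of each other.

    Returns {(u, v): count} with u < v, i.e. an undirected weighted edge list.
    """
    weights = defaultdict(float)
    for tokens in sentences:
        for i, word in enumerate(tokens):
            for j in range(i + 1, min(i + window + 1, len(tokens))):
                other = tokens[j]
                if other != word:
                    weights[tuple(sorted((word, other)))] += 1.0
    return dict(weights)


def influence_network(cooc, semantic, alpha=0.5):
    """Blend structure and semantics into one weighted graph.

    ASSUMPTION: a simple linear mix; the paper's actual combination rule is not given.
    `semantic` maps word pairs to a similarity in [0, 1] and stands in for the
    semantic network. Co-occurrence counts are scaled to [0, 1] before mixing.
    """
    sem = {tuple(sorted(pair)): s for pair, s in semantic.items()}
    max_c = max(cooc.values(), default=1.0)
    edges = set(cooc) | set(sem)
    return {
        e: alpha * cooc.get(e, 0.0) / max_c + (1 - alpha) * sem.get(e, 0.0)
        for e in edges
    }


def rank_words(edges, damping=0.85, iters=50):
    """Weighted PageRank on the undirected network.

    ASSUMPTION: used here as a stand-in for the paper's link influence index.
    """
    nbrs = defaultdict(dict)
    for (u, v), w in edges.items():
        if w > 0:
            nbrs[u][v] = w
            nbrs[v][u] = w
    nodes = list(nbrs)
    if not nodes:
        return {}
    n = len(nodes)
    # Total edge weight per node, used to split a node's score among its neighbors.
    strength = {u: sum(nbrs[u].values()) for u in nodes}
    score = {u: 1.0 / n for u in nodes}
    for _ in range(iters):
        score = {
            u: (1 - damping) / n
            + damping * sum(score[v] * w / strength[v] for v, w in nbrs[u].items())
            for u in nodes
        }
    return score


if __name__ == "__main__":
    # Toy, made-up corpus and similarity table for demonstration only.
    docs = [
        ["keyword", "extraction", "uses", "semantic", "network"],
        ["semantic", "network", "and", "cooccurrence", "network"],
        ["keyword", "ranking", "uses", "link", "influence"],
    ]
    similarity = {("keyword", "term"): 0.8, ("network", "graph"): 0.6}
    graph = influence_network(cooccurrence_network(docs), similarity)
    scores = rank_words(graph)
    for word in sorted(scores, key=scores.get, reverse=True)[:3]:
        print(word, round(scores[word], 3))
```

In a real setting the similarity table would come from a semantic resource such as a thesaurus or word embeddings, and `alpha` would control how much weight structure receives relative to semantics.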

     
