VARIATIONAL SEMISUPERVISED BAIDU ENCYCLOPEDIA CLASSIFICATION BASED ON WIKI KNOWLEDGE

  • Abstract: Cross-language knowledge graphs are mostly built on Wikipedia, but Wikipedia contains relatively few Chinese entities, which makes it difficult to build a large-scale cross-language knowledge graph centered on Chinese. How to use existing large-scale Chinese encyclopedic knowledge bases such as Baidu Encyclopedia to assist this construction is a pressing problem. However, Wikipedia and Baidu Encyclopedia use different classification systems, which widens the scope and raises the difficulty of cross-encyclopedia retrieval. To address this, a semi-supervised Baidu Encyclopedia classification method is proposed that is guided by a small amount of Wikipedia knowledge with classification labels. First, semantic features and statistical features of the encyclopedia abstract texts are obtained from word embeddings and a bag-of-words (BoW) model, respectively. Second, the two kinds of features are fused as the input of a variational autoencoder to obtain a semantic representation of each text. Finally, the classification loss on a small amount of labeled Wikipedia data and the reconstruction loss on a large amount of unlabeled Baidu Encyclopedia data are combined into a semi-supervised classification loss, which unifies the two classification systems. Experimental results show that the proposed method can accurately transfer Baidu Encyclopedia entries to the Wikipedia classification system.
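The abstract does not give the loss formula, so the following is only a minimal sketch of how such a semi-supervised objective could be assembled, not the authors' implementation. It assumes the fusion step is a plain concatenation, the reconstruction loss is a squared error, the variational term is the standard KL divergence to a unit Gaussian, and the two parts are balanced by a hypothetical weight `alpha`. All function names are illustrative, and the encoder, decoder, and classifier outputs are passed in as plain lists instead of being computed by a network.

```python
import math


def fuse_features(embedding_vec, bow_vec):
    """Concatenate semantic (word-embedding) and statistical (BoW) features.
    Concatenation is an assumption; the abstract only says the two are fused."""
    return list(embedding_vec) + list(bow_vec)


def reconstruction_loss(x, x_hat):
    """Squared error between the fused input and its reconstruction."""
    return sum((a - b) ** 2 for a, b in zip(x, x_hat))


def kl_divergence(mu, log_var):
    """KL(q(z|x) || N(0, I)) for a diagonal Gaussian posterior."""
    return 0.5 * sum(math.exp(lv) + m * m - 1.0 - lv
                     for m, lv in zip(mu, log_var))


def cross_entropy(logits, label):
    """Negative log-softmax probability of the true class (stable form)."""
    peak = max(logits)
    log_norm = peak + math.log(sum(math.exp(v - peak) for v in logits))
    return log_norm - logits[label]


def semi_supervised_loss(labeled, unlabeled, alpha=1.0):
    """Combine the two losses named in the abstract.

    labeled:   list of (logits, label) pairs, e.g. from Wikipedia entries
    unlabeled: list of (x, x_hat, mu, log_var) tuples, e.g. from Baidu entries
    alpha:     hypothetical weight on the classification term
    """
    sup = sum(cross_entropy(lg, y) for lg, y in labeled) / max(len(labeled), 1)
    unsup = sum(reconstruction_loss(x, xh) + kl_divergence(mu, lv)
                for x, xh, mu, lv in unlabeled) / max(len(unlabeled), 1)
    return alpha * sup + unsup


if __name__ == "__main__":
    # Toy values standing in for model outputs.
    x = fuse_features([0.1, 0.2], [1.0, 0.0, 2.0])
    x_hat = [0.0, 0.2, 0.9, 0.1, 1.8]
    labeled = [([2.0, 0.5, -1.0], 0)]
    unlabeled = [(x, x_hat, [0.1, -0.2], [0.0, 0.0])]
    print(semi_supervised_loss(labeled, unlabeled, alpha=1.0))
```

In a real model the tuples would come from a trained encoder, decoder, and classifier head, and the gradient of this scalar would drive all three. Whether the labeled Wikipedia entries also contribute a reconstruction term is not stated in the abstract; this sketch leaves that term out.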

     
