Abstract:
Most cross-language knowledge graphs are built on Wikipedia, but Wikipedia contains few Chinese entities, which makes it difficult to build a large-scale cross-language knowledge graph with Chinese at its core. How to use existing large-scale Chinese encyclopedia knowledge bases, such as Baidu Encyclopedia, to assist the construction of cross-language knowledge graphs is therefore a pressing problem. However, Wikipedia and Baidu Encyclopedia use different classification systems, which widens the search scope and increases the difficulty of cross-encyclopedia retrieval. To address this, a semi-supervised classification method for Baidu Encyclopedia is proposed that incorporates a small amount of Wikipedia knowledge with classification labels. Semantic and statistical features of the encyclopedia abstract text are obtained from word embeddings and a bag-of-words (BoW) model, respectively. The two are fused as the input to a variational autoencoder to obtain a semantic representation of the encyclopedia text. The classification loss on a small amount of labeled Wikipedia data and the reconstruction loss on a large amount of unlabeled Baidu Encyclopedia data are combined into a semi-supervised classification loss, which unifies the two classification systems. Experimental results show that the proposed method can accurately migrate Baidu Encyclopedia entries to the Wikipedia classification system.
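
The pipeline summarized in the abstract (feature fusion, variational encoding, and a combined loss) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the linear encoder and decoder, the squared-error reconstruction term, the concatenation used for fusion, the weighting factor `alpha`, and all function and parameter names are assumptions chosen only to make the idea concrete.

```python
import numpy as np

def fuse_features(embeddings, bow_counts):
    """Fuse semantic and statistical features by concatenation (an assumption)."""
    semantic = embeddings.mean(axis=0)                      # mean word embedding
    statistical = bow_counts / max(bow_counts.sum(), 1.0)   # normalised BoW counts
    return np.concatenate([semantic, statistical])

def init_params(d_in, d_latent, n_classes, rng):
    """Hypothetical single-layer (linear) encoder, decoder and classifier weights."""
    scale = 0.1
    return {
        "W_mu": rng.normal(0, scale, (d_in, d_latent)),
        "W_logvar": rng.normal(0, scale, (d_in, d_latent)),
        "W_dec": rng.normal(0, scale, (d_latent, d_in)),
        "W_cls": rng.normal(0, scale, (d_latent, n_classes)),
    }

def encode(x, p):
    """Return the mean and log-variance of the approximate posterior q(z|x)."""
    return x @ p["W_mu"], x @ p["W_logvar"]

def vae_loss(x, p, rng):
    """Reconstruction error plus KL divergence, averaged over unlabeled texts."""
    mu, logvar = encode(x, p)
    # Reparameterisation trick: z = mu + sigma * eps
    z = mu + np.exp(0.5 * logvar) * rng.standard_normal(mu.shape)
    recon = z @ p["W_dec"]
    recon_err = np.sum((x - recon) ** 2, axis=1)
    kl = -0.5 * np.sum(1 + logvar - mu ** 2 - np.exp(logvar), axis=1)
    return float(np.mean(recon_err + kl))

def classification_loss(x, y, p):
    """Softmax cross-entropy on the latent mean of labeled texts."""
    mu, _ = encode(x, p)
    logits = mu @ p["W_cls"]
    logits = logits - logits.max(axis=1, keepdims=True)     # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(log_probs[np.arange(len(y)), y]))

def semi_supervised_loss(x_lab, y_lab, x_unlab, p, rng, alpha=1.0):
    """Labeled classification loss plus weighted unlabeled VAE loss."""
    return classification_loss(x_lab, y_lab, p) + alpha * vae_loss(x_unlab, p, rng)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Toy stand-ins: 2 labeled "Wikipedia" texts, 10 unlabeled "Baidu" texts.
    x_lab = np.stack([fuse_features(rng.normal(size=(5, 8)), rng.integers(0, 3, 6).astype(float))
                      for _ in range(2)])
    x_unlab = np.stack([fuse_features(rng.normal(size=(5, 8)), rng.integers(0, 3, 6).astype(float))
                        for _ in range(10)])
    params = init_params(d_in=14, d_latent=4, n_classes=3, rng=rng)
    print(semi_supervised_loss(x_lab, np.array([0, 2]), x_unlab, params, rng))
```

In this sketch the labeled and unlabeled batches share one encoder, so minimizing the combined loss lets the large unlabeled corpus shape the latent representation while the small labeled set ties it to the target categories. In practice the parameters would be trained with a gradient-based optimizer in an automatic-differentiation framework; the sketch only evaluates the loss.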