MULTI-LAYER FEATURE FUSION AND INTERPRETABLE SIMILARITY CALCULATION METHOD FOR THE FOUR EXAMINATIONS OF TRADITIONAL CHINESE MEDICINE

    Abstract: Computing the similarity between traditional Chinese medicine (TCM) four-examination texts, and recommending existing medical cases whose four-examination findings resemble a patient's, can effectively support clinical decision-making and professional learning. Four-examination texts lack a standard clinical terminology, and their wording and sentence construction are commonly flexible and individualized. To obtain effective four-examination representations from a TCM corpus of limited size, the method represents each text at two levels suited to the characteristics of such texts: one that takes word order into account and one that de-emphasizes it. A sparse attention mechanism focuses on key features and makes the model more interpretable. A gradient boosting decision tree (GBDT) is then introduced to capture multiple discriminative combinations of four-examination features and to predict the similarity between the two texts accurately. Evaluated on a TCM four-examination text dataset, the method achieves a mean squared error of 82.06 and a Pearson correlation coefficient of 0.89. The experimental results show that the method effectively improves the semantic representation of four-examination texts, removes the influence of some irrelevant features, and strengthens the capture of combined features from a pair of texts, thereby improving similarity prediction between four-examination texts.
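The two evaluation metrics reported in the abstract, mean squared error and the Pearson correlation coefficient, can be computed as in the minimal sketch below. The score lists `gold` and `pred` are made-up illustrative values, not data from the paper, and the function names are hypothetical; the source does not state the scale of the similarity scores, so the numbers here should not be compared with the reported 82.06 and 0.89.

```python
import math


def mean_squared_error(y_true, y_pred):
    """Average of squared differences between gold and predicted scores."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("inputs must have equal length of at least 2")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Covariance numerator and the two standard-deviation numerators.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    dev_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    dev_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if dev_x == 0 or dev_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (dev_x * dev_y)


# Hypothetical gold and predicted similarity scores for four text pairs.
gold = [10.0, 40.0, 70.0, 90.0]
pred = [20.0, 35.0, 60.0, 95.0]

mse = mean_squared_error(gold, pred)
r = pearson_r(gold, pred)
print(f"MSE = {mse:.2f}, Pearson r = {r:.4f}")
```

Mean squared error depends on the scale of the scores, whereas the Pearson coefficient does not, which is why the two are usually reported together.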

     
