Text Classification Optimization Method Based on Optimal Transport Theory and Granger Causality Test

Abstract: Deep learning and related pre-trained models perform well on text classification tasks, but the conflict between the generalization they require and the limited amount of available data is becoming increasingly severe. Network optimization based on gradient descent requires every transformation in the network to be continuously differentiable, and the optimization process is prone to getting trapped in local minima. To address this, a performance optimization method for pre-trained models is proposed, based on the Granger causality test and optimal transport theory. A randomized algorithm is combined with a data-driven probability distribution algorithm, and the Granger causality test is used to generate effective features on small-sample datasets. Optimal transport theory is then used to learn a structured combined representation of these effective features, which accommodates the instability caused by transport maps between continuous and non-continuous high-dimensional manifold structures. Experiments show that, compared with using the mainstream pre-trained models BERT and TextGCN directly, accuracy improves on both Chinese and English datasets.
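To make the optimal transport step more concrete, the following is a minimal sketch of discrete optimal transport with entropic regularization (the Sinkhorn iteration). It is a generic illustration and not the method proposed in the paper: the function name `sinkhorn`, the parameters `reg` and `n_iters`, the toy histograms `a` and `b`, and the cost matrix are all assumptions introduced here.

```python
import math


def sinkhorn(a, b, cost, reg=0.5, n_iters=500):
    """Entropy-regularized optimal transport between histograms a and b.

    Returns a transport plan (list of rows) whose row sums approximate a
    and whose column sums approximate b. Illustrative only.
    """
    n, m = len(a), len(b)
    # Gibbs kernel: K[i][j] = exp(-cost[i][j] / reg)
    K = [[math.exp(-cost[i][j] / reg) for j in range(m)] for i in range(n)]
    u = [1.0] * n
    v = [1.0] * m
    for _ in range(n_iters):
        # Alternately rescale rows and columns to match the two marginals
        u = [a[i] / sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]
    return [[u[i] * K[i][j] * v[j] for j in range(m)] for i in range(n)]


# Toy example: two hypothetical 3-bin feature histograms
a = [0.5, 0.3, 0.2]
b = [0.2, 0.3, 0.5]
# Ground cost: distance between bin indices
cost = [[abs(i - j) for j in range(3)] for i in range(3)]

plan = sinkhorn(a, b, cost)
total_cost = sum(plan[i][j] * cost[i][j] for i in range(3) for j in range(3))
```

The iteration uses only elementwise scaling of a fixed kernel, so the sketch does not need a gradient of the cost. A smaller `reg` brings the plan closer to the unregularized optimum but slows convergence.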

     
