Abstract:
To address the high time complexity of high-order feature interactions in simple factorization machines (FM) and the large model size that results when neural networks are used to solve complex problems, a knowledge distillation attention deep network (K-ADN) model is proposed, with movie score prediction as an example. An attention network distinguishes the importance of the interaction features and yields attention values, deep neural networks (DNN) handle the combination of high-order features, and the resulting neural network model serves as the teacher model. With knowledge distillation, the teacher model ensures accuracy while the student model reduces the model size, giving more effective score prediction results. Experimental results on Douban movie data show that the accuracy of the model is improved and that the number of parameters is reduced by 86% after knowledge distillation.
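
As an illustration of the teacher-student idea for score prediction, the following minimal sketch blends a ground-truth loss with a teacher-imitation loss. It is not the K-ADN objective, which the abstract does not specify: the squared-error form, the function names, the mixing weight `alpha`, and the toy ratings are all assumptions made for this example.

```python
def mse(pred, target):
    """Mean squared error between two equal-length, non-empty sequences."""
    if len(pred) != len(target) or len(pred) == 0:
        raise ValueError("pred and target must be non-empty and equal in length")
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def distillation_loss(student_pred, teacher_pred, true_scores, alpha=0.5):
    """Blend the ground-truth loss with the teacher-imitation loss.

    alpha is a hypothetical mixing weight: 1.0 ignores the teacher,
    0.0 trains the student on the teacher's outputs only.
    """
    hard = mse(student_pred, true_scores)   # error against real ratings
    soft = mse(student_pred, teacher_pred)  # error against teacher predictions
    return alpha * hard + (1.0 - alpha) * soft


# Toy ratings on a 1-5 scale (made-up numbers, for illustration only).
true_scores = [4.0, 2.0, 5.0]
teacher_pred = [3.5, 2.5, 4.5]
student_pred = [3.0, 3.0, 4.0]

loss = distillation_loss(student_pred, teacher_pred, true_scores, alpha=0.5)
```

In a setup like this, a small student network would be trained to minimize the blended loss, so that it inherits part of the larger teacher's behavior while keeping far fewer parameters.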