Abstract:
In order to overcome the problems that some current models do not fully use the semantic information of images and do not have specific scene division, an image caption method named SW-2LSTM is proposed. A model based on the ResNet-LSTM network was constructed, and the linear layer and the BN layer were added. And image caption was processed to get corresponding tags. Image tags were extracted to generate tag vectors, which were directly applied to the weight matrix, and the original weight matrix was extended to a set of weight matrices related to tags. The weight set was decomposed by using tensor decomposition, and the bean search algorithm was added. The MS COCO data set was classified on its basic categories. Experimental results show that the model can effectively improve the quality of generating caption.