IMAGE CAPTION GENERATION WITH A TWO-LAYER LSTM BASED ON SEMANTIC WEIGHTING

Abstract: Some current image captioning models make insufficient use of the semantic information in images and do not divide the data into specific scenes. To address these problems, an image captioning method named SW-2LSTM is proposed. A model based on a ResNet-LSTM network is constructed, with a linear layer and a batch normalization (BN) layer added, and the image captions are preprocessed to obtain the corresponding tags. Tag vectors generated from the extracted image tags act directly on the weight matrix, extending the original weight matrix into a set of tag-dependent weight matrices; this set is then factorized following the idea of tensor decomposition, and a beam search algorithm is added. Finally, the MS COCO dataset is classified into scenes on the basis of its basic categories. Experimental results show that the proposed model effectively improves the quality of the generated captions.
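
The tag-dependent weighting described in the abstract can be illustrated with a small sketch. The abstract does not state the exact factorization, so the code below assumes a three-factor form, W(s) = W_a · diag(W_b · s) · W_c, where s is the tag vector; the names `w_a`, `w_b`, `w_c`, `factored_apply`, and `explicit_apply` are hypothetical and chosen only for illustration. It compares the factored computation with an explicit blend of one weight matrix per tag to show that the two agree while the factored form never builds the full set of matrices.

```python
def matvec(m, v):
    """Multiply matrix m (a list of rows) by vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def factored_apply(w_a, w_b, w_c, tags, x):
    """Compute W(tags) @ x, assuming W(tags) = w_a @ diag(w_b @ tags) @ w_c.

    Shapes (assumed): w_a is H x F, w_b is F x K, w_c is F x D,
    tags has K entries, x has D entries.
    """
    gate = matvec(w_b, tags)   # F values: how strongly the tags switch on each factor
    proj = matvec(w_c, x)      # F values: the input projected into factor space
    mixed = [g * p for g, p in zip(gate, proj)]
    return matvec(w_a, mixed)  # map back to the hidden size H


def explicit_apply(w_a, w_b, w_c, tags, x):
    """Reference version: build one H x D matrix per tag, blend them by the
    tag weights, then multiply by x. Costs H * D * K entries of storage."""
    n_hidden, n_factors, n_in = len(w_a), len(w_b), len(w_c[0])
    blended = [[0.0] * n_in for _ in range(n_hidden)]
    for k, s_k in enumerate(tags):
        for i in range(n_hidden):
            for j in range(n_in):
                w_k_ij = sum(w_a[i][f] * w_b[f][k] * w_c[f][j]
                             for f in range(n_factors))
                blended[i][j] += s_k * w_k_ij
    return matvec(blended, x)


# Toy sizes: H = 2 hidden units, F = 2 factors, K = 3 tags, D = 2 inputs.
w_a = [[1.0, 2.0], [0.0, 1.0]]
w_b = [[1.0, 0.0, 2.0], [0.0, 1.0, 1.0]]
w_c = [[1.0, 1.0], [2.0, 0.0]]
tags = [0.5, 0.25, 0.0]   # illustrative tag scores, not taken from the paper
x = [1.0, 2.0]

fast = factored_apply(w_a, w_b, w_c, tags, x)
slow = explicit_apply(w_a, w_b, w_c, tags, x)
print(fast, slow)  # both print [2.5, 0.5]
```

Under this assumption the factored form stores H·F + F·K + F·D parameters instead of H·D·K, which is the usual motivation for factorizing a set of per-tag weight matrices.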