Zhang Yawei, Wang Jingjing, Li Jiaxian, Zhou Mengnan. MODAL JOINT REPRESENTATION LEARNING BASED ON VARIATIONAL DISTILLATION[J]. Computer Applications and Software, 2025, 42(5): 108-115, 129. DOI: 10.3969/j.issn.1000-386x.2025.05.016

MODAL JOINT REPRESENTATION LEARNING BASED ON VARIATIONAL DISTILLATION

  • In recent years, deep learning for multimodal interaction has attracted extensive attention, and multimodal pre-training models are central to this line of work. However, experiments show that most of these large models are poorly suited to unimodal scenarios, require large amounts of aligned multimodal training corpora that are difficult to obtain, and have too many parameters to deploy easily. This paper therefore proposes MIBERT, a lightweight modal joint encoder that requires no aligned multimodal corpora and focuses on unimodal scenarios. To train MIBERT, a distillation scheme named MJ-KD was designed: the pre-trained models BERT-large and ResNet152 serve as teacher models, and their knowledge is transferred to MIBERT via MJ-KD. Experimental results show that MIBERT matches or exceeds the benchmark models on multiple tasks in single-modality image and text scenarios.
