Zhang Yawei, Wang Jingjing, Li Jiaxian, Zhou Mengnan. MODAL JOINT REPRESENTATION LEARNING BASED ON VARIATIONAL DISTILLATION[J]. Computer Applications and Software, 2025, 42(5): 108-115, 129. DOI: 10.3969/j.issn.1000-386x.2025.05.016

MODAL JOINT REPRESENTATION LEARNING BASED ON VARIATIONAL DISTILLATION

  • In recent years, deep learning for multimodal interaction has attracted extensive attention, and multimodal pre-training models are central to this line of work. However, experiments show that most of these large models are poorly suited to unimodal scenarios, require large amounts of aligned multimodal training corpora that are difficult to obtain, and have too many parameters to deploy easily. This paper therefore proposes MIBERT, a lightweight modal joint encoder that requires no aligned multimodal corpora and focuses on unimodal scenarios. To train MIBERT, a distillation scheme named MJ-KD was designed: the pre-trained models BERT-large and ResNet152 serve as teacher models, and their knowledge is transferred to MIBERT via MJ-KD. Experimental results show that MIBERT matches or exceeds the benchmark models on multiple tasks in single-modality image and text scenarios.
