A GRAPH ATTENTION NETWORK FOR VISUAL COMMON SENSE REASONING

Zhang Wenqi; Gao Yongchao; Qian Heng; Lü Hongli

doi:10.3969/j.issn.1000-386x.2025.10.026

Zhang Wenqi, Gao Yongchao, Qian Heng, Lü Hongli. A GRAPH ATTENTION NETWORK FOR VISUAL COMMON SENSE REASONING[J]. Computer Applications and Software, 2025, 42(10): 191-197,238. DOI: 10.3969/j.issn.1000-386x.2025.10.026

Abstract

Visual common sense reasoning (VCR) is a challenging multimodal task proposed in recent years. To reason about the semantic relationships within an image and improve performance on the VCR task, this paper proposes a graph attention network for visual common sense reasoning. The method encodes the visual objects in each image as visual nodes and uses a graph attention network to model the features of each visual node together with those of its adjacent nodes, thereby capturing the internal associations between objects. In addition, the method captures the dynamic interactions between visual objects, which further improves the understanding of image semantics. Experiments on the VCR dataset show that the method improves performance on all three VCR sub-tasks (Q→A, QA→R and Q→AR).
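
The abstract does not give the layer equations, so the following is only a rough illustration of how a graph attention layer can mix the features of a visual node with those of its neighbours. It is a minimal sketch of one single-head layer in the standard graph attention formulation (scores e_ij = LeakyReLU(a^T [W h_i || W h_j]), a softmax over each node's neighbours, and an ELU on the aggregated output). The function name `gat_layer`, the fully connected object graph with self-loops, the feature sizes and the random weights are all assumptions for illustration, not details taken from the paper.

```python
import math
import random


def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def leaky_relu(x, slope=0.2):
    return x if x > 0 else slope * x


def elu(x):
    return x if x > 0 else math.exp(x) - 1.0


def gat_layer(H, adj, W, a):
    """One single-head graph attention layer (hypothetical illustration).

    H:   list of N node feature vectors, e.g. one per visual object
    adj: N x N 0/1 adjacency matrix; nodes should include self-loops
    W:   shared linear projection of shape F_out x F_in
    a:   attention vector of length 2 * F_out
    Returns (updated node features, N x N attention weights).
    """
    Z = [matvec(W, h) for h in H]  # projected features W h_i
    n, f = len(H), len(Z[0])
    alpha = [[0.0] * n for _ in range(n)]
    out = []
    for i in range(n):
        neigh = [j for j in range(n) if adj[i][j]]
        # e_ij = LeakyReLU(a^T [W h_i || W h_j]) for each neighbour j
        left = sum(a[k] * Z[i][k] for k in range(f))
        scores = {j: leaky_relu(left + sum(a[f + k] * Z[j][k] for k in range(f)))
                  for j in neigh}
        # Softmax over the neighbourhood, shifted by the max for stability
        m = max(scores.values())
        exps = {j: math.exp(s - m) for j, s in scores.items()}
        total = sum(exps.values())
        for j in neigh:
            alpha[i][j] = exps[j] / total
        # Attention-weighted sum of neighbour features, then ELU
        agg = [sum(alpha[i][j] * Z[j][k] for j in neigh) for k in range(f)]
        out.append([elu(v) for v in agg])
    return out, alpha


if __name__ == "__main__":
    random.seed(0)
    n_obj, f_in, f_out = 4, 3, 2  # hypothetical sizes
    H = [[random.uniform(-1, 1) for _ in range(f_in)] for _ in range(n_obj)]
    adj = [[1] * n_obj for _ in range(n_obj)]  # fully connected, with self-loops
    W = [[random.uniform(-1, 1) for _ in range(f_in)] for _ in range(f_out)]
    a = [random.uniform(-1, 1) for _ in range(2 * f_out)]
    H_new, alpha = gat_layer(H, adj, W, a)
    for row in alpha:
        print([round(x, 3) for x in row])
```

Each row of the printed attention matrix sums to 1, so every object's new representation is a learned, input-dependent weighting of its neighbours. In practice such layers are usually written with a tensor library and multiple attention heads; plain Python is used here only to keep the sketch dependency-free.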
