Multimodal Information Integration and Retrieval Framework Based on Graph Neural Networks

With the rapid proliferation of multimodal data (e.g., text, image, audio), effectively integrating and retrieving information across modalities has become a pivotal research area. This paper proposes a multimodal information integration and retrieval framework based on a Graph Neural Network (GNN), with the aim of improving both the fusion of heterogeneous data and cross-modal retrieval performance. Building on existing multimodal fusion methods, the proposed model uses a graph structure to model the complex relationships between modalities. Specifically, it introduces a hierarchical graph in which each modality is a node and edges denote dependencies both between and within modalities. A Graph Convolutional Network (GCN) processes the graph, aggregating the features of adjacent nodes to optimize the joint representation of multimodal information. In addition, a cross-modal attention mechanism dynamically learns the relevance of each modality to a specific query, further improving retrieval accuracy. The framework supports end-to-end training, which enables efficient learning of multimodal representations and improves retrieval robustness. Experimental results show that, on the benchmark dataset, the proposed model significantly improves retrieval accuracy and recall over existing multimodal retrieval models.
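
The two mechanisms named in the abstract, GCN neighbor aggregation and query-dependent cross-modal attention, can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a flat three-node graph (one node per modality) rather than the hierarchical graph the paper describes, the standard symmetric-normalized GCN propagation rule, and scaled dot-product attention. The feature values, weight matrix, query vector, and all function names are hypothetical.

```python
import math

def matmul(a, b):
    """Multiply an (n x k) matrix by a (k x m) matrix, both as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def gcn_layer(adj, feats, weight):
    """One GCN step: ReLU(D^-1/2 (A + I) D^-1/2 X W)."""
    n = len(adj)
    # Add self-loops so each node keeps its own features.
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
             for i in range(n)]
    deg = [sum(row) for row in a_hat]
    # Symmetric degree normalization of the adjacency matrix.
    norm = [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]
    out = matmul(matmul(norm, feats), weight)
    return [[max(0.0, v) for v in row] for row in out]

def cross_modal_attention(query, node_feats):
    """Softmax attention of a query over modality nodes, plus the fused vector."""
    d = len(query)
    scores = [sum(q * x for q, x in zip(query, row)) / math.sqrt(d)
              for row in node_feats]
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    fused = [sum(w * row[j] for w, row in zip(weights, node_feats))
             for j in range(d)]
    return weights, fused

# Hypothetical toy graph: text <-> image <-> audio (assumed edges).
modalities = ["text", "image", "audio"]
adj = [[0, 1, 0],
       [1, 0, 1],
       [0, 1, 0]]
feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]   # assumed 2-d node features
weight = [[0.5, -0.2], [0.3, 0.8]]             # assumed layer weights

hidden = gcn_layer(adj, feats, weight)
query = [1.0, 0.5]                             # assumed query embedding
weights, fused = cross_modal_attention(query, hidden)

for name, w in zip(modalities, weights):
    print(f"{name}: attention weight {w:.3f}")
```

In a trained system the weights and query embedding would be learned end to end. Here they are fixed only to show how each node's features mix with its neighbors' before the query reweights the modalities.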

MDPI AG

Yuping Yuan, Haozhong Xue

2025
