Javascript must be enabled to continue!
Multimodal Information Integration and Retrieval Framework Based on Graph Neural Networks
View through CrossRef
In the context of the rapid proliferation of multimodal data (e.g. text, image, audio), the effective integration and retrieval of information across different modalities has emerged as a pivotal research area. The present paper proposes a multimodal information integration and retrieval framework based on a Graph Neural Network (GNN). The objective of this framework is to enhance the fusion effect and cross-modal retrieval performance of heterogeneous data. The proposed model innovatively adopts a graph structure to model the complex relationship between modalities, building upon existing multimodal fusion methods. Specifically, a hierarchical graph structure is introduced, wherein each modality functions as a node, with edges denoting dependencies between modalities and within modalities. The graph is processed by a Graph Convolutional Network (GCN) to aggregate the features of adjacent nodes to optimize the joint representation of multimodal information. Furthermore, a cross-modal attention mechanism is integrated to dynamically learn the relevance of different modalities under a specific query, with the aim of further improving retrieval accuracy. The proposed framework facilitates end-to-end training, enabling efficient learning of multimodal representations and enhancement of retrieval robustness. The experimental results demonstrate that the proposed model significantly enhances the retrieval accuracy and recall rate in comparison with existing multimodal retrieval models on the benchmark dataset.
Title: Multimodal Information Integration and Retrieval Framework Based on Graph Neural Networks
Description:
In the context of the rapid proliferation of multimodal data (e.
g.
text, image, audio), the effective integration and retrieval of information across different modalities has emerged as a pivotal research area.
The present paper proposes a multimodal information integration and retrieval framework based on a Graph Neural Network (GNN).
The objective of this framework is to enhance the fusion effect and cross-modal retrieval performance of heterogeneous data.
The proposed model innovatively adopts a graph structure to model the complex relationship between modalities, building upon existing multimodal fusion methods.
Specifically, a hierarchical graph structure is introduced, wherein each modality functions as a node, with edges denoting dependencies between modalities and within modalities.
The graph is processed by a Graph Convolutional Network (GCN) to aggregate the features of adjacent nodes to optimize the joint representation of multimodal information.
Furthermore, a cross-modal attention mechanism is integrated to dynamically learn the relevance of different modalities under a specific query, with the aim of further improving retrieval accuracy.
The proposed framework facilitates end-to-end training, enabling efficient learning of multimodal representations and enhancement of retrieval robustness.
The experimental results demonstrate that the proposed model significantly enhances the retrieval accuracy and recall rate in comparison with existing multimodal retrieval models on the benchmark dataset.
Related Results
Fuzzy Chaotic Neural Networks
Fuzzy Chaotic Neural Networks
An understanding of the human brain’s local function has improved in recent years. But the cognition of human brain’s working process as a whole is still obscure. Both fuzzy logic ...
Imagined worldviews in John Lennon’s “Imagine”: a multimodal re-performance / Visões de mundo imaginadas no “Imagine” de John Lennon: uma re-performance multimodal
Imagined worldviews in John Lennon’s “Imagine”: a multimodal re-performance / Visões de mundo imaginadas no “Imagine” de John Lennon: uma re-performance multimodal
Abstract: This paper addresses the issue of multimodal re-performance, a concept developed by us, in view of the fact that the famous song “Imagine”, by John Lennon, was published ...
Abstract 902: Explainable AI: Graph machine learning for response prediction and biomarker discovery
Abstract 902: Explainable AI: Graph machine learning for response prediction and biomarker discovery
Abstract
Accurately predicting drug sensitivity and understanding what is driving it are major challenges in drug discovery. Graphs are a natural framework for captu...
Graph-based Interactive Bibliographic Information Retrieval Systems
Graph-based Interactive Bibliographic Information Retrieval Systems
In the big data era, we have witnessed the explosion of scholarly literature. This explosion has imposed challenges to the retrieval of bibliographic information. Retrieval of inte...
On the role of network dynamics for information processing in artificial and biological neural networks
On the role of network dynamics for information processing in artificial and biological neural networks
Understanding how interactions in complex systems give rise to various collective behaviours has been of interest for researchers across a wide range of fields. However, despite ma...
CG-TGAN: Conditional Generative Adversarial Networks with Graph Neural Networks for Tabular Data Synthesizing
CG-TGAN: Conditional Generative Adversarial Networks with Graph Neural Networks for Tabular Data Synthesizing
Data sharing is necessary for AI to be widely used, but sharing sensitive data with others with privacy is risky.
To solve these problems, it is necessary to synthesize realistic t...
DESIGNING A MULTIMODAL TRANSPORT NETWORK
DESIGNING A MULTIMODAL TRANSPORT NETWORK
Objective: To create a methodology for designing a multimodal transport network under various scenarios of socioeconomic development of the Russian Federation and its regions which...
Domination of Polynomial with Application
Domination of Polynomial with Application
In this paper, .We .initiate the study of domination. polynomial , consider G=(V,E) be a simple, finite, and directed graph without. isolated. vertex .We present a study of the Ira...


