Multimodal Information Integration and Retrieval Framework Based on Graph Neural Networks
With the rapid proliferation of multimodal data (e.g., text, images, audio), effectively integrating and retrieving information across modalities has become a pivotal research area. This paper proposes a multimodal information integration and retrieval framework based on a Graph Neural Network (GNN), with the objective of improving both the fusion of heterogeneous data and cross-modal retrieval performance. Building upon existing multimodal fusion methods, the proposed model uses a graph structure to model the complex relationships between modalities. Specifically, it introduces a hierarchical graph in which each modality is a node and edges denote inter-modal and intra-modal dependencies. A Graph Convolutional Network (GCN) processes the graph, aggregating the features of adjacent nodes to optimize the joint representation of multimodal information. In addition, a cross-modal attention mechanism dynamically learns the relevance of each modality to a specific query, with the aim of further improving retrieval accuracy. The framework supports end-to-end training, which enables efficient learning of multimodal representations and improves retrieval robustness. Experimental results show that the proposed model significantly improves retrieval accuracy and recall over existing multimodal retrieval models on the benchmark dataset.
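The two mechanisms named in the abstract, GCN aggregation over modality nodes and query-dependent cross-modal attention, can be illustrated with a minimal sketch. The abstract gives no equations, so everything concrete here is an assumption: the symmetric normalization D^{-1/2}(A + I)D^{-1/2} is the commonly used GCN formulation, the attention is plain scaled dot-product attention, and the three-node graph, feature vectors, identity weight matrix, and query are toy values chosen for illustration rather than the authors' setup.

```python
import math


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def normalize_adjacency(adj):
    """Return D^{-1/2} (A + I) D^{-1/2}: add self-loops, then normalize symmetrically."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]


def gcn_layer(adj_norm, features, weight):
    """One graph-convolution step: ReLU(A_norm @ H @ W)."""
    aggregated = matmul(adj_norm, features)   # mix each node with its neighbours
    projected = matmul(aggregated, weight)    # linear map (learned in practice)
    return [[max(0.0, v) for v in row] for row in projected]


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_modal_attention(query, node_features):
    """Weight each modality node by its scaled dot-product relevance to the query."""
    d = len(query)
    scores = [sum(q * h for q, h in zip(query, node)) / math.sqrt(d)
              for node in node_features]
    weights = softmax(scores)
    fused = [sum(w * node[k] for w, node in zip(weights, node_features))
             for k in range(d)]
    return weights, fused


# Toy graph (assumed): text-image and image-audio are linked, text-audio is not.
modalities = ["text", "image", "audio"]
adjacency = [[0.0, 1.0, 0.0],
             [1.0, 0.0, 1.0],
             [0.0, 1.0, 0.0]]
features = [[1.0, 0.0],   # text
            [0.0, 1.0],   # image
            [1.0, 1.0]]   # audio
weight = [[1.0, 0.0],     # identity stands in for a trained weight matrix
          [0.0, 1.0]]
query = [1.0, 0.0]        # hypothetical query embedding

adj_norm = normalize_adjacency(adjacency)
hidden = gcn_layer(adj_norm, features, weight)
weights, fused = cross_modal_attention(query, hidden)

for name, w in zip(modalities, weights):
    print(f"{name}: attention weight {w:.3f}")
print("fused representation:", [round(v, 3) for v in fused])
```

In a trained system the weight matrix and the query embedding would be learned end to end, and the fused vector would be compared against candidate items for retrieval; the sketch only shows how the data flows through one layer.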

