Search engine for discovering works of Art, research articles, and books related to Art and Culture
ShareThis
Javascript must be enabled to continue!

Sparse Fusion for Multimodal Transformers

View through CrossRef
Multimodal classification is a core task in human-centric machine learning.We observe that information is highly complementary across modalities, thus unimodal information can be drastically sparsified prior to multimodal fusion without loss of accuracy.To this end, we present Sparse Fusion Transformers (SFT), a novel multimodal fusion method for transformers that performs comparably to existing state-of-the-art methods while having greatly reduced memory footprint and computation cost. Key to our idea is a sparse-pooling block that reduces unimodal token sets prior to cross-modality modeling.Evaluations are conducted on multiple multimodal benchmark datasets for a wide range of classification tasks. State-of-the-art performance is obtained on multiple benchmarks under similar experiment conditions, while reporting up to six-fold reduction in computational cost and memory requirements. Extensive ablation studies showcase our benefits of combining sparsification and multimodal learning over naive approaches. This paves the way for enabling multimodal learning on low-resource devices.
Title: Sparse Fusion for Multimodal Transformers
Description:
Multimodal classification is a core task in human-centric machine learning.
We observe that information is highly complementary across modalities, thus unimodal information can be drastically sparsified prior to multimodal fusion without loss of accuracy.
To this end, we present Sparse Fusion Transformers (SFT), a novel multimodal fusion method for transformers that performs comparably to existing state-of-the-art methods while having greatly reduced memory footprint and computation cost.
Key to our idea is a sparse-pooling block that reduces unimodal token sets prior to cross-modality modeling.
Evaluations are conducted on multiple multimodal benchmark datasets for a wide range of classification tasks.
State-of-the-art performance is obtained on multiple benchmarks under similar experiment conditions, while reporting up to six-fold reduction in computational cost and memory requirements.
Extensive ablation studies showcase our benefits of combining sparsification and multimodal learning over naive approaches.
This paves the way for enabling multimodal learning on low-resource devices.

Related Results

The Nuclear Fusion Award
The Nuclear Fusion Award
The Nuclear Fusion Award ceremony for 2009 and 2010 award winners was held during the 23rd IAEA Fusion Energy Conference in Daejeon. This time, both 2009 and 2010 award winners w...
Development of a multimodal imaging system based on LIDAR
Development of a multimodal imaging system based on LIDAR
(English) Perception of the environment is an essential requirement for the fields of autonomous vehicles and robotics, that claim for high amounts of data to make reliable decisio...
AFR-BERT: Attention-based mechanism feature relevance fusion multimodal sentiment analysis model
AFR-BERT: Attention-based mechanism feature relevance fusion multimodal sentiment analysis model
Multimodal sentiment analysis is an essential task in natural language processing which refers to the fact that machines can analyze and recognize emotions through logical reasonin...
Nonproliferation and fusion power plants
Nonproliferation and fusion power plants
Abstract The world now appears to be on the brink of realizing commercial fusion. As fusion energy progresses towards near-term commercial deployment, the question arises a...
On the Remote Calibration of Instrumentation Transformers: Influence of Temperature
On the Remote Calibration of Instrumentation Transformers: Influence of Temperature
The remote calibration of instrumentation transformers is theoretically possible using synchronous measurements across a transmission line with a known impedance and a local set of...
Fusion rate: a time-to-event phenomenon
Fusion rate: a time-to-event phenomenon
Object.The term “fusion rate” is generally denoted in the literature as the percentage of patients with successful fusion over a specific range of follow up. Because the time to fu...
Increased Transformer Availability and Reliability
Increased Transformer Availability and Reliability
Abstract Transformers are important components of the High Voltage electrical grid and electrical power installation in industrial plants such as the petroleum indus...

Back to Top