Sparse Fusion for Multimodal Transformers
Multimodal classification is a core task in human-centric machine learning. We observe that information is highly complementary across modalities, so unimodal information can be drastically sparsified prior to multimodal fusion without loss of accuracy. To this end, we present Sparse Fusion Transformers (SFT), a novel multimodal fusion method for transformers that performs comparably to existing state-of-the-art methods while greatly reducing memory footprint and computation cost. Key to our idea is a sparse-pooling block that reduces unimodal token sets prior to cross-modality modeling. Evaluations are conducted on multiple multimodal benchmark datasets covering a wide range of classification tasks. State-of-the-art performance is obtained on multiple benchmarks under similar experimental conditions, with up to a six-fold reduction in computational cost and memory requirements. Extensive ablation studies demonstrate the benefits of combining sparsification and multimodal learning over naive approaches. This paves the way for multimodal learning on low-resource devices.
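The idea of reducing unimodal token sets before cross-modality modeling can be illustrated with a minimal sketch. This is not the authors' sparse-pooling block, whose details the abstract does not give. It assumes a stand-in scoring rule (keep the tokens with the largest L2 norm), and the function names `sparse_pool` and `attention_pairs`, the token counts, and the feature width are all hypothetical. The sketch only counts pairwise interactions in a self-attention layer over the fused sequence, so the printed ratio is not a reproduction of the six-fold figure reported above.

```python
import numpy as np

def sparse_pool(tokens: np.ndarray, keep: int) -> np.ndarray:
    """Keep the `keep` tokens with the largest L2 norm.

    The norm-based score is an assumption made for illustration only.
    """
    scores = np.linalg.norm(tokens, axis=-1)    # one score per token
    top = np.sort(np.argsort(scores)[-keep:])   # top-`keep` indices, original order
    return tokens[top]

def attention_pairs(n_tokens: int) -> int:
    """Number of query-key pairs in one self-attention layer."""
    return n_tokens * n_tokens

rng = np.random.default_rng(0)
audio = rng.normal(size=(200, 64))   # hypothetical unimodal token sets
video = rng.normal(size=(300, 64))

# Dense fusion: concatenate every unimodal token before cross-modal attention.
dense = np.concatenate([audio, video], axis=0)

# Sparse fusion: shrink each modality first, then concatenate.
sparse = np.concatenate([sparse_pool(audio, 20), sparse_pool(video, 30)], axis=0)

print(dense.shape, sparse.shape)
print(attention_pairs(len(dense)) / attention_pairs(len(sparse)))
```

Because self-attention cost grows quadratically with sequence length, keeping one token in ten per modality in this toy setting cuts the fusion-stage pair count by a factor of 100. The savings of a full model also depend on the unimodal encoders, which this sketch leaves out.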
Center for Open Science

