Social Event Classification Based on Multimodal Masked Transformer Network

The key to multimodal social event classification is to fully and accurately exploit the features of both the image and text modalities. However, most existing methods have two limitations: (1) they simply concatenate the image features and text features of an event, and (2) irrelevant contextual information exists between the modalities, which causes them to interfere with each other. It is therefore not enough to model only the relationships between the modalities of multimodal data; the irrelevant contextual information (i.e., regions or words) between the modalities must also be taken into account. To overcome these limitations, a novel social event classification method based on a multimodal masked transformer network (MMTN) is proposed. First, an image-text encoding network learns better representations of the text and the image. The resulting image and text representations are then fed into the multimodal masked transformer network, which fuses the multimodal information: it models the relationships between the modalities by computing the similarity between the image and text information, and it masks the irrelevant context between the modalities. Extensive experiments on two benchmark datasets show that the proposed MMTN model achieves state-of-the-art performance.
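The abstract does not specify the exact masking rule, so the following is only a minimal sketch of the general idea in plain Python: cross-modal attention in which query-key pairs whose cosine similarity falls below a threshold are masked out before the softmax, followed by concatenation of the pooled, masked outputs. Everything here is an assumption for illustration, not the MMTN authors' implementation. This includes the names `masked_cross_attention` and `fuse`, the `threshold` parameter, the choice of cosine similarity as the relevance test, using keys as values, and the premise that both modalities are already projected into a shared d-dimensional space.

```python
import math
from typing import List

Vector = List[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def masked_cross_attention(queries: List[Vector], keys: List[Vector],
                           threshold: float = 0.0) -> List[Vector]:
    """Attend from each query (e.g. a word) to the keys (e.g. image regions).

    Hypothetical relevance rule: a key whose cosine similarity to the query
    is below `threshold` is treated as irrelevant context and masked out.
    """
    d = len(keys[0])
    outputs = []
    for q in queries:
        # Scaled dot-product logits; masked (irrelevant) pairs get -inf.
        logits = []
        for k in keys:
            if cosine(q, k) < threshold:
                logits.append(float("-inf"))
            else:
                logits.append(sum(x * y for x, y in zip(q, k)) / math.sqrt(d))
        if all(l == float("-inf") for l in logits):
            # Every key was judged irrelevant: this query receives nothing.
            outputs.append([0.0] * d)
            continue
        # Numerically stable softmax; math.exp(-inf) evaluates to 0.0.
        m = max(logits)
        weights = [math.exp(l - m) for l in logits]
        total = sum(weights)
        weights = [w / total for w in weights]
        # For brevity the keys double as the values.
        outputs.append([sum(w * k[j] for w, k in zip(weights, keys))
                        for j in range(d)])
    return outputs


def mean_pool(vectors: List[Vector]) -> Vector:
    """Average a list of equal-length vectors element-wise."""
    n = len(vectors)
    return [sum(v[j] for v in vectors) / n for j in range(len(vectors[0]))]


def fuse(text: List[Vector], image: List[Vector],
         threshold: float = 0.0) -> Vector:
    """Hypothetical fusion: masked attention in both directions, then
    concatenate the pooled text-to-image and image-to-text results."""
    t2i = masked_cross_attention(text, image, threshold)
    i2t = masked_cross_attention(image, text, threshold)
    return mean_pool(t2i) + mean_pool(i2t)


if __name__ == "__main__":
    words = [[1.0, 0.0], [0.0, 1.0]]      # toy word features
    regions = [[1.0, 0.1], [-1.0, 0.0]]   # toy region features
    print(masked_cross_attention(words, regions, threshold=0.5))
    # prints [[1.0, 0.1], [0.0, 0.0]]
    print(fuse(words, regions, threshold=0.5))
```

In this toy case the second word has no sufficiently similar region, so its attended vector is all zeros instead of a blend of unrelated regions. By contrast, plain concatenation of pooled image and text features would let every region contribute to the event representation regardless of relevance.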

Cresta Press

Chen Hong, Qian Shengsheng, Li Zhangming, Fang Quan, Xu Changsheng

Scientific Insights and Discoveries Review

2024
