Social Event Classification Based on Multimodal Masked Transformer Network
The key to multimodal social event classification is to fully and accurately utilize the features of both the image and text modalities. However, most existing methods have two limitations: (1) they simply concatenate the image features and text features of an event, and (2) irrelevant contextual information between the modalities leads to mutual interference. It is therefore not enough to consider only the relationship between the modalities of multimodal data; the irrelevant contextual information (i.e., regions or words) between the modalities must be considered as well. To overcome these limitations, a novel social event classification method based on a multimodal masked transformer network (MMTN) is proposed. First, a better representation of text and image is learned through an image-text encoding network. Then, the obtained image and text representations are input into the multimodal masked transformer network to fuse the multimodal information. The relationship between the modalities is modeled by calculating the similarity between their representations, and the irrelevant context between the modalities is masked. Extensive experiments on two benchmark datasets show that the proposed MMTN model achieves state-of-the-art performance.
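The fusion step described above (compute cross-modal similarity, then mask irrelevant context) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the similarity function, the masking rule, or any hyperparameters, so the cosine similarity, the `threshold` parameter, and the function name `masked_cross_attention` below are all assumptions chosen for illustration.

```python
import math


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    norm = math.sqrt(dot(u, u)) * math.sqrt(dot(v, v))
    return dot(u, v) / norm if norm > 0 else 0.0


def masked_cross_attention(text_feats, image_feats, threshold=0.0):
    """Hypothetical masked cross-modal attention (illustrative only).

    For each word vector, image regions whose similarity falls below
    `threshold` (an assumed masking rule) are excluded, and a softmax over
    the remaining regions weights the image context for that word.
    Returns (fused_contexts, attention_weights).
    """
    dim = len(image_feats[0])
    fused, all_weights = [], []
    for query in text_feats:
        sims = [cosine(query, region) for region in image_feats]
        keep = [s >= threshold for s in sims]
        if not any(keep):
            # Avoid an empty softmax: keep the single most similar region.
            best = max(range(len(sims)), key=sims.__getitem__)
            keep[best] = True
        peak = max(s for s, k in zip(sims, keep) if k)
        exps = [math.exp(s - peak) if k else 0.0 for s, k in zip(sims, keep)]
        total = sum(exps)
        weights = [e / total for e in exps]
        context = [
            sum(w * region[d] for w, region in zip(weights, image_feats))
            for d in range(dim)
        ]
        fused.append(context)
        all_weights.append(weights)
    return fused, all_weights


if __name__ == "__main__":
    words = [[1.0, 0.0]]                 # one toy word embedding
    regions = [[1.0, 0.0], [-1.0, 0.0]]  # one relevant, one irrelevant region
    contexts, weights = masked_cross_attention(words, regions)
    print(weights)   # the irrelevant region receives zero weight
    print(contexts)
```

In this toy example the second region points away from the word vector, so it is masked and contributes nothing to the fused context. A real model would apply the same idea in both directions (words attending to regions and regions attending to words) inside learned transformer layers.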

