A robust visual question answering approach to reduce multimodal bias

Many visual question answering (VQA) models suffer from bias. Specifically, when the training data exhibit a strong mapping between questions and answers, the model generalizes poorly. Existing work on such biased predictions mainly addresses language bias and ignores the bias introduced by images. To make VQA models more robust, a bias reduction method is proposed and then used to explore how language and visual information each contribute to bias. Two bias learning branches are constructed, one capturing language bias and the other capturing the bias caused jointly by language and images, and the bias reduction method is applied to obtain more robust predictions. Finally, samples are dynamically weighted according to the difference in prediction probability between the standard VQA branch and the bias branches, so that the model adjusts how strongly it learns from samples with different levels of bias. Experiments on datasets including VQA-CP v2.0 demonstrate that the proposed method is effective and alleviates the influence of bias on the model.
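The abstract does not give the weighting formula, so the following is only an illustrative sketch of how a probability gap between a main VQA branch and a bias branch could be turned into a per-sample weight. The function names (`softmax`, `sample_weight`, `weighted_loss`), the linear form `1 + (p_vqa - p_bias)`, and the direction of the weighting are all assumptions made here for illustration, not the authors' method.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sample_weight(vqa_logits, bias_logits, answer_idx):
    """Hypothetical weight in [0, 2] based on the probability gap.

    Assumption: a sample that the bias branch already answers confidently
    is treated as heavily biased and receives a smaller weight.
    """
    p_vqa = softmax(vqa_logits)[answer_idx]
    p_bias = softmax(bias_logits)[answer_idx]
    return 1.0 + (p_vqa - p_bias)


def weighted_loss(vqa_logits, bias_logits, answer_idx):
    """Cross-entropy of the main branch scaled by the sample weight."""
    weight = sample_weight(vqa_logits, bias_logits, answer_idx)
    p_vqa = softmax(vqa_logits)[answer_idx]
    return -weight * math.log(p_vqa)


# Same main-branch logits; only the bias branch differs.
biased = sample_weight([1.0, 1.0, 1.0], [5.0, 0.0, 0.0], answer_idx=0)
unbiased = sample_weight([1.0, 1.0, 1.0], [0.0, 5.0, 0.0], answer_idx=0)
```

Under these assumptions, the sample that the bias branch answers correctly on its own (`biased`) gets a smaller weight than the one it gets wrong (`unbiased`), so training would emphasize samples that cannot be solved through the shortcut. In a real training loop the weight would typically be treated as a constant rather than differentiated through; that is likewise a design assumption here.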

Cresta Press

Zhang Fengshuo, Li Yu, Li Xiangqian, Xu Jinan, Chen Yufeng

Scientific Insights and Discoveries Review

2024


