Search engine for discovering works of Art, research articles, and books related to Art and Culture
ShareThis
Javascript must be enabled to continue!

A robust visual question answering approach to reduce multimodal bias

View through CrossRef
Currently, many visual question answering models have bias problems. Specifically, when the question-answer relationship in the training data shows a more obvious mapping relationship, the model shows poor generalization ability. For such biased predictions, existing research work mainly considers language bias, while ignoring the bias information introduced by images. In order to enhance the robustness of visual question answering models, a bias reduction method is proposed, and on this basis, the influence of language and visual information on bias is explored. Furthermore, two bias learning branches are constructed to capture language bias and the bias caused by language and images respectively, and the bias reduction method is used to obtain more robust prediction results. Finally, according to the difference in prediction probability between the standard visual question answering and bias branches, the samples are dynamically weighted, so that the model can dynamically adjust the learning degree for samples with different bias levels. Experiments on datasets such as VQA-CP v2.0 prove the effectiveness of the proposed method and alleviate the influence of bias on the model.
Title: A robust visual question answering approach to reduce multimodal bias
Description:
Currently, many visual question answering models have bias problems.
Specifically, when the question-answer relationship in the training data shows a more obvious mapping relationship, the model shows poor generalization ability.
For such biased predictions, existing research work mainly considers language bias, while ignoring the bias information introduced by images.
In order to enhance the robustness of visual question answering models, a bias reduction method is proposed, and on this basis, the influence of language and visual information on bias is explored.
Furthermore, two bias learning branches are constructed to capture language bias and the bias caused by language and images respectively, and the bias reduction method is used to obtain more robust prediction results.
Finally, according to the difference in prediction probability between the standard visual question answering and bias branches, the samples are dynamically weighted, so that the model can dynamically adjust the learning degree for samples with different bias levels.
Experiments on datasets such as VQA-CP v2.
0 prove the effectiveness of the proposed method and alleviate the influence of bias on the model.

Related Results

Evaluating the Science to Inform the Physical Activity Guidelines for Americans Midcourse Report
Evaluating the Science to Inform the Physical Activity Guidelines for Americans Midcourse Report
Abstract The Physical Activity Guidelines for Americans (Guidelines) advises older adults to be as active as possible. Yet, despite the well documented benefits of physical a...
MULTIMODAL STRATEGIES OF MODERATE ISLAMIC DIGITAL DA’WAH ON THE RUKUN INDONESIA YOUTUBE CHANNEL
MULTIMODAL STRATEGIES OF MODERATE ISLAMIC DIGITAL DA’WAH ON THE RUKUN INDONESIA YOUTUBE CHANNEL
This study examines how the Rukun Indonesia YouTube channel constructs representations of moderate Islam through multimodal digital da’wah. Although research on online Islamic comm...
Tropical Indian Ocean Mixed Layer Bias in CMIP6 CGCMs Primarily Attributed tothe AGCM Surface Wind Bias
Tropical Indian Ocean Mixed Layer Bias in CMIP6 CGCMs Primarily Attributed tothe AGCM Surface Wind Bias
The relatively weak sea surface temperature bias in the tropical Indian Ocean (TIO) simulated in the coupledgeneral circulation model (CGCM) from the recently released CMIP6 has be...
Interactive Question Answering
Interactive Question Answering
The increasing amount of information available online has led to the development of technologies that help to deal with it. One of them is Interactive Question Answering (IQA), a r...
AFR-BERT: Attention-based mechanism feature relevance fusion multimodal sentiment analysis model
AFR-BERT: Attention-based mechanism feature relevance fusion multimodal sentiment analysis model
Multimodal sentiment analysis is an essential task in natural language processing which refers to the fact that machines can analyze and recognize emotions through logical reasonin...
Hydatid Cyst of The Orbit: A Systematic Review with Meta-Data
Hydatid Cyst of The Orbit: A Systematic Review with Meta-Data
Abstarct Introduction Orbital hydatid cysts (HCs) constitute less than 1% of all cases of hydatidosis, yet their occurrence is often linked to severe visual complications. This stu...

Back to Top