
A robust visual question answering approach to reduce multimodal bias

Many visual question answering (VQA) models suffer from bias. Specifically, when the training data exhibits a strong mapping between questions and answers, the model generalizes poorly. Existing work on such biased predictions mainly considers language bias and ignores the bias introduced by images. To make VQA models more robust, a bias reduction method is proposed and then used to explore how language and visual information each contribute to bias. Two bias learning branches are constructed: one captures language bias, and the other captures the bias caused jointly by language and images. The bias reduction method is applied to both to obtain more robust predictions. Finally, samples are dynamically weighted according to the difference in prediction probability between the standard VQA branch and the bias branches, so that the model adjusts how much it learns from samples with different levels of bias. Experiments on datasets such as VQA-CP v2.0 demonstrate that the proposed method is effective and alleviates the influence of bias on the model.
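The abstract does not give the weighting formula, so the following is only a minimal sketch of how per-sample weighting from a probability difference might look. The function names (`sample_weight`, `weighted_loss`) and the specific rule `1 + (p_vqa - p_bias)` are assumptions made for illustration, not the authors' method. Under this assumed rule, samples that a bias branch already answers confidently contribute less to the loss.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sample_weight(vqa_logits, bias_logits, answer_idx):
    """Hypothetical weighting rule (an assumption, not from the paper).

    Compares the probability each branch assigns to the ground-truth
    answer. The result lies in [0, 2]: below 1 when the bias branch is
    more confident than the standard VQA branch, above 1 otherwise.
    """
    p_vqa = softmax(vqa_logits)[answer_idx]
    p_bias = softmax(bias_logits)[answer_idx]
    return 1.0 + (p_vqa - p_bias)


def weighted_loss(batch):
    """Mean weighted negative log-likelihood of the standard VQA branch.

    `batch` is a list of (vqa_logits, bias_logits, answer_idx) tuples.
    """
    total = 0.0
    for vqa_logits, bias_logits, answer_idx in batch:
        w = sample_weight(vqa_logits, bias_logits, answer_idx)
        nll = -math.log(softmax(vqa_logits)[answer_idx])
        total += w * nll
    return total / len(batch)
```

For example, with uniform VQA logits `[0.0, 0.0]` and bias logits `[10.0, 0.0]`, a sample whose answer is index 0 (the one the bias branch favors) receives a weight below 1, while a sample whose answer is index 1 receives a weight above 1.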

