A robust visual question answering approach to reduce multimodal bias
Many current visual question answering (VQA) models suffer from bias: when the question-answer pairs in the training data exhibit strong, easily exploitable mappings, the model generalizes poorly. Existing work on such biased predictions focuses mainly on language bias and ignores the bias introduced by images. To improve the robustness of VQA models, a bias reduction method is proposed, and on this basis the influence of language and visual information on bias is explored. Two bias learning branches are constructed to capture language bias and the bias caused jointly by language and images, and the bias reduction method combines them to obtain more robust predictions. Finally, samples are dynamically weighted according to the difference in prediction probability between the standard VQA branch and the bias branches, so that the model adjusts how strongly it learns from samples with different levels of bias. Experiments on datasets such as VQA-CP v2.0 demonstrate that the proposed method is effective and alleviates the influence of bias on the model.
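A minimal sketch (in PyTorch, with hypothetical names) of how the dynamic sample weighting described in the abstract could look: a standard VQA branch, a question-only bias branch, and a joint language-image bias branch each predict answer distributions, and the gap between the standard branch and the bias branches on the ground-truth answer scales each sample's main loss. The weighting rule and branch interfaces below are illustrative assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def debiased_loss(vqa_logits, lang_bias_logits, fused_bias_logits, answers):
    """Hypothetical sketch of bias-reduced VQA training.

    vqa_logits:        logits of the standard VQA branch        (B, num_answers)
    lang_bias_logits:  logits of a question-only bias branch    (B, num_answers)
    fused_bias_logits: logits of a language+image bias branch   (B, num_answers)
    answers:           ground-truth answer indices              (B,)
    """
    p_vqa = F.softmax(vqa_logits, dim=-1)
    p_lang = F.softmax(lang_bias_logits, dim=-1)
    p_fused = F.softmax(fused_bias_logits, dim=-1)

    # Per-sample weight: the more the bias branches outscore the standard branch
    # on the ground-truth answer, the more biased the sample is assumed to be,
    # and the more its main-branch loss is down-weighted (one plausible choice).
    gt = answers.unsqueeze(-1)
    gap = (p_lang.gather(-1, gt) + p_fused.gather(-1, gt)) / 2 - p_vqa.gather(-1, gt)
    weight = torch.clamp(1.0 - gap.squeeze(-1), min=0.0, max=1.0).detach()

    loss_vqa = F.cross_entropy(vqa_logits, answers, reduction="none")
    loss_lang = F.cross_entropy(lang_bias_logits, answers, reduction="none")
    loss_fused = F.cross_entropy(fused_bias_logits, answers, reduction="none")

    # The main branch learns from dynamically weighted samples; the bias branches
    # are trained normally so they keep absorbing the shortcut mappings.
    return (weight * loss_vqa + loss_lang + loss_fused).mean()
```

In this sketch the weight is detached so the gradient of the main loss does not flow through the weighting term; whether the bias branches share the backbone or are trained separately is left open, as the abstract does not specify it.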

