On the reliability of factoid question answering evaluation
This paper compares some existing evaluation metrics for factoid question answering (QA) from the viewpoint of stability and sensitivity, using the NTCIR-4 QAC2 Japanese factoid QA tasks and the Buckley/Voorhees stability method and Voorhees/Buckley swap method. Our main findings are: (1) For QA evaluation with ranked lists containing up to five answers, the fraction of questions with a correct answer within top 5 (NQcorrect5) and that with a correct answer at rank 1 (NQcorrect1) are not as stable and sensitive as reciprocal rank. (2) Q-measure, which can handle multiple correct answers and answer correctness levels, is at least as stable and sensitive as reciprocal rank, provided that a mild gain value assignment is used. Emphasizing answer correctness levels tends to hurt stability and sensitivity, while handling multiple correct answers improves them. As our experimental methods are language-independent, we believe that these findings apply to QA in languages other than Japanese as well.
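The three rank-based metrics named in the abstract can be illustrated with a minimal sketch. It assumes each question's system output is a ranked list of at most five binary correctness judgments; the function names and the toy data are hypothetical and not taken from the paper, and Q-measure is omitted because it needs graded gain values that the abstract does not specify.

```python
from typing import List, Optional, Sequence


def first_correct_rank(judgments: Sequence[bool], cutoff: int = 5) -> Optional[int]:
    """Return the 1-based rank of the first correct answer within the cutoff, or None."""
    for rank, is_correct in enumerate(judgments[:cutoff], start=1):
        if is_correct:
            return rank
    return None


def reciprocal_rank(judgments: Sequence[bool], cutoff: int = 5) -> float:
    """Reciprocal rank for one question: 1/rank of the first correct answer, else 0."""
    rank = first_correct_rank(judgments, cutoff)
    return 0.0 if rank is None else 1.0 / rank


def nq_correct(questions: List[Sequence[bool]], k: int) -> float:
    """Fraction of questions with a correct answer within the top k ranks."""
    hits = sum(1 for j in questions if first_correct_rank(j, k) is not None)
    return hits / len(questions)


def mean_reciprocal_rank(questions: List[Sequence[bool]], cutoff: int = 5) -> float:
    """Average of per-question reciprocal ranks."""
    return sum(reciprocal_rank(j, cutoff) for j in questions) / len(questions)


# Hypothetical judgments for four questions (True = correct answer at that rank).
toy_questions = [
    [True, False, False, False, False],   # first correct answer at rank 1
    [False, False, True, False, False],   # first correct answer at rank 3
    [False, False, False, False, False],  # no correct answer returned
    [False, True],                        # shorter list, correct at rank 2
]

nq1 = nq_correct(toy_questions, 1)
nq5 = nq_correct(toy_questions, 5)
mrr = mean_reciprocal_rank(toy_questions)
print(f"NQcorrect1={nq1:.3f} NQcorrect5={nq5:.3f} MRR={mrr:.3f}")
```

In this toy data, the questions answered at ranks 2 and 3 count the same as the rank-1 question under NQcorrect5 and the same as the unanswered question under NQcorrect1, whereas reciprocal rank distinguishes all four cases. That finer granularity is one plausible intuition for the sensitivity difference the abstract reports, though the sketch itself does not reproduce the paper's stability or swap experiments.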
Association for Computing Machinery (ACM)
Related Results
Domination of Polynomial with Application
In this paper, we initiate the study of the domination polynomial. Consider G=(V,E) to be a simple, finite, and directed graph without isolated vertices. We present a study of the Ira...
Factoid question answering for spoken documents
In this dissertation, we present a factoid question answering system, specifically tailored for Question Answering (QA) on spoken documents.
This work explores, for the first time...
A Conversational Chatbot Based on Kowledge-Graphs for Factoid Medical Questions
In recent years, interest in enhancing the interface usability of applications has strongly increased, focusing in particular on chatbots, i.e. conversational agents that...
Interactive Question Answering
The increasing amount of information available online has led to the development of technologies that help to deal with it. One of them is Interactive Question Answering (IQA), a r...
Non-Recommended Publishing Lists: Strategies for Detecting Deceitful Journals
The rapid growth of open access publishing (OAP) has significantly improved the accessibility and dissemination of scientific knowledge. However, this expansion has also c...
Nanjing Yunjin intelligent question-answering system based on knowledge graphs and retrieval augmented generation technology
Nanjing Yunjin, a traditional Chinese silk weaving craft, is celebrated globally for its unique local characteristics and exquisite workmanship, forming an integ...
EVJVQA CHALLENGE: MULTILINGUAL VISUAL QUESTION ANSWERING
Visual Question Answering (VQA) is a challenging task of natural language processing (NLP) and computer vision (CV), attracting significant attention from researchers. English is a...

