
On the reliability of factoid question answering evaluation

This paper compares some existing evaluation metrics for factoid question answering (QA) from the viewpoint of stability and sensitivity, using the NTCIR-4 QAC2 Japanese factoid QA tasks and the Buckley/Voorhees stability method and Voorhees/Buckley swap method. Our main findings are: (1) For QA evaluation with ranked lists containing up to five answers, the fraction of questions with a correct answer within top 5 (NQcorrect5) and that with a correct answer at rank 1 (NQcorrect1) are not as stable and sensitive as reciprocal rank. (2) Q-measure, which can handle multiple correct answers and answer correctness levels, is at least as stable and sensitive as reciprocal rank, provided that a mild gain value assignment is used. Emphasizing answer correctness levels tends to hurt stability and sensitivity, while handling multiple correct answers improves them. As our experimental methods are language-independent, we believe that these findings apply to QA in languages other than Japanese as well.
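The simpler metrics named in the abstract can be illustrated with a minimal sketch. It assumes each question's ranked answer list (up to five answers) has already been judged and reduced to a list of booleans, one per rank. The function names and the sample judgments are hypothetical and are not taken from the paper. Q-measure is not covered, since it needs graded correctness levels and gain values that the abstract does not specify.

```python
from typing import Sequence


def reciprocal_rank(correct_flags: Sequence[bool]) -> float:
    """Return 1/rank of the first correct answer, or 0.0 if none is correct."""
    for rank, is_correct in enumerate(correct_flags, start=1):
        if is_correct:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(runs: Sequence[Sequence[bool]]) -> float:
    """Average the reciprocal rank over all questions."""
    if not runs:
        return 0.0
    return sum(reciprocal_rank(flags) for flags in runs) / len(runs)


def nq_correct_at(runs: Sequence[Sequence[bool]], k: int) -> float:
    """Fraction of questions with at least one correct answer in the top k."""
    if not runs:
        return 0.0
    hits = sum(1 for flags in runs if any(flags[:k]))
    return hits / len(runs)


# Hypothetical judgments for four questions (True = answer judged correct).
runs = [
    [True, False, False, False, False],   # first correct answer at rank 1
    [False, False, True, False, False],   # first correct answer at rank 3
    [False, False, False, False, False],  # no correct answer returned
    [False, True],                        # shorter list, correct at rank 2
]

print("NQcorrect1:", nq_correct_at(runs, 1))
print("NQcorrect5:", nq_correct_at(runs, 5))
print("Mean reciprocal rank:", mean_reciprocal_rank(runs))
```

In this toy data, NQcorrect1 counts only the first question and NQcorrect5 treats the first, second, and fourth questions identically, whereas reciprocal rank gives them different scores (1, 1/3, and 1/2). That finer per-question granularity is one plausible reason a rank-sensitive metric could separate systems more reliably, although the abstract itself reports only the empirical finding.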

Related Results

Domination of Polynomial with Application
In this paper, we initiate the study of the domination polynomial. Let G=(V,E) be a simple, finite, directed graph without isolated vertices. We present a study of the Ira...
Factoid question answering for spoken documents
In this dissertation, we present a factoid question answering system, specifically tailored for Question Answering (QA) on spoken documents. This work explores, for the first time...
Interactive Question Answering
The increasing amount of information available online has led to the development of technologies that help to deal with it. One of them is Interactive Question Answering (IQA), a r...
A Conversational Chatbot Based on Kowledge-Graphs for Factoid Medical Questions
In the last years, the interest about enhancing the interface usability of applications has strongly increased, focusing, in particular, on chatbots, i.e. conversational agent that...
ALBERT-QM: An ALBERT Based Method for Chinese Health Related Question Matching (Preprint)
BACKGROUND Question answering (QA) system is widely used in web-based health-care applications. Health consumers likely asked similar questions in various n...
Barrier Function Method in Reliability Based Design Optimization
In practical design applications, most design variables such as thickness, diameter and material properties are not deterministic but stochastic numbers that can be represented by ...
Application of Reliability Engineering to Offshore Production Equipment
ABSTRACT Standard Oil Co. of California performed a reliability study on their subsea completion system in 1973 using an "event-consequence" approach. The study w...
