
ViReader: A Wikipedia-based Vietnamese reading comprehension system using transfer learning

Machine reading comprehension (MRC) has attracted significant interest in natural language understanding research, and large-scale datasets and neural network-based methods have been developed for this task. However, most MRC resources and methods have been developed for two resource-rich languages, English and Chinese. This article proposes ViReader, a system for open-domain machine reading comprehension in Vietnamese that uses Wikipedia as its textual knowledge source: the answer to any question is a text span taken directly from Vietnamese Wikipedia. The system combines a sentence retriever, which uses information retrieval techniques to extract relevant sentences, with a transfer learning-based answer extractor trained to predict answers from Wikipedia texts. Experiments on multiple machine reading comprehension datasets in Vietnamese and other languages demonstrate that (1) ViReader is highly competitive with prevalent machine learning-based systems, and (2) combining the sentence retriever and the answer extractor through multi-task learning yields an end-to-end reading comprehension system. The sentence retriever retrieves the sentences most likely to contain the answer to the given question. The answer extractor then reads the document from which those sentences were retrieved, predicts the answer, and returns it to the user. ViReader achieves new state-of-the-art performance, with 70.83% EM (exact match) and 89.54% F1, outperforming the BERT-based system by 11.55% and 9.54%, respectively. It also obtains state-of-the-art performance on UIT-ViNewsQA (another Vietnamese dataset, built from online health-domain news) and BiPaR (a bilingual English and Chinese dataset built from novels).
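The abstract says only that the sentence retriever is based on information retrieval techniques; it does not name the scoring function. The sketch below assumes a purely lexical ranker and swaps in Okapi BM25, a standard retrieval function that is not necessarily the one ViReader uses, to rank candidate sentences against a question. The names `tokenize`, `bm25_rank`, and `retrieve`, the defaults `k1=1.5`, `b=0.75`, and `top_k=3`, and the regex tokenizer are all hypothetical choices made for illustration.

```python
import math
import re
import unicodedata
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Hypothetical tokenizer (assumption, not ViReader's): NFC-normalize so
    # Vietnamese diacritics are single code points, lowercase, then split on
    # runs of Unicode word characters. Each syllable becomes one token.
    text = unicodedata.normalize("NFC", text).lower()
    return re.findall(r"\w+", text)


def bm25_rank(question: str, sentences: list[str],
              k1: float = 1.5, b: float = 0.75) -> list[tuple[float, str]]:
    """Score each sentence against the question with Okapi BM25.

    Returns (score, sentence) pairs sorted best first; ties keep input order.
    """
    docs = [tokenize(s) for s in sentences]
    n = len(docs)
    avgdl = sum(len(d) for d in docs) / n if n else 0.0
    # Document frequency: number of sentences containing each term.
    df = Counter(term for d in docs for term in set(d))
    query_terms = tokenize(question)

    scored = []
    for sentence, doc in zip(sentences, docs):
        tf = Counter(doc)
        score = 0.0
        for term in query_terms:
            freq = tf[term]
            if freq == 0:
                continue
            # IDF with +1 inside the log keeps it positive for common terms.
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            # Term-frequency saturation (k1) and length normalization (b).
            denom = freq + k1 * (1 - b + b * len(doc) / avgdl)
            score += idf * freq * (k1 + 1) / denom
        scored.append((score, sentence))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored


def retrieve(question: str, sentences: list[str], top_k: int = 3) -> list[str]:
    """Return up to top_k sentences sharing at least one term with the question."""
    return [s for score, s in bm25_rank(question, sentences)[:top_k] if score > 0]


if __name__ == "__main__":
    # Toy "Wikipedia" sentences; a real system would index full articles.
    wiki_sentences = [
        "Hà Nội là thủ đô của Việt Nam.",
        "Thành phố Hồ Chí Minh là thành phố lớn nhất Việt Nam.",
        "Phở là một món ăn truyền thống.",
    ]
    for sentence in retrieve("Thủ đô của Việt Nam là gì?", wiki_sentences, top_k=2):
        print(sentence)
```

In the demo, the Hanoi sentence ranks first because it shares the rare terms "thủ" and "đô" with the question. Because many Vietnamese words span several syllables, a production system would more likely run a word segmenter before scoring; the syllable-level split here keeps the sketch dependency-free.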
Compared with the BERT-based system, our system achieves significant F1 improvements of 7.65% for English and 6.13% for Chinese on the BiPaR dataset. Furthermore, we build a ViReader application programming interface (API) that programmers can use in artificial intelligence applications.
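The EM and F1 figures above are the standard extractive question-answering metrics popularized by SQuAD. As a hedged sketch of how such scores are usually computed (modeled on the SQuAD v1.1 evaluation script; whether ViReader's evaluation normalizes answers in exactly this way is an assumption), EM checks whether the normalized predicted span equals a gold answer, and F1 measures token overlap between the two. The helper names are hypothetical, and the SQuAD script's removal of English articles is omitted because it does not apply to Vietnamese.

```python
import string
from collections import Counter

# ASCII punctuation, as in the SQuAD v1.1 script (assumed normalization).
PUNCTUATION = set(string.punctuation)


def normalize_answer(text: str) -> str:
    """Lowercase, drop ASCII punctuation, and collapse whitespace."""
    text = "".join(ch for ch in text.lower() if ch not in PUNCTUATION)
    return " ".join(text.split())


def exact_match(prediction: str, gold: str) -> float:
    """1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize_answer(prediction) == normalize_answer(gold))


def f1_score(prediction: str, gold: str) -> float:
    """Harmonic mean of token-level precision and recall."""
    pred_tokens = normalize_answer(prediction).split()
    gold_tokens = normalize_answer(gold).split()
    # Multiset intersection: each shared token counts at most min(count) times.
    common = Counter(pred_tokens) & Counter(gold_tokens)
    num_same = sum(common.values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(pred_tokens)
    recall = num_same / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)


def evaluate(predictions: list[str],
             gold_answers: list[list[str]]) -> tuple[float, float]:
    """Dataset-level EM and F1 in percent.

    Each question may have several gold answers; the best match is kept.
    """
    if not predictions:
        return 0.0, 0.0
    pairs = list(zip(predictions, gold_answers))
    em = sum(max(exact_match(p, g) for g in golds) for p, golds in pairs)
    f1 = sum(max(f1_score(p, g) for g in golds) for p, golds in pairs)
    n = len(predictions)
    return 100 * em / n, 100 * f1 / n
```

For example, predicting "thủ đô Hà Nội" against the gold answer "Hà Nội" scores 0 EM but about 0.67 F1 (precision 2/4, recall 2/2). Because F1 gives partial credit for overlapping spans, it is typically higher than EM, as in the 70.83% EM and 89.54% F1 reported for ViReader.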

Related Results

Incidental Collocation Learning from Different Modes of Input and Factors That Affect Learning
Collocations, i.e., words that habitually co-occur in texts (e.g., strong coffee, heavy smoker), are ubiquitous in language and thus crucial for second/foreign language (L2) learne...
Exploiting Wikipedia Semantics for Computing Word Associations
Semantic association computation is the process of automatically quantifying the strength of a semantic connection between two textual units based on various lexi...
Wikipedia in Vascular Surgery Medical Education: Comparative Study (Preprint)
BACKGROUND Medical students commonly refer to Wikipedia as their preferred online resource for medical information. The quality and readability of articles ...
Wikipedia: a tool to monitor seasonal diseases trends?
Objective: To explore the interest of Wikipedia as a data source to monitor seasonal disease trends in metropolitan France. Introduction: Today, the Internet, especially Wikipedia, is an im...
The Effect of Teaching Strategies and Students’ Motivation on Students’ Narrative Writing Achievement
The objectives of this study were to investigate whether: (1) Students’ achievement in reading comprehension taught by using SPE is higher than students’ Achievement in Reading com...
COVID-19 research in Wikipedia
Wikipedia is one of the main sources of free knowledge on the Web. During the first few months of the pandemic, over 5,200 new Wikipedia pages on COVID-19 were created, accumulatin...
The Effect of Tea Party Strategy toward Students' Reading Comprehension of Narrative Text
In teaching reading, teaching technique and reading interest influence student’s reading comprehension. Tea Party strategy requires students to access background knowledge or review...
