ALBERT-QM: An ALBERT Based Method for Chinese Health Related Question Matching (Preprint)
BACKGROUND
Question answering (QA) systems are widely used in web-based health care applications. Because health consumers often lack medical knowledge, they tend to ask similar questions in varied natural language expressions, which makes it challenging to match a new question to similar, previously answered questions. In health QA system development, question matching (QM) is the task of judging whether a pair of questions expresses the same meaning; it is used to map a new question to the answer of a matched question in a given question-answering database. BERT (Bidirectional Encoder Representations from Transformers) has proved to be a state-of-the-art model for natural language processing (NLP) tasks such as binary classification and sentence matching. ALBERT, a lightweight variant of BERT, was proposed to address BERT's large parameter count and slow training speed. Both BERT and ALBERT can be applied to the QM problem.
OBJECTIVE
In this study, we aim to develop an ALBERT-based method for Chinese health-related question matching.
METHODS
Our proposed method, named ALBERT-QM, consists of three components. (1) Data augmentation: similar health question pairs were augmented to prepare the training data. (2) ALBERT model training: given the augmented training pairs, three ALBERT models were trained and fine-tuned. (3) Similarity combination: health question similarity scores were calculated by combining the ALBERT model outputs with text similarity.
To evaluate the performance of ALBERT-QM on similar question identification, we used an open dataset of 20,000 labeled Chinese health question pairs.
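The similarity combination step can be illustrated with a minimal sketch. The abstract does not specify how the model outputs and the text similarity are combined, so everything here is an assumption made for illustration: the three model probabilities are averaged, the text similarity is a character-level ratio from Python's `difflib`, and the two are mixed with a hypothetical weight `alpha` and decision threshold `threshold`. The function names are likewise hypothetical and are not taken from the ALBERT-QM repository.

```python
from difflib import SequenceMatcher
from statistics import mean
from typing import Sequence


def text_similarity(q1: str, q2: str) -> float:
    """Character-level similarity in [0, 1].

    Assumption: a stand-in for the text similarity algorithm,
    which the abstract does not name.
    """
    return SequenceMatcher(None, q1, q2).ratio()


def combined_score(model_probs: Sequence[float], q1: str, q2: str,
                   alpha: float = 0.8) -> float:
    """Mix the averaged model probabilities with the text similarity.

    Assumption: a simple weighted sum; `alpha` is a hypothetical weight.
    """
    if not model_probs:
        raise ValueError("model_probs must contain at least one probability")
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return alpha * mean(model_probs) + (1.0 - alpha) * text_similarity(q1, q2)


def is_match(model_probs: Sequence[float], q1: str, q2: str,
             alpha: float = 0.8, threshold: float = 0.5) -> bool:
    """Declare the pair a match when the combined score reaches the threshold."""
    return combined_score(model_probs, q1, q2, alpha) >= threshold


if __name__ == "__main__":
    # Hypothetical probabilities from three fine-tuned models for one pair.
    probs = [0.91, 0.84, 0.88]
    print(is_match(probs, "头痛怎么办", "头疼应该怎么办"))
```

In practice the weight and threshold would be tuned on held-out labeled pairs rather than fixed in advance.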
RESULTS
ALBERT-QM is able to identify similar Chinese health questions, achieving a precision of 86.69%, a recall of 86.70%, and an F1-score of 86.69%. Compared with the baseline method (a text similarity algorithm), ALBERT-QM improved the F1-score by 20.73%. Compared with other BERT-series models, ALBERT-QM is much lighter, with a file size of 64.8 MB, which is 1/6 the size of other BERT models. ALBERT-QM is openly accessible at https://github.com/trueto/albert_question_match.
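As a quick consistency check on the reported figures, the F1-score is the harmonic mean of precision and recall. The sketch below computes it from the reported precision and recall and, separately, from confusion counts. The counts in the usage example are invented for illustration, and the abstract does not state whether the metrics are averaged across classes, so this should be read as a plausibility check rather than a reproduction of the evaluation.

```python
def f1_from_pr(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def precision_recall_f1(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Precision, recall and F1 from true-positive, false-positive and
    false-negative counts for the 'similar' class."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall, f1_from_pr(precision, recall)


if __name__ == "__main__":
    # Reported precision and recall, expressed as fractions.
    f1 = f1_from_pr(0.8669, 0.8670)
    print(f"F1 = {f1 * 100:.2f}%")

    # Hypothetical counts, for illustration only.
    print(precision_recall_f1(tp=8, fp=2, fn=2))
```

The harmonic mean of 86.69% and 86.70% rounds to 86.69%, which agrees with the reported F1-score.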
CONCLUSIONS
In this study, we developed an open source algorithm, ALBERT-QM, that helps identify similar Chinese health questions in a health QA system. ALBERT-QM achieved better question matching performance with lower memory usage, which benefits web-based and mobile-based QA applications.

