ALBERT-QM: An ALBERT Based Method for Chinese Health Related Question Matching (Preprint)

BACKGROUND Question answering (QA) systems are widely used in web-based health care applications. Because health consumers often lack medical knowledge, they tend to ask similar questions in varied natural language expressions, which makes it challenging to match a new question to previously asked similar questions. In health QA system development, question matching (QM) is the task of judging whether a pair of questions expresses the same meaning; it is used to retrieve the answer to the matched question from a given question-answer database. BERT (Bidirectional Encoder Representations from Transformers) has proved to be a state-of-the-art model for natural language processing (NLP) tasks such as binary classification and sentence matching. ALBERT, a lightweight variant of BERT, was proposed to address BERT's large parameter count and slow training speed. Both BERT and ALBERT can be applied to the QM problem.

OBJECTIVE In this study, we aim to develop an ALBERT-based method for Chinese health-related question matching.

METHODS Our proposed method, named ALBERT-QM, consists of three components. (1) Data augmentation: similar health question pairs were augmented to prepare the training data. (2) ALBERT model training: given the augmented training pairs, three ALBERT models were trained and fine-tuned. (3) Similarity combining: health question similarity scores were calculated by combining the ALBERT model outputs with text similarity. To evaluate the performance of ALBERT-QM on similar question identification, we used an open dataset of 20,000 labeled Chinese health question pairs.

RESULTS ALBERT-QM identified similar Chinese health questions with a precision of 86.69%, a recall of 86.70%, and an F1 score of 86.69%. Compared with the baseline method (a text similarity algorithm), ALBERT-QM improved the F1 score by 20.73%. Compared with other BERT-series models, ALBERT-QM is much lighter, with a file size of 64.8 MB, about one-sixth that of other BERT models. ALBERT-QM is openly available at https://github.com/trueto/albert_question_match.

CONCLUSIONS In this study, we developed an open source algorithm, ALBERT-QM, that contributes to identifying similar Chinese health questions in a health QA system. ALBERT-QM achieved better question matching performance with lower memory usage, which benefits web-based and mobile-based QA applications.
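The abstract does not say how the similarity-combining step merges the ALBERT outputs with text similarity, so the following is only a minimal sketch under stated assumptions: the three model probabilities are averaged, the text similarity is a character-level Jaccard overlap, and the two are blended with a hypothetical weight `alpha` and decision `threshold`. None of these names or values come from the paper or its repository. The helper `precision_recall_f1` uses the standard binary definitions; the averaging scheme behind the reported scores is not stated in the abstract.

```python
from statistics import mean
from typing import Sequence, Tuple


def char_jaccard(q1: str, q2: str) -> float:
    """Character-level Jaccard overlap (an assumed stand-in for 'text similarity')."""
    a, b = set(q1), set(q2)
    if not a and not b:
        return 1.0  # two empty questions are treated as identical
    return len(a & b) / len(a | b)


def combined_similarity(model_probs: Sequence[float], q1: str, q2: str,
                        alpha: float = 0.8) -> float:
    """Blend the mean model probability with text similarity.

    alpha is a hypothetical weight, not a value reported in the abstract.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return alpha * mean(model_probs) + (1.0 - alpha) * char_jaccard(q1, q2)


def is_match(model_probs: Sequence[float], q1: str, q2: str,
             alpha: float = 0.8, threshold: float = 0.5) -> bool:
    """Label a question pair as matching when the blended score reaches the threshold."""
    return combined_similarity(model_probs, q1, q2, alpha) >= threshold


def precision_recall_f1(y_true: Sequence[int],
                        y_pred: Sequence[int]) -> Tuple[float, float, float]:
    """Standard binary precision, recall, and F1 with 1 as the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1
```

In practice `model_probs` would hold the "same meaning" probabilities produced by the three fine-tuned models for one question pair, and `alpha` and `threshold` would be tuned on held-out data rather than fixed in advance.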

JMIR Publications Inc.

Feihong Yang Jiao Li

2020
