Detecting Redundant Health Survey Questions Using Language-agnostic BERT Sentence Embedding (LaBSE) (Preprint)
BACKGROUND
As the importance of patient-generated health data (PGHD) in healthcare and research has increased, efforts have been made to standardize survey-based PGHD to improve its usability and interoperability. Standardization efforts, such as the Patient-Reported Outcomes Measurement Information System (PROMIS) and the NIH Common Data Elements (CDE) repository, have provided effective tools for managing and unifying health survey questions. However, previous methods using ontology-mediated annotation are not only labor-intensive and difficult to scale but also struggle to identify semantic redundancies in survey questions, especially across multiple languages.
OBJECTIVE
The goal of this work was to compute the semantic similarity among publicly available health survey questions in order to facilitate the standardization of survey-based PGHD.
METHODS
We compiled health survey questions authored in English and Korean from the NIH CDE Repository, PROMIS, Korean public health agencies, and academic publications. Questions were drawn from various health lifelog domains. A randomized question pairing scheme was used to generate a Semantic Text Similarity (STS) dataset consisting of 1758 question pairs. Two human experts assigned a similarity score to each question pair. The tagged dataset was then used to build four classifiers based on Bag-of-Words, SBERT with BERT-based embeddings, SBERT with LaBSE embeddings, and GPT-4o. The algorithms were evaluated using traditional contingency statistics.
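As a rough illustration of the simplest of the four approaches, the following sketch scores a question pair by the cosine similarity of its bag-of-words count vectors. It is not the authors' implementation: the tokenizer, the `is_redundant` helper, the 0.5 threshold, and the example questions are all assumptions made for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Hypothetical tokenizer: lowercase runs of word characters.
    # In Python 3, \w also matches Hangul, so Korean text is tokenized too.
    return re.findall(r"\w+", text.lower())


def bow_cosine(question_a, question_b):
    """Cosine similarity between bag-of-words count vectors."""
    counts_a = Counter(tokenize(question_a))
    counts_b = Counter(tokenize(question_b))
    shared = counts_a.keys() & counts_b.keys()
    dot = sum(counts_a[t] * counts_b[t] for t in shared)
    norm_a = math.sqrt(sum(c * c for c in counts_a.values()))
    norm_b = math.sqrt(sum(c * c for c in counts_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # at least one question has no tokens
    return dot / (norm_a * norm_b)


def is_redundant(question_a, question_b, threshold=0.5):
    # The threshold is an arbitrary illustrative value, not taken from the study.
    return bow_cosine(question_a, question_b) >= threshold


if __name__ == "__main__":
    english = "How many hours do you sleep per day?"
    korean = "하루에 몇 시간 주무십니까?"  # made-up Korean counterpart
    print(bow_cosine(english, "How many hours do you sleep each night?"))
    print(bow_cosine(english, korean))  # no shared tokens across languages
```

Because a bag-of-words score depends only on shared surface tokens, an English question and its Korean counterpart score zero here, which is the kind of gap a multilingual sentence embedding is meant to close.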
RESULTS
Among the four algorithms, SBERT-LaBSE demonstrated the highest performance in assessing question similarity across both languages, achieving areas under both the receiver operating characteristic (ROC) curve and the precision-recall curve of over 0.99. Additionally, it proved effective in identifying cross-lingual semantic similarities.
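For readers unfamiliar with the area under the ROC curve, the sketch below computes it from its rank interpretation: the probability that a randomly chosen redundant pair receives a higher similarity score than a randomly chosen non-redundant pair. The labels and scores are toy values, not data from the study, and the binary labelling of pairs is an assumption, since the abstract does not state how the expert scores were converted to classes.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of scores.

    labels: 1 for a redundant (similar) pair, 0 otherwise.
    scores: similarity score predicted for each pair.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative pair")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # positive pair ranked above negative pair
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(positives) * len(negatives))


if __name__ == "__main__":
    toy_labels = [1, 0, 1, 0]           # illustrative only
    toy_scores = [0.9, 0.8, 0.3, 0.1]   # illustrative only
    print(roc_auc(toy_labels, toy_scores))
```

This quadratic-time form is chosen for readability; a sort-based version gives the same value for larger datasets.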
CONCLUSIONS
This study introduces the SBERT-LaBSE algorithm for calculating semantic similarity across two languages and shows that it outperforms BERT-based models, the GPT-4o model, and the Bag-of-Words approach, highlighting its potential to improve the semantic interoperability of survey-based PGHD across language barriers.