Semantic search helper: A tool based on the use of embeddings in multi-item questionnaires as a harmonization opportunity for merging large datasets – A feasibility study
Abstract
Background
Recent advances in natural language processing (NLP) have opened new avenues in semantic data analysis. A promising application of NLP is data harmonization in questionnaire-based cohort studies, where it can serve as an additional method, particularly when a construct has been measured only with different instruments, and when evaluating potentially new construct constellations. The present article therefore explores the potential of embedding models to detect opportunities for semantic harmonization.
Methods
Using embedding models such as SBERT and OpenAI’s ADA, we developed a prototype application (“Semantic Search Helper”) that supports the harmonization process by detecting semantically similar items within extensive health-related datasets. The feasibility and applicability of the approach were evaluated in a use-case analysis with data from four large cohort studies, whose heterogeneous data had been collected with different sets of instruments for common constructs.
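The item-matching step described above can be illustrated with a minimal sketch. It is not the Semantic Search Helper itself: the `embed` function is a toy bag-of-words stand-in for a real embedding model such as SBERT or ADA, and the item IDs, item wordings and the 0.4 threshold are hypothetical values chosen for illustration only.

```python
import math
import re
from collections import Counter
from itertools import product


def embed(text):
    """Toy stand-in for an embedding model: a bag-of-words count vector.

    Assumption: in practice this would be replaced by a sentence
    embedding model; word counts are used here only to keep the
    example self-contained.
    """
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[t] * v[t] for t in u.keys() & v.keys())
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def candidate_pairs(items_a, items_b, threshold=0.4):
    """Return cross-study item pairs whose similarity reaches the threshold,
    sorted from most to least similar."""
    vecs_a = {key: embed(text) for key, text in items_a.items()}
    vecs_b = {key: embed(text) for key, text in items_b.items()}
    scored = [
        (ka, kb, cosine(va, vb))
        for (ka, va), (kb, vb) in product(vecs_a.items(), vecs_b.items())
    ]
    kept = [pair for pair in scored if pair[2] >= threshold]
    return sorted(kept, key=lambda pair: pair[2], reverse=True)


# Hypothetical questionnaire items from two studies.
study_a = {
    "A1": "I felt sad or depressed",
    "A2": "I had trouble falling asleep",
}
study_b = {
    "B1": "How often did you feel sad or depressed",
    "B2": "How many hours do you exercise per week",
}

pairs = candidate_pairs(study_a, study_b)
for key_a, key_b, score in pairs:
    print(f"{key_a} <-> {key_b}: {score:.2f}")
```

Pairs above the threshold are only candidates for harmonization; as in the study, they would still need to be reviewed by domain experts. Note that the toy vectors miss the match between "felt" and "feel", which is exactly the kind of lexical variation a real embedding model is meant to handle.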
Results
With the prototype, we identified potential harmonization pairs efficiently, which substantially reduced the manual evaluation effort. Expert ratings of the semantic similarity candidates showed high agreement with the model-generated pairs, supporting the validity of our approach.
Conclusions
This study demonstrates the potential of embeddings for detecting semantic similarity as a promising add-on tool to assist the harmonization of multiple data sets that use different instruments with similar content, within and across studies.
Royal College of Psychiatrists

