
Use of SNOMED CT in Large Language Models: Scoping Review (Preprint)

BACKGROUND Large language models (LLMs) have substantially advanced natural language processing (NLP) capabilities but often struggle with knowledge-driven tasks in specialized domains such as biomedicine. Integrating biomedical knowledge sources such as SNOMED CT into LLMs may enhance their performance on biomedical tasks. However, the methodologies and effectiveness of incorporating SNOMED CT into LLMs have not been systematically reviewed.

OBJECTIVE This scoping review examines how SNOMED CT is integrated into LLMs, focusing on (1) the types and components of LLMs integrated with SNOMED CT, (2) which SNOMED CT content is integrated, and (3) whether this integration improves LLM performance on NLP tasks.

METHODS Following the PRISMA-ScR (Preferred Reporting Items for Systematic Reviews and Meta-Analyses extension for Scoping Reviews) guidelines, we searched the ACM Digital Library, ACL Anthology, IEEE Xplore, PubMed, and Embase for relevant studies published from 2018 to 2023. Studies were included if they incorporated SNOMED CT into LLM pipelines for natural language understanding or generation tasks. Data on LLM types, SNOMED CT integration methods, end tasks, and performance metrics were extracted and synthesized.

RESULTS The review included 37 studies. Bidirectional Encoder Representations from Transformers (BERT) and its biomedical variants were the most commonly used LLMs. Three main approaches for integrating SNOMED CT were identified: (1) incorporating SNOMED CT into LLM inputs (28/37, 76%), primarily by using concept descriptions to expand training corpora; (2) integrating SNOMED CT into additional fusion modules (5/37, 14%); and (3) using SNOMED CT as an external knowledge retriever during inference (5/37, 14%). The most frequent end task was medical concept normalization (15/37, 41%), followed by entity extraction or typing and then by classification.
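The third approach (SNOMED CT as an external knowledge source consulted at inference time) and the most frequent end task (medical concept normalization) can be illustrated together with a toy sketch. It rests on stated assumptions: the concept IDs and descriptions are hypothetical placeholders, not SNOMED CT content, and token-overlap (Jaccard) similarity is swapped in for the learned LLM representations a real pipeline would likely use, so the example runs without a model or a licensed release.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Concept:
    concept_id: str                # hypothetical ID, not a real SNOMED CT identifier
    descriptions: tuple[str, ...]  # preferred term followed by synonyms


# Hypothetical mini-terminology standing in for SNOMED CT concept descriptions.
CONCEPTS: tuple[Concept, ...] = (
    Concept("C001", ("myocardial infarction", "heart attack")),
    Concept("C002", ("hypertensive disorder", "high blood pressure")),
    Concept("C003", ("type 2 diabetes mellitus", "adult onset diabetes")),
)


def tokens(text: str) -> set[str]:
    """Lowercased alphanumeric tokens (a crude stand-in for an LLM encoder)."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a: set[str], b: set[str]) -> float:
    """Token-set overlap, used here in place of embedding similarity."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def normalize(
    mention: str,
    concepts: tuple[Concept, ...] = CONCEPTS,
    min_score: float = 0.0,
) -> Optional[tuple[str, str, float]]:
    """Map a mention to (concept_id, matched description, score), or None."""
    query = tokens(mention)
    # Score the mention against every description of every concept.
    candidates = [
        (c.concept_id, d, jaccard(query, tokens(d)))
        for c in concepts
        for d in c.descriptions
    ]
    best = max(candidates, key=lambda cand: cand[2])
    # Reject matches that do not clear the threshold.
    return best if best[2] > min_score else None


print(normalize("pt had a heart attack"))  # ('C001', 'heart attack', 0.4)
print(normalize("fractured femur"))        # None
```

Matching against synonyms as well as the preferred term is what lets the informal mention "heart attack" resolve to the concept whose preferred term is "myocardial infarction"; the retrieval step stays the same whether the scorer is lexical, as here, or a neural encoder.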
Only about half of the studies (19/37, 51%) directly compared performance with and without SNOMED CT integration; most of these (17/19, 89%) reported improvements. The reported gains varied widely across metrics and tasks, ranging from 0.87% to 131.66%, and some studies showed no improvement or a decline in certain performance metrics.

CONCLUSIONS This review identifies diverse approaches for integrating SNOMED CT into LLMs, with a predominant focus on using concept descriptions to enhance biomedical language understanding and generation. While the results suggest potential benefits of SNOMED CT integration, the lack of standardized evaluation methods and comprehensive performance reporting precludes definitive conclusions about its effectiveness. Future research should prioritize consistent reporting of performance comparisons and explore more sophisticated methods for incorporating SNOMED CT’s relational structure into LLMs. In addition, the biomedical NLP community should develop standardized evaluation frameworks to better assess the impact of ontology integration on LLM performance.
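The abstract does not state how the reported gains of 0.87% to 131.66% were computed. If they are relative changes, (new − baseline) / baseline × 100, part of the wide spread is arithmetic: the same absolute improvement produces very different percentages depending on the baseline, which is one reason standardized reporting matters. A minimal sketch under that assumption, using made-up scores rather than data from the reviewed studies:

```python
def relative_gain(baseline: float, new: float) -> float:
    """Percentage change of `new` relative to `baseline` (assumed definition)."""
    if baseline == 0:
        raise ValueError("relative gain is undefined for a zero baseline")
    return (new - baseline) / baseline * 100


def absolute_gain(baseline: float, new: float) -> float:
    """Difference in percentage points for scores on a 0-1 scale."""
    return (new - baseline) * 100


# Hypothetical scores: the same +0.05 absolute improvement on two baselines.
for baseline, new in [(0.90, 0.95), (0.05, 0.10)]:
    print(
        f"{baseline:.2f} -> {new:.2f}: "
        f"{absolute_gain(baseline, new):.1f} points, "
        f"{relative_gain(baseline, new):.1f}% relative"
    )
```

Both hypothetical cases improve by 5.0 points, yet one reports a 5.6% relative gain and the other 100.0%, so reporting absolute and relative changes side by side makes results across studies easier to compare.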

JMIR Publications Inc.

Eunsuk Chang, Sumi Sung

2024

