Development and Evaluation of a Retrieval-Augmented Large Language Model Framework for Ophthalmology
Importance: Although augmenting large language models (LLMs) with knowledge bases may improve medical domain–specific performance, practical methods are needed for local implementation of LLMs that address privacy concerns and enhance accessibility for health care professionals.
Objective: To develop an accurate, cost-effective local implementation of an LLM that mitigates privacy concerns and supports the practical deployment of LLMs in health care settings.
Design, Setting, and Participants: ChatZOC (Sun Yat-Sen University Zhongshan Ophthalmology Center), a retrieval-augmented LLM framework, was developed by enhancing a baseline LLM with a comprehensive ophthalmic dataset and evaluation framework (CODE), which includes over 30 000 pieces of ophthalmic knowledge. This LLM was benchmarked against 10 representative LLMs, including GPT-4 and GPT-3.5 Turbo (OpenAI), across 300 clinical questions in ophthalmology. The evaluation, involving a panel of medical experts and biomedical researchers, focused on accuracy, utility, and safety. A double-masked approach was used to try to minimize assessment bias across all models. The study used a comprehensive knowledge base derived from ophthalmic clinical practice, without directly involving clinical patients.
Exposures: LLM response to clinical questions.
Main Outcomes and Measures: Accuracy, utility, and safety of LLMs in responding to clinical questions.
Results: The baseline model achieved a human ranking score of 0.48. The retrieval-augmented LLM had a score of 0.60, a difference of 0.12 from baseline (95% CI, 0.02-0.22; P = .02), and did not differ significantly from GPT-4, which scored 0.61 (difference = 0.01; 95% CI, −0.11 to 0.13; P = .89). For scientific consensus, the retrieval-augmented LLM scored 84.0% compared with 46.5% for the baseline model (difference = 37.5%; 95% CI, 29.0%-46.0%; P < .001) and did not differ significantly from GPT-4, which scored 79.2% (difference = 4.8%; 95% CI, −0.3% to 10.0%; P = .06).
Conclusions and Relevance: Results of this quality improvement study suggest that the integration of high-quality knowledge bases improved the LLM’s performance in medical domains. This study highlights the transformative potential of augmented LLMs in clinical practice by providing reliable, safe, and practical clinical information. Further research is needed to explore the broader application of such frameworks in the real world.
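The comparisons reported in the Results can be sanity-checked with simple arithmetic. The sketch below is only an illustration and is not part of the study: it re-enters the figures from the abstract by hand, confirms that each stated difference matches the two scores it is derived from, and checks that each 95% CI excludes zero exactly when the P value falls below .05. The .05 threshold, the `REPORTED` table, and the `check` helper are assumptions introduced here; the abstract does not state a significance level.

```python
import math

# Figures copied from the abstract; nothing is recomputed from raw data.
# Columns: label, score_a, score_b, stated difference, CI low, CI high, P value.
# For "P < .001" the upper bound 0.001 is stored (an assumption for this check).
REPORTED = [
    ("human ranking: augmented vs baseline", 0.60, 0.48, 0.12, 0.02, 0.22, 0.02),
    ("human ranking: GPT-4 vs augmented", 0.61, 0.60, 0.01, -0.11, 0.13, 0.89),
    ("consensus %: augmented vs baseline", 84.0, 46.5, 37.5, 29.0, 46.0, 0.001),
    ("consensus %: augmented vs GPT-4", 84.0, 79.2, 4.8, -0.3, 10.0, 0.06),
]

ALPHA = 0.05  # assumed conventional threshold, not stated in the abstract


def check(row):
    """Return (difference matches scores, CI agrees with P value) for one row."""
    _label, a, b, diff, ci_low, ci_high, p = row
    # The stated difference should equal score_a - score_b up to rounding noise.
    diff_ok = math.isclose(a - b, diff, abs_tol=1e-9)
    # A 95% CI that excludes zero should go with P < .05, and vice versa.
    ci_excludes_zero = ci_low > 0 or ci_high < 0
    ci_ok = ci_excludes_zero == (p < ALPHA)
    return diff_ok, ci_ok


if __name__ == "__main__":
    for row in REPORTED:
        print(row[0], check(row))
```

A row for which `check` returns a `False` entry would point to a transcription error in the figures rather than to a flaw in the study, since the check uses only the rounded values printed in the abstract.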
American Medical Association (AMA)

