Performance of ChatGPT in Ophthalmic Registration and Clinical Diagnosis: Cross-Sectional Study (Preprint)

BACKGROUND Artificial intelligence (AI) chatbots such as ChatGPT are expected to impact vision health care significantly. Their potential to optimize the consultation process and their diagnostic capabilities across a range of ophthalmic subspecialties have yet to be fully explored.

OBJECTIVE This study aims to investigate the performance of AI chatbots in recommending ophthalmic outpatient registration and diagnosing eye diseases within clinical case profiles.

METHODS This cross-sectional study used clinical cases from Chinese Standardized Resident Training–Ophthalmology (2nd Edition). For each case, 2 profiles were created: patient with history (Hx) and patient with history and examination (Hx+Ex). These profiles served as independent queries for GPT-3.5 and GPT-4.0 (accessed from March 5 to 18, 2024). Similarly, 3 ophthalmic residents were given the same profiles in a questionnaire format. The accuracy of recommending ophthalmic subspecialty registration was primarily evaluated using Hx profiles. The accuracy of the top-ranked diagnosis and the accuracy of the diagnosis within the top 3 suggestions (do-not-miss diagnosis) were assessed using Hx+Ex profiles. The gold standard for judgment was the published, official diagnosis. Characteristics of incorrect diagnoses by ChatGPT were also analyzed.

RESULTS A total of 208 clinical profiles from 12 ophthalmic subspecialties were analyzed (104 Hx and 104 Hx+Ex profiles). For Hx profiles, GPT-3.5, GPT-4.0, and residents showed comparable accuracy in registration suggestions (66/104, 63.5%; 81/104, 77.9%; and 72/104, 69.2%, respectively; P=.07), with ocular trauma, retinal diseases, and strabismus and amblyopia achieving the top 3 accuracies. For Hx+Ex profiles, both GPT-4.0 and residents demonstrated higher diagnostic accuracy than GPT-3.5 (62/104, 59.6% and 63/104, 60.6% vs 41/104, 39.4%; P=.003 and P=.001, respectively). GPT-4.0 and residents were also more accurate than GPT-3.5 for do-not-miss diagnoses (79/104, 76% and 68/104, 65.4% vs 51/104, 49%; P<.001 and P=.02, respectively). The highest diagnostic accuracies were observed in glaucoma; lens diseases; and eyelid, lacrimal, and orbital diseases. GPT-4.0 recorded fewer incorrect top-3 diagnoses (25/42, 60% vs 53/63, 84%; P=.005) and more partially correct diagnoses (21/42, 50% vs 7/63, 11%; P<.001) than GPT-3.5, while GPT-3.5 had more completely incorrect (27/63, 43% vs 7/42, 17%; P=.005) and less precise diagnoses (22/63, 35% vs 5/42, 12%; P=.009).

CONCLUSIONS GPT-3.5 and GPT-4.0 showed intermediate performance in recommending ophthalmic subspecialties for registration. While GPT-3.5 underperformed, GPT-4.0 approached and numerically surpassed residents in differential diagnosis. AI chatbots show promise in facilitating ophthalmic patient registration. However, their integration into diagnostic decision-making requires more validation.
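The top-ranked diagnosis comparison reported above can be illustrated with a small worked calculation. The sketch below recomputes the accuracies from the reported counts and runs a Pearson chi-square test on two proportions. The abstract does not state which statistical test the authors used, so the choice of test, the treatment of the two groups as independent samples, and the names `accuracy` and `chi_square_2x2` are all assumptions made for illustration; the resulting P value may not match the reported one exactly.

```python
from math import erfc, sqrt

# Correct top-ranked diagnoses out of 104 Hx+Ex profiles, as reported in the abstract.
N_PROFILES = 104
correct = {"GPT-3.5": 41, "GPT-4.0": 62, "Residents": 63}


def accuracy(k, n=N_PROFILES):
    """Proportion of correct answers."""
    return k / n


def chi_square_2x2(k1, k2, n=N_PROFILES):
    """Pearson chi-square test (no continuity correction) for two groups of size n.

    Assumption: the groups are treated as independent samples. The study used
    the same profiles for every rater, so the authors may have used a
    different (for example, paired) test.
    Requires 0 < k1 + k2 < 2 * n, otherwise an expected count is zero.
    """
    table = [[k1, n - k1], [k2, n - k2]]
    total = 2 * n
    col_sums = [k1 + k2, total - (k1 + k2)]
    stat = 0.0
    for row in table:
        for j, observed in enumerate(row):
            expected = n * col_sums[j] / total
            stat += (observed - expected) ** 2 / expected
    # For 1 degree of freedom, the upper-tail probability is erfc(sqrt(stat / 2)).
    p_value = erfc(sqrt(stat / 2))
    return stat, p_value


for name, k in correct.items():
    print(f"{name}: {k}/{N_PROFILES} = {accuracy(k):.1%}")

stat, p = chi_square_2x2(correct["GPT-4.0"], correct["GPT-3.5"])
print(f"GPT-4.0 vs GPT-3.5: chi-square = {stat:.2f}, P = {p:.4f}")
```

The same function can be applied to the other reported count pairs, such as the do-not-miss counts, under the same assumptions.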

JMIR Publications Inc.

Shuai Ming Xi Yao Xiaohong Guo Qingge Guo Kunpeng Xie Dandan Chen Bo Lei

2024

Related Results

Abstract Introduction The exact manner in which large language models (LLMs) will be integrated into pathology is not yet fully comprehended. This study examines the accuracy, bene...

Assessment of Chat-GPT, Gemini, and Perplexity in Principle of Research Publication: A Comparative Study
Abstract Introduction Many researchers utilize artificial intelligence (AI) to aid their research endeavors. This study seeks to assess and contrast the performance of three sophis...

ChatGPT's Capabilities for Use in Anatomy Education and Anatomy Research
Dear Editors, Recently, the discussion of an artificial intelligence (AI) - fueled platform in several articles in your journal has attracted the attention of many researchers [1, ...

Unlocking Educational Potential: Exploring Students’ Satisfaction and Sustainable Engagement with ChatGPT Using the ECM Model
Aim/Purpose: The main goal of this study is to investigate the factors affecting students’ satisfaction and continuous usage of ChatGPT in an educational context, using the Expecta...

User Intentions to Use ChatGPT for Self-Diagnosis and Health-Related Purposes: Cross-sectional Survey Study (Preprint)
BACKGROUND With the rapid advancement of artificial intelligence (AI) technologies, AI-powered chatbots, such as Chat Generative Pretrained Transformer (Cha...

P-525 ChatGPT 4.0: accurate, clear, relevant, and readable responses to frequently asked fertility patient questions
Abstract Study question What is the accuracy, clarity, relevance and readability of ChatGPT’s responses to frequently asked fert...

Appearance of ChatGPT and English Study
The purpose of this study is to examine the definition and characteristics of ChatGPT in order to present the direction of self-directed learning to learners, and to explore the po...

ChatGPT Versus Consultants: Blinded Evaluation on Answering Otorhinolaryngology Case–Based Questions (Preprint)
BACKGROUND Large language models (LLMs), such as ChatGPT (Open AI), are increasingly used in medicine and supplement standard search engines as information ...