Diagnostic Performance of Claude 3 from Patient History and Key Images in Diagnosis Please Cases
Abstract
Background
Large language artificial intelligence models have shown diagnostic performance based solely on textual information from clinical history and imaging findings. However, the extent of their performance when utilizing radiological images and providing differential diagnoses has yet to be investigated.
Purpose
We employed the latest version of Claude 3, Opus, released on March 4, 2024, to investigate its diagnostic performance in answering Radiology’s Diagnosis Please quiz questions under three conditions: (1) when provided with clinical history alone; (2) when given clinical history along with imaging findings; and (3) when supplied with clinical history and key images.
Furthermore, we evaluated the diagnostic performance of the model when instructed to list differential diagnoses.
Materials and Methods
Claude 3 Opus was tasked with listing the primary diagnosis and two differential diagnoses for 322 quiz questions from Radiology’s “Diagnosis Please” cases, which included cases 1 to 322, published from 1998 to 2023. The analyses were carried out under the following input conditions:
Condition 1: Submitter-provided clinical history (text) alone
Condition 2: Submitter-provided clinical history and imaging findings (text)
Condition 3: Submitter-provided clinical history (text) and key images (PDF files)
We applied McNemar’s tests to evaluate differences in correct response rates for primary diagnoses across Conditions 1, 2, and 3.
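As an illustration of the paired comparison described here, the following is a minimal sketch of an exact (binomial) McNemar's test on per-case correctness under two conditions. The abstract does not say which variant of the test was used, so the exact form is an assumption, and the function name `mcnemar_exact` and the ten-case data are hypothetical, not taken from the study.

```python
from math import comb


def mcnemar_exact(first, second):
    """Exact two-sided McNemar's test on paired binary outcomes.

    first, second: equal-length sequences of 0/1 flags, one per case,
    marking whether the primary diagnosis was correct under each condition.
    Returns (only_first, only_second, p_value).
    """
    if len(first) != len(second):
        raise ValueError("paired inputs must have the same length")

    # Only discordant pairs (correct under exactly one condition) matter.
    only_first = sum(1 for a, b in zip(first, second) if a and not b)
    only_second = sum(1 for a, b in zip(first, second) if b and not a)
    n = only_first + only_second
    if n == 0:
        return only_first, only_second, 1.0

    # Under the null hypothesis, discordant pairs split 50/50,
    # so the smaller count follows Binomial(n, 0.5).
    k = min(only_first, only_second)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return only_first, only_second, min(1.0, 2 * tail)


# Hypothetical per-case results for ten cases (NOT the study data).
history_only = [1, 0, 0, 0, 1, 0, 0, 0, 0, 0]
history_and_findings = [1, 1, 1, 1, 0, 1, 1, 0, 1, 0]

only_a, only_b, p_value = mcnemar_exact(history_only, history_and_findings)
print(only_a, only_b, p_value)
```

Because each case is answered under every condition, the test uses only the cases on which the two conditions disagree; cases answered correctly or incorrectly under both contribute nothing to the statistic.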
Results
The rates of correct primary diagnoses were 62/322 (19.3%), 178/322 (55.3%), and 93/322 (28.8%) for Conditions 1, 2, and 3, respectively. Additionally, Claude 3 Opus listed the correct answer as a differential diagnosis in up to 22/322 (6.8%) of cases. There were statistically significant differences in correct response rates for primary diagnoses between all combinations of Conditions 1, 2, and 3 (p<0.001).
Conclusion
The diagnostic performance of Claude 3 Opus improved significantly when key images were provided in addition to clinical history. Its ability to list important differential diagnoses was also confirmed.
Key Results
This study investigated Claude 3 Opus’s performance in Radiology Diagnosis Please Cases using clinical history, key images, and imaging findings.
Adding key images or imaging findings significantly increased the rate of correct primary diagnoses from 19.3% to 28.8% or 55.3%, respectively.
Presenting two additional differential diagnoses increased total correct responses by 3.1–6.8%.
Summary statement
The large language AI model Claude 3 Opus demonstrated significantly improved diagnostic accuracy when key images were added to clinical history, compared with clinical history alone.

