
Diagnostic Performance of Claude 3 from Patient History and Key Images in Diagnosis Please Cases

Abstract

Background: Large language artificial intelligence models have demonstrated diagnostic performance based solely on textual information from clinical history and imaging findings. However, the extent of their performance when utilizing radiological images and providing differential diagnoses has yet to be investigated.

Purpose: We employed Claude 3 Opus, the latest version of Claude 3, released on March 4, 2024, to investigate its diagnostic performance in answering Radiology's Diagnosis Please quiz questions under three conditions: (1) when provided with clinical history alone; (2) when given clinical history along with imaging findings; and (3) when supplied with clinical history and key images. Furthermore, we evaluated the diagnostic performance of the model when instructed to list differential diagnoses.

Materials and Methods: Claude 3 Opus was tasked with listing the primary diagnosis and two differential diagnoses for 322 quiz questions from Radiology's "Diagnosis Please" cases, which included cases 1 to 322, published from 1998 to 2023. The analyses were carried out under the following input conditions:

Condition 1: Submitter-provided clinical history (text) alone
Condition 2: Submitter-provided clinical history and imaging findings (text)
Condition 3: Submitter-provided clinical history (text) and key images (PDF files)

We applied McNemar's tests to evaluate differences in correct response rates for primary diagnoses across Conditions 1, 2, and 3.

Results: The rates of correct primary diagnoses were 62/322 (19.3%), 178/322 (55.3%), and 93/322 (28.8%) for Conditions 1, 2, and 3, respectively. Additionally, Claude 3 Opus provided the correct answer as a differential diagnosis in up to 22/322 (6.8%) of cases. There were statistically significant differences in correct response rates for primary diagnoses between all combinations of Conditions 1, 2, and 3 (p<0.001).
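The paired comparison named in Materials and Methods can be illustrated with a minimal sketch of a two-sided exact McNemar test. The abstract does not say which variant of the test was used or report the per-case paired outcomes, so the function name `mcnemar_exact` and the toy correctness lists below are assumptions made purely for illustration; they are not the study's data or the authors' code.

```python
from math import comb


def mcnemar_exact(correct_a, correct_b):
    """Two-sided exact McNemar test on paired correct/incorrect outcomes.

    Returns (b, c, p) where b counts cases correct only under condition A,
    c counts cases correct only under condition B, and p is the p-value.
    """
    if len(correct_a) != len(correct_b):
        raise ValueError("paired inputs must have the same length")
    # Only discordant pairs carry information about a difference.
    b = sum(1 for x, y in zip(correct_a, correct_b) if x and not y)
    c = sum(1 for x, y in zip(correct_a, correct_b) if y and not x)
    n = b + c
    if n == 0:
        return b, c, 1.0
    k = min(b, c)
    # Under the null hypothesis each discordant pair falls on either side
    # with probability 0.5, so the smaller count follows Binomial(n, 0.5).
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return b, c, min(1.0, 2 * tail)


# Toy, made-up outcomes for 12 hypothetical cases (NOT the study's data).
history_only = [True, False, False, False, False, True,
                False, False, False, False, False, False]
history_plus_findings = [True, True, True, True, False, True,
                         True, True, True, False, True, False]

b, c, p = mcnemar_exact(history_only, history_plus_findings)
print(f"discordant pairs: b={b}, c={c}; exact two-sided p={p:.4f}")
```

The test depends only on the discordant pairs, which is why the marginal counts reported in the Results are not enough on their own to reproduce the reported p-values.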
Conclusion: Claude 3 Opus demonstrated significantly improved diagnostic performance when key images were provided in addition to clinical history. Its ability to list important differential diagnoses was also confirmed.

Key Results:
This study investigated Claude 3 Opus's performance in Radiology Diagnosis Please cases using clinical history, key images, and imaging findings.
Adding key images or imaging findings significantly improved the rate of correct primary diagnoses from 19.3% to 28.8% or 55.3%, respectively.
When two additional differential diagnoses were listed, total correct responses improved by 3.1–6.8%.

Summary statement: The large language AI model Claude 3 Opus demonstrated significantly improved diagnostic accuracy when key images were added to clinical history, compared with clinical history alone.
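The percentages quoted above follow from the reported counts out of 322 cases, and a short sketch can recompute them along with the gain over the history-only baseline. The counts come from the Results; the variable and function names are illustrative assumptions. Note that conventional rounding of 93/322 gives 28.9%, slightly different from the reported 28.8%, which may reflect truncation.

```python
TOTAL_CASES = 322

# Correct primary diagnoses per input condition, as reported in the Results.
correct_primary = {
    "history only": 62,
    "history + imaging findings": 178,
    "history + key images": 93,
}


def rate_percent(correct, total=TOTAL_CASES):
    """Return the correct-response rate as a percentage."""
    return 100.0 * correct / total


rates = {name: rate_percent(n) for name, n in correct_primary.items()}
baseline = rates["history only"]

for name, rate in rates.items():
    # The gain is expressed in percentage points relative to history only.
    print(f"{name}: {rate:.1f}% ({rate - baseline:+.1f} points vs history only)")
```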

openRxiv

Ryo Kurokawa, Yuji Ohizumi, Jun Kanzawa, Mariko Kurokawa, Takao Kiguchi, Wataru Gonoi, Osamu Abe

2024


Related Results

Abstract Introduction The exact manner in which large language models (LLMs) will be integrated into pathology is not yet fully comprehended. This study examines the accuracy, bene...

Autonomy on Trial

Abstract This paper critically examines how US bioethics and health law conceptualize patient autonomy, contrasting the rights-based, individualist...

Suffering of Patients with Neurogenic Thoracic Outlet Syndrome (TOS); The First Qualitative study in TOS

Abstract Background Diagnosis of neurogenic thoracic outlet syndrome (nTOS) is hindered by symptom overlap with cervical radiculopathy, carpal tunnel syndrome, or psychosomatic dis...

Complex Collision Tumors: A Systematic Review

Abstract Introduction: A collision tumor consists of two distinct neoplastic components located within the same organ, separated by stromal tissue, without histological intermixing...

Provocative Tests in Diagnosis of Thoracic Outlet Syndrome: A Narrative Review

Abstract Thoracic outlet syndrome (TOS) is a group of conditions caused by the compression of the neurovascular bundle within the thoracic outlet. It is classified into three main ...

Hydatid Disease of The Brain Parenchyma: A Systematic Review

Abstract Introduction Isolated brain hydatid disease (BHD) is an extremely rare form of echinococcosis. A prompt and timely diagnosis is a crucial step in disease management. This ...

Emerging Evidence of IgG4-Related Disease in Pericarditis: A Systematic Review

Abstract Introduction Immunoglobulin G4-related disease (IgG4-RD) is a recently identified immune-mediated condition that is debilitating and often overlooked. While IgG4-RD has be...

Differential Diagnosis of Neurogenic Thoracic Outlet Syndrome: A Review

Abstract Thoracic outlet syndrome (TOS) is a complex and often overlooked condition caused by the compression of neurovascular structures as they pass through the thoracic outlet. ...
