ChatGPT-4 Prompt: A Tool to Enhance Novice Radiologists' Diagnostic Capabilities in Cystic Renal Masses to Expert-Level Accuracy
Abstract
Background
The effect of prompt engineering on the performance of large language models (LLMs) varies for text-based questions, whereas its influence on image-based diagnostic tasks remains largely unexplored.
Purpose
This study aims to evaluate the diagnostic performance of various GPT-4 prompts in assessing cystic renal masses (CRMs) using the contrast-enhanced ultrasound (CEUS) Bosniak classification, and then to test the ability of GPT-4 prompts to assist radiologists with different levels of experience.
Materials and Methods
This retrospective study included 103 images of CRMs from patients who underwent CEUS and CT. GPT-4 (OpenAI) and six radiologists (three experts and three novices) were independently tasked with assigning the Bosniak classification (BC) based solely on the original CEUS images. Subsequently, the radiologists reassessed these images with knowledge of the BCs generated by the GPT-4 prompts and decided whether to modify their initial assessments. The diagnostic performance of the radiologists and the GPT-4 prompts was quantified using the area under the receiver operating characteristic curve (AUC).
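As a rough illustration of the AUC metric named in the methods, the sketch below computes a rank-based AUC from ordinal Bosniak categories. It is not the study's analysis code: the binary reference label (`malignant`), the ordinal mapping `BOSNIAK_RANK`, the helper `auc_from_scores`, and the toy ratings are all hypothetical and chosen only for illustration.

```python
from itertools import product


def auc_from_scores(scores, labels):
    """Rank-based AUC: probability that a positive case receives a higher
    score than a negative case, with ties counted as 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p, n in product(pos, neg)
    )
    return wins / (len(pos) * len(neg))


# Assumption: Bosniak categories are treated as an ordinal score.
BOSNIAK_RANK = {"I": 1, "II": 2, "IIF": 3, "III": 4, "IV": 5}

# Hypothetical toy data, NOT taken from the study.
ratings = ["I", "II", "IIF", "III", "IV", "II", "IV", "III"]
malignant = [0, 0, 0, 1, 1, 0, 1, 0]  # assumed binary reference standard

scores = [BOSNIAK_RANK[r] for r in ratings]
auc = auc_from_scores(scores, malignant)
print(f"AUC = {auc:.3f}")
```

Under these assumptions, the same function could be applied to each reader's ratings before and after seeing the model output to compare AUCs; testing whether two AUCs differ significantly requires a separate method that this sketch does not cover.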
Results
The AUCs achieved by GPT-4 prompts ranged from 0.549 to 0.778, while radiologists' AUCs ranged from 0.820 to 0.901. Among all prompting strategies, ROT prompting achieved the highest AUC, with performance comparable to that of novices (0.778 vs. 0.820, P = 0.39). Although its AUC was lower than that of experts (0.778 vs. 0.901, P = 0.01), ROT prompting improved the AUCs of novices: from 0.714 to 0.834 for novice 1, from 0.685 to 0.782 for novice 2, and from 0.704 to 0.783 for novice 3, with all three novices approaching expert-level performance.
Conclusion
GPT-4 showed variable performance in interpreting images depending on the prompt. ROT prompting, the best-performing strategy, achieved diagnostic accuracy comparable to that of novices and could help novices improve their diagnostic performance to expert level.
Springer Science and Business Media LLC

