How AI Responds to Obstetric Ultrasound Questions and Analyzes and Explains Obstetric Ultrasound Reports: ChatGPT-3.5 vs. Microsoft Copilot in Bing
Abstract
Objectives: To evaluate and compare the accuracy and consistency of answers to obstetric ultrasound questions and analysis of obstetric ultrasound reports using publicly available ChatGPT-3.5 and Microsoft Copilot in Bing (Copilot).
Methods: Twenty questions related to obstetric ultrasound were answered and 110 obstetric ultrasound reports were analyzed by both ChatGPT-3.5 and Copilot, with each question and report posed to each model three times at different times. The accuracy and consistency of the responses to the twenty questions and of each report analysis were evaluated and compared.
Results: In answering the twenty questions, ChatGPT-3.5 outperformed Copilot in both accuracy (95.0% vs. 80.0%) and consistency (90.0% vs. 75.0%). When analyzing obstetric ultrasound reports, the two models performed similarly in accuracy and consistency, and both could provide recommendations. The overall accuracy and consistency were 83.86% and 87.30% for ChatGPT-3.5 vs. 77.51% and 90.48% for Copilot, respectively. However, in detecting an abnormal amniotic fluid index, ChatGPT-3.5 was superior to Copilot (accuracy 87.50% vs. 66.67%, P < 0.05).
Conclusion: While ChatGPT-3.5 and Copilot can provide valuable explanations of obstetric ultrasound and interpret most obstetric ultrasound reports accurately, neither model answered all questions correctly or with complete consistency; physician supervision is therefore crucial when these models are used.
Springer Science and Business Media LLC