Performance of ChatGPT and Microsoft Copilot in Bing in answering obstetric ultrasound questions and analyzing obstetric ultrasound reports
Abstract
To evaluate and compare the performance of the publicly available ChatGPT-3.5, ChatGPT-4.0 and Microsoft Copilot in Bing (Copilot) in answering obstetric ultrasound questions and analyzing obstetric ultrasound reports. Twenty obstetric ultrasound questions were answered and 110 obstetric ultrasound reports were analyzed by ChatGPT-3.5, ChatGPT-4.0 and Copilot, with each question and report posed to each model three times at different times. The accuracy and consistency of the responses to the twenty questions and of each report analysis were evaluated and compared. In answering the twenty questions, both ChatGPT-3.5 and ChatGPT-4.0 outperformed Copilot in accuracy (95.0% vs. 80.0%) and consistency (90.0% and 85.0% vs. 75.0%), although these differences were not statistically significant. When analyzing obstetric ultrasound reports, ChatGPT-3.5 and ChatGPT-4.0 demonstrated superior accuracy compared to Copilot (P < 0.05), and all three showed high consistency and the ability to provide recommendations. Overall, ChatGPT-3.5, ChatGPT-4.0 and Copilot achieved accuracies of 83.86%, 84.13% and 77.51%, and consistencies of 87.30%, 93.65% and 90.48%, respectively. These large language models (ChatGPT-3.5, ChatGPT-4.0 and Copilot) have the potential to support clinical workflows by enhancing patient education and patient-clinician communication around common obstetric ultrasound issues. However, given their inconsistent and sometimes inaccurate responses, along with cybersecurity concerns, physician supervision remains crucial when these models are used.
Springer Science and Business Media LLC