Performance of ChatGPT and Microsoft Copilot in Bing in answering obstetric ultrasound questions and analyzing obstetric ultrasound reports
Abstract
To evaluate and compare the performance of the publicly available ChatGPT-3.5, ChatGPT-4.0 and Microsoft Copilot in Bing (Copilot) in answering obstetric ultrasound questions and analyzing obstetric ultrasound reports. Twenty obstetric ultrasound questions were answered and 110 obstetric ultrasound reports were analyzed by ChatGPT-3.5, ChatGPT-4.0 and Copilot, with each question and report posed to each model three times at different times. The accuracy and consistency of the responses to the twenty questions and of each report analysis were evaluated and compared. In answering the twenty questions, both ChatGPT-3.5 and ChatGPT-4.0 outperformed Copilot in accuracy (95.0% vs. 80.0%) and consistency (90.0% and 85.0% vs. 75.0%), although these differences were not statistically significant. When analyzing obstetric ultrasound reports, ChatGPT-3.5 and ChatGPT-4.0 demonstrated superior accuracy compared to Copilot (P < 0.05), and all three showed high consistency and the ability to provide recommendations. Overall, ChatGPT-3.5, ChatGPT-4.0 and Copilot achieved accuracies of 83.86%, 84.13% and 77.51%, and consistencies of 87.30%, 93.65% and 90.48%, respectively. These large language models (ChatGPT-3.5, ChatGPT-4.0 and Copilot) have the potential to support clinical workflows by enhancing patient education and patient-clinician communication around common obstetric ultrasound issues. However, given their inconsistent and sometimes inaccurate responses, along with cybersecurity concerns, physician supervision remains crucial when these models are used.
Springer Science and Business Media LLC