
Performance of ChatGPT and Microsoft Copilot in Bing in answering obstetric ultrasound questions and analyzing obstetric ultrasound reports

Abstract: To evaluate and compare the performance of the publicly available ChatGPT-3.5, ChatGPT-4.0, and Microsoft Copilot in Bing (Copilot) in answering obstetric ultrasound questions and analyzing obstetric ultrasound reports. Twenty questions related to obstetric ultrasound were answered and 110 obstetric ultrasound reports were analyzed by ChatGPT-3.5, ChatGPT-4.0, and Copilot, with each question and report posed to each model three times at different times. The accuracy and consistency of each response to the twenty questions and of each report analysis were evaluated and compared. In answering the twenty questions, both ChatGPT-3.5 and ChatGPT-4.0 outperformed Copilot in accuracy (95.0% vs. 80.0%) and consistency (90.0% and 85.0% vs. 75.0%), although no statistically significant difference was found among them. When analyzing obstetric ultrasound reports, ChatGPT-3.5 and ChatGPT-4.0 demonstrated superior accuracy compared to Copilot (P < 0.05), and all three showed high consistency and the ability to provide recommendations. Overall, ChatGPT-3.5, ChatGPT-4.0, and Copilot achieved accuracies of 83.86%, 84.13%, and 77.51%, and consistencies of 87.30%, 93.65%, and 90.48%, respectively. These large language models (ChatGPT-3.5, ChatGPT-4.0, and Copilot) have the potential to assist clinical workflows by enhancing patient education and patient-clinician communication around common obstetric ultrasound issues. Given their inconsistent and sometimes inaccurate responses, along with cybersecurity concerns, physician supervision remains crucial when using these models.
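The repeat-prompting protocol described above (posing each prompt three times and scoring the repeats for accuracy and consistency) is straightforward to reproduce. Below is a minimal sketch, assuming the OpenAI Python SDK for the two ChatGPT models; the sample question, the exact-match consistency check, and the model identifiers are illustrative assumptions, not the study's actual rubric (the authors graded responses by expert review, and Copilot has no comparable public API).

```python
# Minimal sketch of the study's repeat-prompting protocol.
# Assumptions: OpenAI Python SDK (>= 1.0) with OPENAI_API_KEY set;
# QUESTIONS and the consistency check are illustrative placeholders.
from openai import OpenAI

client = OpenAI()

QUESTIONS = [
    "What does an increased nuchal translucency measurement indicate?",
    # ... the study used 20 questions plus 110 report analyses
]

def ask(model: str, question: str, repeats: int = 3) -> list[str]:
    """Pose the same question `repeats` times, as in the study protocol."""
    answers = []
    for _ in range(repeats):
        resp = client.chat.completions.create(
            model=model,
            messages=[{"role": "user", "content": question}],
        )
        answers.append(resp.choices[0].message.content.strip())
    return answers

def consistent(answers: list[str]) -> bool:
    """Crude consistency check: do all repeats match exactly?
    The study judged semantic consistency by expert review; exact
    string matching is only a stand-in."""
    return len(set(answers)) == 1

for model in ("gpt-3.5-turbo", "gpt-4"):
    for question in QUESTIONS:
        answers = ask(model, question)
        print(f"{model}: consistent={consistent(answers)}")
```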

Related Results

Exploring Large Language Models Integration in the Histopathologic Diagnosis of Skin Diseases: A Comparative Study
Abstract Introduction The exact manner in which large language models (LLMs) will be integrated into pathology is not yet fully comprehended. This study examines the accuracy, bene...
Assessment of Chat-GPT, Gemini, and Perplexity in Principle of Research Publication: A Comparative Study
Abstract Introduction Many researchers utilize artificial intelligence (AI) to aid their research endeavors. This study seeks to assess and contrast the performance of three sophis...
Comparison of ChatGPT 3.5 Turbo and Human Performance in taking the European Board of Ophthalmology Diploma (EBOD) Exam
Abstract Background/Objectives: This paper aims to assess ChatGPT’s performance in answering European Board of Ophthalmology Diploma (EBOD) examination papers and to compa...
CHATGPT ASSISTANCE ON BIOCHEMISTRY LEARNING OUTCOMES OF PRE-SERVICE TEACHERS
This research investigates the effect of ChatGPT on the learning outcomes of pre-service biology teachers. Sampling was done by purposive sampling in class A (treated with ChatGPT)...
Assessment of Nursing Skill and Knowledge of ChatGPT, Gemini, Microsoft Copilot, and Llama: A Comparative Study
Abstract Introduction Artificial intelligence (AI) has emerged as a transformative force in healthcare. This study assesses the performance of advanced AI systems—ChatGPT-3.5, Gemi...
