
Comparison of ChatGPT 3.5 Turbo and Human Performance in taking the European Board of Ophthalmology Diploma (EBOD) Exam

Abstract

Background/Objectives: This paper aims to assess ChatGPT’s performance in answering European Board of Ophthalmology Diploma (EBOD) examination papers and to compare the results with pass benchmarks and candidate results.

Methods: This cross-sectional study used a sample of past EBOD examination papers from 2012, 2013 and 2020–2023. It analysed ChatGPT’s responses to 392 Multiple Choice Questions (MCQs), each containing five true/false statements (1432 statements in total), and 48 Single Best Answer (SBA) questions.

Results: ChatGPT scored an average of 64.39% on MCQs, and its strongest MCQ metric was precision (68.76%). ChatGPT performed best on Pathology questions (Grubbs test, p < .05), while Optics and refraction had the lowest MCQ performance across all metrics. ChatGPT’s SBA performance averaged 28.43%, with its highest score, and strongest metric, being precision (29.36%). Pathology was consistently the lowest-scoring SBA topic across most metrics. ChatGPT chose option 1 more often than the other options (p = 0.19). On SBAs, human candidates scored higher than ChatGPT in every metric measured.

Conclusion: ChatGPT performed better on true/false questions, achieving a pass mark in most instances. Its performance was poorer on SBA questions, especially as ChatGPT was more likely to choose the first of the four answer options. Our results suggest that ChatGPT is better at information retrieval than at knowledge integration.
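The Results report MCQ performance both as an average score and as precision, and these are computed from different counts. The sketch below is a minimal illustration, not the authors' code: it assumes each true/false statement is marked against an answer key, with "true" treated as the positive class. The abstract does not define how the metrics were calculated, and the function name `score_statements` and the example answers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Metrics:
    accuracy: float
    precision: float
    recall: float
    f1: float


def score_statements(predicted: list[bool], key: list[bool]) -> Metrics:
    """Score true/false answers against an answer key.

    Assumption (not from the paper): "True" is the positive class, so
    precision = share of statements answered True that really are True.
    """
    if not key or len(predicted) != len(key):
        raise ValueError("predicted and key must be non-empty and of equal length")

    pairs = list(zip(predicted, key))
    tp = sum(1 for p, k in pairs if p and k)          # answered True, key True
    fp = sum(1 for p, k in pairs if p and not k)      # answered True, key False
    fn = sum(1 for p, k in pairs if not p and k)      # answered False, key True
    tn = sum(1 for p, k in pairs if not p and not k)  # answered False, key False

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return Metrics(accuracy, precision, recall, f1)
```

For one hypothetical five-statement MCQ, answers `[True, True, False, True, False]` against a key of `[True, False, False, True, True]` give an accuracy of 0.6 and a precision of 2/3. SBA metrics would additionally need a multi-class averaging scheme, which this sketch does not attempt.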
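The abstract reports that ChatGPT chose option 1 more often than the other options (p = 0.19) but does not name the test used. One plausible way to check for answer-position bias, assumed here rather than taken from the paper, is a chi-square goodness-of-fit test of the chosen-option counts against a uniform distribution over the four SBA options. With four options there are three degrees of freedom, for which the chi-square tail probability has a closed form, so the sketch needs only the standard library. The function name `option_bias_test` and the counts below are hypothetical.

```python
import math


def option_bias_test(counts: list[int]) -> tuple[float, float]:
    """Chi-square goodness-of-fit test of option counts against a uniform distribution.

    Assumption: exactly four options, so degrees of freedom = 4 - 1 = 3.
    Returns (chi-square statistic, p-value).
    """
    if len(counts) != 4 or sum(counts) == 0:
        raise ValueError("this sketch handles exactly four options with a non-zero total")

    expected = sum(counts) / 4  # uniform expectation per option
    chi2 = sum((c - expected) ** 2 / expected for c in counts)

    # Survival function of the chi-square distribution with 3 degrees of freedom:
    # P(X > x) = erfc(sqrt(x / 2)) + sqrt(2x / pi) * exp(-x / 2)
    p_value = math.erfc(math.sqrt(chi2 / 2)) + math.sqrt(2 * chi2 / math.pi) * math.exp(-chi2 / 2)
    return chi2, p_value
```

For example, hypothetical counts of `[20, 10, 10, 8]` over 48 questions give a statistic of about 7.33 and a p-value of roughly 0.06. The closed form applies only to three degrees of freedom; a different number of options would need a general chi-square distribution function.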

Springer Science and Business Media LLC

Anna Maino, Jakub Klikowski, Brendan Strong, Wahid Ghaffari, Michał Woźniak, Tristan Bourcier, Andrzej Grzybowski

2024





