
Comparison of ChatGPT 3.5 Turbo and Human Performance in taking the European Board of Ophthalmology Diploma (EBOD) Exam

Abstract

Background/Objectives: This paper aims to assess ChatGPT's performance in answering European Board of Ophthalmology Diploma (EBOD) examination papers and to compare the results with pass benchmarks and candidate results.

Methods: This cross-sectional study used a sample of past papers from the 2012, 2013, and 2020–2023 EBOD examinations. It analysed ChatGPT's responses to 392 Multiple Choice Questions (MCQs), each containing 5 true/false statements (1432 statements in total), and to 48 Single Best Answer (SBA) questions.

Results: ChatGPT scored an average of 64.39% on MCQs. Its strongest MCQ metric was precision (68.76%). It performed best on Pathology questions (Grubbs test, p < .05), whereas Optics and refraction had the lowest MCQ performance across all metrics. ChatGPT's SBA performance averaged 28.43%, with its highest score in precision (29.36%). Pathology was consistently the lowest-scoring SBA topic across most metrics. ChatGPT chose option 1 more often than the other options (p = 0.19). On SBAs, human candidates scored higher than ChatGPT in every metric measured.

Conclusion: ChatGPT performed better on true/false questions, reaching the pass mark in most instances. Performance was poorer on SBA questions, especially as ChatGPT was more likely to choose the first of the four answer options. Our results suggest that ChatGPT is better at information retrieval than at knowledge integration.
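The abstract reports MCQ results as several metrics (an overall score and precision, among others) without defining them. As a minimal sketch, assuming each true/false statement is scored as a binary classification with "true" as the positive class, accuracy, precision, recall and F1 could be computed from ChatGPT's answers and the answer key as follows. The function `score_true_false`, the `BinaryMetrics` type and the positive-class choice are hypothetical illustrations, not the paper's published scoring procedure, and the sketch does not model any exam-specific marking scheme.

```python
from dataclasses import dataclass


@dataclass
class BinaryMetrics:
    accuracy: float
    precision: float
    recall: float
    f1: float


def score_true_false(predicted: list[bool], answer_key: list[bool]) -> BinaryMetrics:
    """Score true/false statement answers against a key.

    Assumption (not from the paper): 'True' is treated as the positive class.
    """
    if len(predicted) != len(answer_key):
        raise ValueError("predicted and answer_key must have the same length")
    if not predicted:
        raise ValueError("need at least one statement")

    pairs = list(zip(predicted, answer_key))
    tp = sum(p and a for p, a in pairs)            # answered True, key True
    fp = sum(p and not a for p, a in pairs)        # answered True, key False
    fn = sum((not p) and a for p, a in pairs)      # answered False, key True
    correct = sum(p == a for p, a in pairs)        # any matching answer

    accuracy = correct / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return BinaryMetrics(accuracy, precision, recall, f1)
```

Precision here asks "of the statements ChatGPT marked true, how many were actually true", which is why it can differ from the overall percentage of correctly answered statements.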
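The abstract reports that ChatGPT favoured option 1 on SBA questions (p = 0.19) but does not state which test produced that p-value. As an illustration only, a chi-square goodness-of-fit test against uniform choice over four options is one common way to check for answer-position bias; it is an assumption here, not the authors' stated method. The function name `chi_square_uniform_4` is hypothetical, and the p-value uses the closed-form survival function of the chi-square distribution with 3 degrees of freedom so that only the Python standard library is needed.

```python
import math


def chi_square_uniform_4(counts: list[int]) -> tuple[float, float]:
    """Chi-square goodness-of-fit of answer-position counts against a
    uniform choice over four options (df = 3). Returns (statistic, p_value).

    Hypothetical helper for illustration; not the paper's analysis code.
    """
    if len(counts) != 4:
        raise ValueError("expected counts for exactly four options")
    n = sum(counts)
    if n == 0:
        raise ValueError("counts must not all be zero")

    expected = n / 4  # uniform choice: each option picked equally often
    stat = sum((c - expected) ** 2 / expected for c in counts)

    # Survival function of chi-square with 3 degrees of freedom:
    # P(X > x) = erfc(sqrt(x / 2)) + sqrt(2x / pi) * exp(-x / 2)
    p_value = math.erfc(math.sqrt(stat / 2)) + math.sqrt(2 * stat / math.pi) * math.exp(-stat / 2)
    return stat, p_value
```

The chi-square approximation is only reliable when each option has a reasonable expected count; with very few questions, an exact multinomial test would be the safer choice.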

Related Results

Exploring Large Language Models Integration in the Histopathologic Diagnosis of Skin Diseases: A Comparative Study
Abstract Introduction The exact manner in which large language models (LLMs) will be integrated into pathology is not yet fully comprehended. This study examines the accuracy, bene...
Assessment of Chat-GPT, Gemini, and Perplexity in Principle of Research Publication: A Comparative Study
Abstract Introduction Many researchers utilize artificial intelligence (AI) to aid their research endeavors. This study seeks to assess and contrast the performance of three sophis...
Analysis of the Use of Turbo Cyclone Variations on Vehicle Performance
This research is motivated by the large number of vehicles with long service lives and irregular maintenance patterns, which results in decreased performance and increased emissions. This research...
Unlocking Educational Potential: Exploring Students’ Satisfaction and Sustainable Engagement with ChatGPT Using the ECM Model
Aim/Purpose: The main goal of this study is to investigate the factors affecting students’ satisfaction and continuous usage of ChatGPT in an educational context, using the Expecta...
ChatGPT's Capabilities for Use in Anatomy Education and Anatomy Research
Dear Editors, Recently, the discussion of an artificial intelligence (AI) - fueled platform in several articles in your journal has attracted the attention of many researchers [1, ...
Factors Influencing Choice of Medical Specialty among Ophthalmology and Non-Ophthalmology Residency Applicants
Abstract Objective: The study aimed to investigate factors influencing choice of specialty among ophthalmology and non-ophthalmology residency applicants. Patients and Methods: Anonymo...
Appearance of ChatGPT and English Study
The purpose of this study is to examine the definition and characteristics of ChatGPT in order to present the direction of self-directed learning to learners, and to explore the po...
