
Comparison of ChatGPT 3.5 Turbo and Human Performance in taking the European Board of Ophthalmology Diploma (EBOD) Exam

Abstract
Background/Objectives: This paper aims to assess ChatGPT’s performance in answering European Board of Ophthalmology Diploma (EBOD) examination papers and to compare these results with pass benchmarks and candidate results.
Methods: This cross-sectional study used a sample of past exam papers from the 2012, 2013, and 2020–2023 EBOD examinations. It analysed ChatGPT’s responses to 392 Multiple Choice Questions (MCQs), each containing 5 true/false statements (1432 statements in total), and 48 Single Best Answer (SBA) questions.
Results: ChatGPT scored an average of 64.39% on MCQs, with precision (68.76%) its strongest metric. It performed best on Pathology questions (Grubbs test, p < .05), while Optics and Refraction had the lowest-scoring MCQ performance across all metrics. ChatGPT’s SBA performance averaged 28.43%, again with precision (29.36%) the strongest metric. Pathology SBA questions were consistently the lowest-scoring topic across most metrics. ChatGPT chose option 1 more often than the other options (p = 0.19). Human candidates scored higher than ChatGPT on SBAs in all metrics measured.
Conclusion: ChatGPT performed better on true/false questions, reaching the pass mark in most instances. Performance was poorer on SBA questions, partly because ChatGPT tended to choose the first of the four answer options. Our results suggest that ChatGPT’s ability in information retrieval is better than its ability in knowledge integration.
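The abstract reports accuracy-style scores and precision for ChatGPT's true/false responses. A minimal sketch of how such metrics could be computed, assuming each statement is scored as a binary classification with scikit-learn; the ten labels below are hypothetical and stand in for the study's 1432 statements:

```python
from sklearn.metrics import accuracy_score, precision_score, recall_score

# Hypothetical data: 1 = "true", 0 = "false" for each MCQ statement.
ground_truth   = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1]
chatgpt_answer = [1, 0, 0, 1, 1, 0, 1, 0, 1, 0]

print("Accuracy :", accuracy_score(ground_truth, chatgpt_answer))   # share of statements answered correctly
print("Precision:", precision_score(ground_truth, chatgpt_answer))  # share of "true" calls that were correct
print("Recall   :", recall_score(ground_truth, chatgpt_answer))     # share of true statements identified
```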
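The per-topic comparison cites a Grubbs outlier test (p < .05) to single out Pathology as the strongest MCQ topic. SciPy has no built-in Grubbs test, so the sketch below hand-rolls the standard two-sided form; the function name and the per-topic scores are illustrative assumptions, not the paper's data:

```python
import numpy as np
from scipy import stats

def grubbs_test(values, alpha=0.05):
    """Two-sided Grubbs test for a single outlier (largest deviation from the mean)."""
    x = np.asarray(values, dtype=float)
    n = len(x)
    mean, sd = x.mean(), x.std(ddof=1)
    g = np.max(np.abs(x - mean)) / sd  # Grubbs statistic
    # Critical value from the t-distribution with n-2 degrees of freedom
    t_crit = stats.t.ppf(1 - alpha / (2 * n), n - 2)
    g_crit = ((n - 1) / np.sqrt(n)) * np.sqrt(t_crit**2 / (n - 2 + t_crit**2))
    return g, g_crit, g > g_crit

# Hypothetical per-topic MCQ scores (%); one topic clearly above the rest.
topic_scores = [64.1, 62.8, 66.0, 63.5, 78.9, 61.2, 65.4]
g, g_crit, is_outlier = grubbs_test(topic_scores)
print(f"G = {g:.2f}, critical value = {g_crit:.2f}, outlier detected: {is_outlier}")
```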

Related Results

Exploring Large Language Models Integration in the Histopathologic Diagnosis of Skin Diseases: A Comparative Study
Abstract Introduction The exact manner in which large language models (LLMs) will be integrated into pathology is not yet fully comprehended. This study examines the accuracy, bene...
Assessment of Chat-GPT, Gemini, and Perplexity in Principle of Research Publication: A Comparative Study
Abstract Introduction Many researchers utilize artificial intelligence (AI) to aid their research endeavors. This study seeks to assess and contrast the performance of three sophis...
CHATGPT ASSISTANCE ON BIOCHEMISTRY LEARNING OUTCOMES OF PRE-SERVICE TEACHERS
This research investigates the effect of ChatGPT on the learning outcomes of pre-service biology teachers. Sampling was done by purposive sampling in class A (treated with ChatGPT)...
Appearance of ChatGPT and English Study
The purpose of this study is to examine the definition and characteristics of ChatGPT in order to present the direction of self-directed learning to learners, and to explore the po...
User Intentions to Use ChatGPT for Self-Diagnosis and Health-Related Purposes: Cross-sectional Survey Study (Preprint)
BACKGROUND With the rapid advancement of artificial intelligence (AI) technologies, AI-powered chatbots, such as Chat Generative Pretrained Transformer (Cha...
Can ChatGPT-3.5 Pass a Medical Exam? A Systematic Review of ChatGPT's Performance in Academic Testing
OBJECTIVE We, therefore, aim to conduct a systematic review to assess the academic potential of ChatGPT-3.5, along with its strengths and limitations when giving medical exams. MET...
