
Evaluating the evidence-based potential of six large language models in paediatric dentistry: a comparative study on generative artificial intelligence

Abstract

Purpose: The use of large language models (LLMs) in generative artificial intelligence (AI) is rapidly increasing in dentistry. However, their reliability has yet to be fully established. This study aims to evaluate the diagnostic accuracy, clinical applicability, and patient education potential of LLMs in paediatric dentistry by assessing the responses of six LLMs: Google AI’s Gemini and Gemini Advanced, OpenAI’s ChatGPT-3.5, ChatGPT-4o and ChatGPT-4, and Microsoft’s Copilot.

Methods: Ten open-ended clinical questions relevant to paediatric dentistry were posed to the LLMs. Two independent evaluators graded the responses from 0 to 10 using a detailed rubric. After 4 weeks, the answers were re-evaluated to assess intra-evaluator reliability. Friedman, Wilcoxon and Kruskal–Wallis tests were used to identify the model that provided the most comprehensive, accurate, explicit and relevant answers.

Results: Scores varied across models. ChatGPT-4 received the highest average score (8.08), followed by Gemini Advanced (8.06), ChatGPT-4o (8.01), ChatGPT-3.5 (7.61), Gemini (7.32) and Copilot (5.41). Statistical analysis revealed that ChatGPT-4 outperformed all other LLMs, and the difference was statistically significant. Although the models gave different responses to the same queries, remarkable similarities were observed. Except for Copilot, all chatbots scored above 6.5 on all queries.

Conclusion: This study demonstrates the potential of large language models (LLMs) to support evidence-based paediatric dentistry. Nevertheless, they cannot be regarded as completely trustworthy. Dental professionals should use AI models critically, as supportive tools and not as a substitute for overall scientific knowledge and critical thinking.
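The abstract names the Friedman test as one of the comparisons but does not report per-question scores, so the following is only a minimal Python sketch of how such a statistic could be computed for a questions-by-models score table. The function names (`average_ranks`, `friedman_statistic`) and the small score matrix are hypothetical illustrations, not the study's code or data, and the formula omits the tie correction that statistical packages usually apply.

```python
def average_ranks(values):
    """Rank values ascending (1 = smallest); tied values share the mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the run while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        shared = (pos + end) / 2 + 1  # mean of the 1-based positions in the run
        for k in range(pos, end + 1):
            ranks[order[k]] = shared
        pos = end + 1
    return ranks


def friedman_statistic(scores):
    """Friedman chi-square without tie correction.

    scores: list of rows, one row per question, one column per model.
    """
    n = len(scores)      # number of questions (blocks)
    k = len(scores[0])   # number of models (treatments)
    rank_sums = [0.0] * k
    for row in scores:
        for j, rank in enumerate(average_ranks(row)):
            rank_sums[j] += rank
    return 12.0 / (n * k * (k + 1)) * sum(s * s for s in rank_sums) - 3.0 * n * (k + 1)


# Hypothetical scores (NOT from the study): 3 questions x 3 models.
example_scores = [
    [7, 8, 9],
    [6, 8, 9],
    [7, 9, 10],
]
statistic = friedman_statistic(example_scores)
print(round(statistic, 6))
```

In this made-up table every question ranks the three models in the same order, so the statistic reaches its maximum of n(k − 1). With real data the statistic would be compared against a chi-square distribution with k − 1 degrees of freedom, followed by pairwise tests such as Wilcoxon's.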

Springer Science and Business Media LLC

Anastasia Dermata Aristidis Arhakis Miltiadis A. Makrygiannakis Kostis Giannakopoulos Eleftherios G. Kaklamanos

European Archives of Paediatric Dentistry

2025





Related Results