Assessment of Nursing Skill and Knowledge of ChatGPT, Gemini, Microsoft Copilot, and Llama: A Comparative Study
Abstract
Introduction
Artificial intelligence (AI) has emerged as a transformative force in healthcare. This study assesses the performance of four AI systems (ChatGPT 3.5, Gemini, Microsoft Copilot, and Llama 2) on a comprehensive 100-question nursing competency examination. The objective is to gauge their potential contribution to nursing education and the implications for future use.
Methods
The study tested four AI systems (ChatGPT 3.5, Gemini, Microsoft Copilot, and Llama 2) on a 100-question nursing exam in February 2024. A standardized protocol was used to administer the examination, which covered diverse nursing competencies. Questions were drawn from reputable clinical manuals to support content reliability. The AI systems were evaluated by accuracy rate.
Results
Microsoft Copilot achieved the highest accuracy at 84%, followed by ChatGPT 3.5 (77%), Gemini (75%), and Llama 2 (68%). None answered every question correctly. Each AI system correctly answered at least one question that none of the others got right.
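The two quantities reported here, per-system accuracy and questions that only one system answered correctly, can be illustrated with a minimal Python sketch. It assumes each system's graded answers are stored as a list of booleans (one per question). The function names, system labels, and the five-question toy data are hypothetical and are not taken from the study.

```python
from typing import Dict, List


def accuracy(responses: List[bool]) -> float:
    """Percentage of questions answered correctly."""
    return 100.0 * sum(responses) / len(responses)


def uniquely_correct(results: Dict[str, List[bool]]) -> Dict[str, List[int]]:
    """For each system, the 0-based indices of questions that only it got right."""
    n_questions = len(next(iter(results.values())))
    unique: Dict[str, List[int]] = {name: [] for name in results}
    for q in range(n_questions):
        solvers = [name for name, graded in results.items() if graded[q]]
        if len(solvers) == 1:
            unique[solvers[0]].append(q)
    return unique


# Hypothetical toy data (NOT the study's results): True means a correct answer.
toy = {
    "system_a": [True, True, False, True, False],
    "system_b": [True, False, True, False, False],
    "system_c": [False, False, False, False, True],
}

scores = {name: accuracy(graded) for name, graded in toy.items()}
unique = uniquely_correct(toy)
```

In this toy data every system has at least one uniquely correct question even though their accuracies differ, which is the same pattern the results describe.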
Conclusion
The variation in the AI systems' answers underscores the importance of selecting an AI system suited to the specific application and domain, as no single system consistently outperformed the others in every area of nursing knowledge.
Related Results
Exploring Large Language Models Integration in the Histopathologic Diagnosis of Skin Diseases: A Comparative Study
Abstract
Introduction
The exact manner in which large language models (LLMs) will be integrated into pathology is not yet fully comprehended. This study examines the accuracy, bene...
Assessment of Chat-GPT, Gemini, and Perplexity in Principle of Research Publication: A Comparative Study
Abstract
Introduction
Many researchers utilize artificial intelligence (AI) to aid their research endeavors. This study seeks to assess and contrast the performance of three sophis...
Performance of ChatGPT and Microsoft Copilot in Bing in answering obstetric ultrasound questions and analyzing obstetric ultrasound reports
Abstract
To evaluate and compare the performance of publicly available ChatGPT-3.5, ChatGPT-4.0 and Microsoft Copilot in Bing (Copilot) in answering obstetric ultrasound ...
Performance of AI‐Chatbots to Common Temporomandibular Joint Disorders (TMDs) Patient Queries: Accuracy, Completeness, Reliability and Readability
ABSTRACT
TMDs are a common group of conditions affecting the temporomandibular joint (TMJ) often resulting from factors like injury, stress or teeth grinding. Thi...
How AI Responds to Obstetric Ultrasound Questions and Analyzes and Explains Obstetric Ultrasound Reports: ChatGPT-3.5 vs. Microsoft Copilot in Bing
Abstract
Objectives: To evaluate and compare the accuracy and consistency of answers to obstetric ultrasound questions and analysis of obstetric ultrasound reports using pu...
Five advanced chatbots solving European Diploma in Radiology (EDiR) text-based questions: differences in performance and consistency
Abstract
Background
We compared the performance, confidence, and response consistency of five chatbots powered by large language models in solvin...
ChatGPT's Capabilities for Use in Anatomy Education and Anatomy Research
Dear Editors,
Recently, the discussion of an artificial intelligence (AI) - fueled platform in several articles in your journal has attracted the attention of many researchers [1, ...
Assessment of Plagiarism in AI-Generated Responses to Gynecologic Oncology-Related Queries
Originality and attribution of narrative responses in healthcare remain underexamined. Plagiarism carries significant ethical, legal, and professional consequences, undermining tru...

