
A Comparative Study of the Accuracy and Readability of Responses from Four Generative AI Models to COVID-19-Related Questions

This study compares the accuracy and readability of Coronavirus Disease 2019 (COVID-19) prevention and control knowledge texts generated by four current generative artificial intelligence (AI) models, two international (ChatGPT and Gemini) and two domestic (Kimi and Ernie Bot), and evaluates further performance characteristics of the texts produced by each group. The questions and answers in the COVID-19 prevention guidelines issued by the U.S. Centers for Disease Control and Prevention (CDC) serve as the evaluation criteria: the accuracy, readability, and comprehensibility of each model's texts are scored against the CDC standards. A neural network model is then used to identify the factors that affect readability, the medical topics of the generated texts are analyzed with text-analysis techniques, and finally a questionnaire-based manual scoring of the AI-generated texts is compared against the automated machine scoring.

Accuracy: the domestic models produce more accurate text, while the international models are more reliable.
Readability: the domestic models produce more fluent, publicly accessible language; the international models generate more standardized, formally structured texts with greater consistency.
Comprehensibility: the domestic models offer superior readability, while the international models are more stable in their output.
Readability factors: average words per sentence (AWPS) emerges as the most significant factor influencing readability across all models.
Topic analysis: ChatGPT emphasizes epidemiological knowledge; Gemini focuses on general medical and health topics; Kimi provides more multidisciplinary content; and Ernie Bot concentrates on clinical medicine.

Empirically, manual and machine scoring agree closely on the SimHash (text-fingerprint similarity) and FKGL (Flesch-Kincaid Grade Level) indicators, which supports the validity of the evaluation method proposed here.

Conclusion: texts generated by the domestic models are more accessible and better suited to public education, clinical communication, and health consultation. The international models, in contrast, generate expert content with higher accuracy, especially for epidemiological studies and for assessing the literature on disease severity. The manual evaluations confirm the reliability of the proposed assessment framework. Future AI-generated knowledge systems for infectious disease control should therefore balance professional rigor with public comprehensibility, so as to provide reliable and accessible reference material during major infectious disease outbreaks.
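As a minimal illustration of the two readability indicators named above (not the paper's actual pipeline), AWPS is the ratio of word count to sentence count, and FKGL combines sentence length with syllable density via the standard formula 0.39 * AWPS + 11.8 * (syllables/words) - 15.59. The syllable counter below is a rough vowel-group heuristic assumed here for demonstration only; published studies typically use a dictionary or NLP library.

    import re

    def count_syllables(word):
        # Rough heuristic: count vowel groups (assumption for demo purposes).
        groups = re.findall(r"[aeiouy]+", word.lower())
        return max(1, len(groups))

    def readability_metrics(text):
        sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
        words = re.findall(r"[A-Za-z']+", text)
        syllables = sum(count_syllables(w) for w in words)
        awps = len(words) / len(sentences)  # average words per sentence
        # Flesch-Kincaid Grade Level (standard coefficients)
        fkgl = 0.39 * awps + 11.8 * (syllables / len(words)) - 15.59
        return awps, fkgl

    awps, fkgl = readability_metrics(
        "Wash your hands often. Avoid close contact with sick people.")
    print(f"AWPS={awps:.1f}, FKGL={fkgl:.1f}")

Lower FKGL values indicate text accessible to readers with fewer years of schooling, which is why AWPS, its dominant term, surfaces as the key readability driver.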
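The abstract does not specify the neural network used to rank readability factors. Purely as a labeled assumption, one common pattern is to fit a small regressor to text features and rank them by permutation importance; the sketch below uses scikit-learn's MLPRegressor on synthetic data, with hypothetical feature names, only to show the shape of such an analysis.

    import numpy as np
    from sklearn.neural_network import MLPRegressor
    from sklearn.inspection import permutation_importance

    # Synthetic stand-in data: one row per generated text.
    # Feature columns (hypothetical): AWPS, syllables per word, medical-term ratio.
    rng = np.random.default_rng(0)
    X = rng.uniform(0, 1, size=(200, 3))
    y = 0.7 * X[:, 0] + 0.2 * X[:, 1] + 0.1 * rng.normal(size=200)

    model = MLPRegressor(hidden_layer_sizes=(16,), max_iter=2000,
                         random_state=0).fit(X, y)
    result = permutation_importance(model, X, y, n_repeats=20, random_state=0)
    for name, imp in zip(["AWPS", "syll/word", "med-term ratio"],
                         result.importances_mean):
        print(f"{name}: {imp:.3f}")

Under this setup, shuffling the AWPS column degrades predictions the most, mirroring the paper's finding that sentence length dominates readability.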
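SimHash, the other indicator on which manual and machine scores agreed, fingerprints a text into a fixed-width hash so that similar texts yield nearby hashes; similarity is then one minus the normalized Hamming distance between fingerprints. A bare-bones 64-bit version, assuming whitespace tokenization and MD5 as the per-token hash (implementation choices not taken from the paper):

    import hashlib

    def simhash(text, bits=64):
        # Weighted-feature fingerprint: each token votes on every bit position.
        v = [0] * bits
        for token in text.lower().split():
            h = int(hashlib.md5(token.encode()).hexdigest(), 16)
            for i in range(bits):
                v[i] += 1 if (h >> i) & 1 else -1
        return sum(1 << i for i in range(bits) if v[i] > 0)

    def simhash_similarity(a, b, bits=64):
        # 1 - normalized Hamming distance between the two fingerprints.
        distance = bin(simhash(a, bits) ^ simhash(b, bits)).count("1")
        return 1 - distance / bits

    print(simhash_similarity("wear a mask in crowded places",
                             "wear a face mask in crowded indoor places"))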

Related Results

Primerjalna književnost na prelomu tisočletja (Comparative Literature at the Turn of the Millennium)
In a comprehensive and at times critical manner, this volume seeks to shed light on the development of events in Western (i.e., European and North American) comparative literature ...
Exploring Large Language Models Integration in the Histopathologic Diagnosis of Skin Diseases: A Comparative Study
Abstract Introduction The exact manner in which large language models (LLMs) will be integrated into pathology is not yet fully comprehended. This study examines the accuracy, bene...
Assessment of Chat-GPT, Gemini, and Perplexity in Principle of Research Publication: A Comparative Study
Abstract Introduction Many researchers utilize artificial intelligence (AI) to aid their research endeavors. This study seeks to assess and contrast the performance of three sophis...
(021) ChatGPT's Ability to Assess Quality and Readability of Online Medical Information
Abstract Introduction Health literacy plays a crucial role in enabling patients to understand and effectively use medical inform...
Comparative Analysis of Reading Text Readability in Chinese Junior High School English Teaching Textbook Based on Corpus
Readability is one of the important factors for students to consider when choosing a text to read. Based on a self-built corpus of Junior High School textbooks in China, this paper...