Assessment of Chat-GPT, Gemini, and Perplexity in Principle of Research Publication: A Comparative Study
Abstract
Introduction
Many researchers utilize artificial intelligence (AI) to aid their research endeavors. This study seeks to assess and contrast the performance of three sophisticated AI systems, namely ChatGPT, Gemini, and Perplexity, when applied to an examination focused on knowledge regarding research publication.
Methods
Three AI systems (ChatGPT-3.5, Gemini, and Perplexity) were evaluated using an examination of fifty multiple-choice questions covering various aspects of research, including research terminology, literature review, study design, research writing, and publication-related topics. The questions were written by a researcher with an h-index of 22, were later administered to two other researchers with h-indices of 9 and 10 in a double-blinded manner, and were revised extensively to ensure their quality before being presented to the three AI systems.
Results
In the examination, ChatGPT scored 38 (76%) correct answers, while Gemini and Perplexity each scored 36 (72%). Notably, all AI systems showed a statistically significant association between the options they chose and the correct answers: ChatGPT chose option (C) correctly 88.9% of the time, Gemini selected option (D) correctly 78.9% of the time, and Perplexity picked option (C) correctly 88.9% of the time. Gemini and Perplexity showed only minor, statistically non-significant agreement with the researchers' answers, whereas ChatGPT exhibited significant concordance (81-83%) with the researchers' performance.
Conclusion
ChatGPT, Gemini, and Perplexity perform adequately overall on research-related questions, but each system requires improvement in particular research categories, which differ from one AI to another. The involvement of an expert in the research publication process remains a fundamental cornerstone to ensure the quality of the work.
Introduction
The work of John McCarthy is the foundation of modern artificial intelligence (AI) research. In 1956, at Dartmouth College, he introduced the phrase "artificial intelligence," marking the inception of formal AI research [1]. The emergence of AI was an innovative technological frontier, promising transformative impacts across diverse sectors. Recent years have witnessed significant strides in the AI domain, particularly in the refinement of chatbot technology. An increasingly prevalent notion suggests that AI, having surpassed human capabilities in several domains, holds promise for substantial advancements in the realm of research publications. AI stands poised to augment research writing, the accuracy of information retrieved, and referencing, thereby potentially revolutionizing the field [2].
Over the past few years, a multitude of AI tools have become readily accessible, providing a diverse array of services and functionalities. A notable instance of such an AI system is ChatGPT, an advanced language model crafted by OpenAI. It underwent training using a vast array of textual materials gathered from websites, literature, and diverse sources, engaging in language modeling tasks to enhance its capabilities. This attribute sets it apart as one of the most expansive and resilient language models ever devised, integrating an astonishing 175 billion parameters [3,4]. An additional AI system that has attracted attention is Gemini, previously identified as Google Bard, which is an AI-driven information retrieval apparatus with a sophisticated chatbot that utilizes a "native multimodal" approach to effectively process and adjust to various types of data like video, audio, and text [5,6]. Perplexity AI stands as an AI-powered research and conversational search engine, adept at responding to queries through the utilization of natural language predictive text. It synthesizes answers from web sources, accompanied by citations through embedded links within the text response [7]. Many researchers are known to utilize chatbots as aids in their research endeavors.
This study seeks to assess and contrast the performance of sophisticated AI systems—namely, ChatGPT, Gemini, and Perplexity—when applied to an examination focused on knowledge regarding research publication. It also aims to shed light on the current state of AI integration within the research publication process and identify opportunities for further development.
Methods
In this comparative investigation, we evaluated the performance of three distinct AI systems: ChatGPT-3.5, Gemini, and Perplexity. The assessment comprised 50 multiple-choice questions, each offering four options (A-D). The questions spanned various domains including eleven research terminology queries, six literature review inquiries, twelve study design probes, twelve research writing assessments, and nine publication-related investigations.
Initially, a researcher with an h-index of 22, identified as the second author in the manuscript, composed a set of sixty multiple-choice questions. Subsequently, two other researchers with h-indices of 14 and 16, mentioned as authors seven and ten respectively, underwent the examination in a double-blinded fashion. Following this phase, all three researchers collaborated to review and analyze both questions and answers. Ten questions were excluded due to their lack of clarity, leaving a total of fifty questions selected for the final examination version. These selected questions were unanimously agreed upon by the researchers as informative indicators of knowledge within the realm of research and its associated intricacies.
The questions were then entered uniformly into each of the AI systems in March 2024, following a standardized protocol. This protocol involved initiating interactions with the AI systems by introducing a prompt starting with "Hello." Subsequently, each AI system received the same directive: "Please select the correct answer for the following multiple-choice questions." The questions were transcribed directly from a prepared Word document, and the AI-generated responses were recorded in an Excel spreadsheet. Statistical analysis was performed using the Statistical Package for the Social Sciences (SPSS) version 27.0, with a significance level set at p < 0.05. The chi-square test (Fisher's exact test) was employed for data analysis.
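As an illustration of the scoring step, the sketch below shows how each system's recorded answers could be checked against the answer key and summarized overall and per question category. This is not the authors' actual spreadsheet workflow; the data layout and the example entries are hypothetical placeholders.

```python
# Minimal sketch of scoring recorded answers against the key; the tuples below are
# hypothetical placeholders, not the study's actual questions or responses.
from collections import defaultdict

# Each entry: (question category, correct option, option chosen by the AI system).
responses = [
    ("research terminology", "C", "C"),
    ("study design", "B", "D"),
    ("publication", "A", "A"),
    # ... the remaining questions would follow the same pattern
]

overall_correct = 0
per_category = defaultdict(lambda: [0, 0])  # category -> [correct, answered]
for category, correct, chosen in responses:
    per_category[category][1] += 1
    if chosen == correct:
        overall_correct += 1
        per_category[category][0] += 1

print(f"Overall accuracy: {overall_correct / len(responses):.1%}")
for category, (n_correct, n_total) in per_category.items():
    print(f"{category}: {n_correct}/{n_total} ({n_correct / n_total:.1%})")
```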
During the literature review phase of the present study, only papers published in reputable journals were included; those published in predatory journals were omitted, in accordance with the criteria delineated in Kscien's list [8].
Results
In the examination, ChatGPT demonstrated slightly higher accuracy with a total of 38 correct answers (76%), compared to 36 correct answers (72%) by both Gemini and Perplexity. Notably, Researcher 2 excelled in terminology and literature review questions, with 15 correct answers (88.23%), surpassing ChatGPT and Gemini, with 13 correct answers (76.47%). In research writing, Perplexity, along with Researcher 1 and Researcher 2, led with 10 correct responses (83.3%). Additionally, Researcher 1 exhibited the highest accuracy in research publication, with 9 correct responses (100%), outperforming ChatGPT and Researcher 2, who achieved 7 correct responses (77.78%) (Supplementary 1).
In the comparison of the AI tools' and the two researchers' accuracy in identifying correct answers, the researchers demonstrated superior accuracy. For example, on questions where the correct answer was C, Researcher 2 achieved 100% accuracy, outperforming ChatGPT, Perplexity, and Gemini, which scored 88.9%, 88.9%, and 77.8%, respectively. Notably, all AI systems showed a statistically significant association between the options they chose and the correct answers. For instance, ChatGPT correctly identified option C 88.9% of the time, Gemini correctly chose option D 78.9% of the time, and Perplexity accurately selected option C 88.9% of the time (Table 1).
Table 1. The association between correct answers and AI tools
Rows indicate the correct answer; columns indicate the option selected by each AI tool or researcher.

ChatGPT (P-value <0.001)
Correct | A | B | C | D | Total
A | 7 (63.6%) | 0 (0.0%) | 2 (18.2%) | 2 (18.2%) | 11 (100%)
B | 0 (0.0%) | 8 (72.7%) | 2 (18.2%) | 1 (9.1%) | 11 (100%)
C | 0 (0.0%) | 0 (0.0%) | 8 (88.9%) | 1 (11.1%) | 9 (100%)
D | 0 (0.0%) | 3 (15.8%) | 1 (5.3%) | 15 (78.9%) | 19 (100%)
Total | 7 (14%) | 11 (22%) | 13 (26%) | 19 (38%) | 50 (100%)

Gemini (P-value <0.001)
Correct | A | B | C | D | Total
A | 7 (63.6%) | 2 (18.2%) | 1 (9.1%) | 1 (9.1%) | 11 (100%)
B | 1 (9.1%) | 7 (63.6%) | 2 (18.2%) | 1 (9.1%) | 11 (100%)
C | 0 (0.0%) | 0 (0.0%) | 7 (77.8%) | 2 (22.2%) | 9 (100%)
D | 2 (10.5%) | 2 (10.5%) | 0 (0.0%) | 15 (78.9%) | 19 (100%)
Total | 10 (20%) | 11 (22%) | 10 (20%) | 19 (38%) | 50 (100%)

Perplexity (P-value <0.001)
Correct | A | B | C | D | Total
A | 8 (72.7%) | 0 (0.0%) | 1 (9.1%) | 2 (18.2%) | 11 (100%)
B | 2 (18.2%) | 5 (45.5%) | 2 (18.2%) | 2 (18.2%) | 11 (100%)
C | 0 (0.0%) | 0 (0.0%) | 8 (88.9%) | 1 (11.1%) | 9 (100%)
D | 0 (0.0%) | 3 (15.8%) | 1 (5.3%) | 15 (78.9%) | 19 (100%)
Total | 10 (20%) | 8 (16%) | 12 (24%) | 20 (40%) | 50 (100%)

Researcher 1 (P-value <0.001)
Correct | A | B | C | D | Total
A | 10 (90.9%) | 0 (0.0%) | 0 (0.0%) | 1 (9.1%) | 11 (100%)
B | 0 (0.0%) | 9 (81.8%) | 0 (0.0%) | 2 (18.2%) | 11 (100%)
C | 0 (0.0%) | 1 (11.1%) | 8 (88.9%) | 0 (0.0%) | 9 (100%)
D | 0 (0.0%) | 2 (10.5%) | 1 (5.3%) | 16 (84.2%) | 19 (100%)
Total | 10 (20%) | 12 (24%) | 9 (18%) | 19 (38%) | 50 (100%)

Researcher 2 (P-value <0.001)
Correct | A | B | C | D | Total
A | 10 (90.9%) | 0 (0.0%) | 0 (0.0%) | 1 (9.1%) | 11 (100%)
B | 1 (9.1%) | 9 (81.8%) | 1 (9.1%) | 0 (0.0%) | 11 (100%)
C | 0 (0.0%) | 0 (0.0%) | 9 (100%) | 0 (0.0%) | 9 (100%)
D | 2 (10.5%) | 1 (5.3%) | 3 (15.8%) | 13 (68.4%) | 19 (100%)
Total | 13 (26%) | 10 (20%) | 13 (26%) | 14 (28%) | 50 (100%)
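To make the association test reported in Table 1 concrete, the following sketch runs a chi-square test of independence on the ChatGPT panel, with the counts transcribed from the table. This is an illustrative re-analysis rather than the authors' SPSS procedure: SciPy offers no built-in exact Fisher test for tables larger than 2x2, so chi2_contingency is used here as an approximation.

```python
# Chi-square test of independence on the ChatGPT panel of Table 1.
# Rows: correct answer A-D; columns: option chosen by ChatGPT A-D.
from scipy.stats import chi2_contingency

chatgpt_counts = [
    [7, 0, 2, 2],   # correct answer A (n = 11)
    [0, 8, 2, 1],   # correct answer B (n = 11)
    [0, 0, 8, 1],   # correct answer C (n = 9)
    [0, 3, 1, 15],  # correct answer D (n = 19)
]

chi2, p, dof, expected = chi2_contingency(chatgpt_counts)
print(f"chi-square = {chi2:.2f}, dof = {dof}, p = {p:.4g}")
```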
In comparing AI tools' and researchers' performance, significant agreement was noted with ChatGPT. For instance, out of 43 questions where Researcher 1 agreed with the correct answer, ChatGPT agreed in 35 cases (81.4%) and disagreed in only 8 (18.6%). The comparisons with the other two AI tools showed only a slight, statistically non-significant alignment with the researchers' agreement on the correct answers (Table 2).
Table 2. Comparative Analysis of AI Tools' and Researchers' Performance in Research Studies
Rows indicate whether the AI tool agreed with the correct answer; columns indicate whether the researcher agreed with the correct answer.

ChatGPT 3.5 (P-value* vs Researcher 1: 0.048; vs Researcher 2: 0.027)
ChatGPT 3.5 | Researcher 1 Agree | Researcher 1 Disagree | Researcher 2 Agree | Researcher 2 Disagree
Agree | 35 (81.4%) | 3 (42.9%) | 34 (82.9%) | 4 (44.4%)
Disagree | 8 (18.6%) | 4 (57.1%) | 7 (17.1%) | 5 (55.6%)
Total | 43 (100%) | 7 (100%) | 41 (100%) | 9 (100%)

Gemini (P-value* vs Researcher 1: 0.300; vs Researcher 2: 0.697)
Gemini | Researcher 1 Agree | Researcher 1 Disagree | Researcher 2 Agree | Researcher 2 Disagree
Agree | 32 (74.4%) | 4 (57.1%) | 30 (73.2%) | 6 (66.7%)
Disagree | 11 (25.6%) | 3 (42.9%) | 11 (26.8%) | 3 (33.3%)
Total | 43 (100%) | 7 (100%) | 41 (100%) | 9 (100%)

Perplexity (P-value* vs Researcher 1: 0.085; vs Researcher 2: 0.094)
Perplexity | Researcher 1 Agree | Researcher 1 Disagree | Researcher 2 Agree | Researcher 2 Disagree
Agree | 33 (76.7%) | 3 (42.9%) | 32 (78%) | 4 (44.4%)
Disagree | 10 (23.3%) | 4 (57.1%) | 9 (22%) | 5 (55.6%)
Total | 43 (100%) | 7 (100%) | 41 (100%) | 9 (100%)

*Fisher's Exact Test
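As a hedged illustration of how the Fisher's exact p-values in Table 2 could be reproduced, the sketch below applies scipy.stats.fisher_exact to the 2x2 cross-tabulation of ChatGPT-3.5 versus Researcher 1. The result should be close to the reported p-value of 0.048, although SPSS and SciPy may differ slightly in their two-sided conventions.

```python
# Fisher's exact test on the ChatGPT-3.5 vs Researcher 1 agreement table (Table 2).
# Rows: ChatGPT agrees / disagrees with the answer key;
# columns: Researcher 1 agrees / disagrees with the answer key.
from scipy.stats import fisher_exact

agreement = [
    [35, 3],  # ChatGPT agree
    [8, 4],   # ChatGPT disagree
]

odds_ratio, p_value = fisher_exact(agreement, alternative="two-sided")
print(f"odds ratio = {odds_ratio:.2f}, p = {p_value:.3f}")
```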
Discussion
The imitation of human intelligence functions by machines, most commonly computer systems, is referred to as AI. It involves acquiring knowledge (gaining information and understanding rules for its utilization), logical deduction (applying rules to arrive at rough or precise outcomes), and self-adjustment. In addition, AI endeavors to develop systems capable of executing tasks traditionally associated with human intelligence, including decision-making, speech recognition, language translation, and visual perception, among various others [9]. Although AI language models have been in development for years, the general population's understanding of AI's potential and use has increased dramatically recently. The academic community has already embraced language-based AI, and numerous researchers utilize chatbots as aids in their research. These bots assist in structuring ideas, offering feedback on their work, and aiding in referencing and summarizing the existing research literature [2,10,11].
Kacena et al. demonstrated that the utilization of AI, particularly ChatGPT, reduced the time invested in crafting review articles. However, it yielded the highest similarity indices, indicating a greater probability of plagiarism. In addition, they reported that ChatGPT can swiftly scour the internet and evaluate potential sources, potentially accelerating the literature review process. In the current study, ChatGPT performed well on the literature review questions, and Gemini scored just as high, further supporting the findings of that study [12].
Salvagno et al. reported that AI may soon be leveraged for the automated production of figures, tables, and supplementary visual components within manuscripts. This could facilitate data summarization and contribute to manuscript clarity [13]. However, the current study demonstrated that the AI systems achieved different scores and that their performance varied across the categories they were tested on. Identifying the strengths and weaknesses of the currently available AIs is therefore paramount in choosing an AI system that will aid in research publication rather than hinder it and jeopardize the integrity of the research paper. For instance, Kacena et al. showed that 70% of the references were incorrect when an AI-only method was applied to writing research papers, raising controversy over whether these AI tools should even be used as aids in that regard [12]. The present study showed that Gemini performed poorly on the research writing principles questions, answering only half of them correctly. In addition, Perplexity performed poorly on publication-related questions, and ChatGPT exhibited subpar performance on research terminology questions, further supporting the notion that leveraging AI depends on recognizing each tool's limitations in the field of research.
Concerns about biases in AI systems, stemming from their training data, are widely recognized as a significant challenge. Research indicates that AI models can perpetuate biases and exhibit skewed behavior, replicating existing discriminatory patterns. Addressing these biases is crucial and requires the implementation of effective strategies prioritizing fairness and justice during development. This is particularly important in research, where ensuring impartiality is paramount. Responsible use of advanced language models like ChatGPT, Gemini, and Perplexity is essential, given the ethical dilemmas they pose, including the potential for misinformation and emotionally persuasive content. Proactive steps are needed to mitigate these risks and promote responsible usage. Additionally, the use of AI in content generation raises concerns about unintentional plagiarism, as systems may reproduce text without proper citation. While AI tools may increase publication output, there may not be a corresponding increase in expertise or experience among researchers [3,12].
Several studies have investigated the comparison of AI and human capabilities across various domains. Long et al. noted a remarkable level of accuracy in AI, ranging from 90% to 100%, when evaluating its performance against specialized doctors' diagnostic and treatment decisions for congenital cataracts [14]. Additionally, Rajpurkar et al. found consistency between AI and radiologists, particularly in diagnosing chest radiographs [15]. However, there are limited data comparing AI and human performance on research principles. In this study, the comparison between AI tools and human performance against the predetermined correct answers on research principles revealed a significant agreement (81-83%) between ChatGPT and the researchers.
One limitation of our study is that we evaluated only three AI systems, compared with the vast and growing number of AI tools now available. In addition, a larger number of questions would provide a more comprehensive understanding of the strengths and weaknesses of these AI systems in the field of research and their utility in that regard.
Conclusion
ChatGPT, Gemini, and Perplexity perform adequately overall on research-related questions, but each system requires improvement in particular research categories, which differ from one AI to another. The involvement of an expert in the research publication process remains a fundamental cornerstone to ensure the quality of the work.
Declarations
Conflicts of interest: The author(s) have no conflicts of interest to disclose.
Ethical approval: Not applicable.
Patient consent (participation and publication): Not applicable.
Funding: The present study received no financial support.
Acknowledgements: None to be declared.
Authors' contributions: RQS and SHM were major contributors to the conception of the study and the literature search for related studies. AMS, JOA, DSH, and AMS were involved in the literature review, the study's design, and the critical revision of the manuscript, and they participated in data collection. HAH and YMM were involved in the literature review, study design, and manuscript writing. BAA, DSH, and RQS contributed to the literature review, final approval of the manuscript, and processing of the tables. RQS and SHM confirm the authenticity of all the raw data. All authors approved the final version of the manuscript.
Use of AI: AI was not used in the drafting of the manuscript, the production of graphical elements, or the collection and analysis of data.
Data availability statement: Not applicable.