AI VERSUS HUMAN GRADERS: ASSESSING THE ROLE OF LARGE LANGUAGE MODELS IN HIGHER EDUCATION
As artificial intelligence (AI) grading sees wider use and adoption, traditional educational practices are being forced to adapt and work alongside AI, especially in assessment grading. Human grading has long been the cornerstone of educational assessment. Conventionally, educators have assessed student work against established criteria, providing feedback intended to support learning and development. While human grading offers nuanced understanding and personalized feedback, it is also subject to limitations such as grading inconsistencies, biases, and significant time demands. This paper explores the role of large language models (LLMs), such as ChatGPT-3.5 and ChatGPT-4, in grading processes in higher education and compares their effectiveness with that of traditional human grading methods. The study uses both qualitative and quantitative methodologies and extends across multiple academic programs and modules, providing a comprehensive assessment of how AI can complement or replace human graders. In Study 1, we compared GPT-3.5, GPT-4, and human graders on 195 scripts from 3 modules. Manually marked scripts exhibited an average mark difference of 24%. Subsequently, 20 scripts were assessed using GPT-4, which yielded a more precise evaluation, with a total average difference of 4% in results. There were individual instances where marks were higher, but this could not naturally be a marker judgment. The results of Study 1 highlighted the need for a comprehensive memorandum; for Study 2, we therefore identified 4341 scripts, of which 3508 were used. The study found that AI remains efficient when the memorandum is well-structured. It also found that while AI excels in scalability, human graders excel in interpreting complex answers, evaluating creativity, and picking up plagiarism.
In Study 3, we evaluated formative assessments using GPT-4 (Statistics n=602, Business Statistics n=859, and Logistics Management n=522). This study demonstrated that AI marking tools can effectively manage the demands of formative assessments, particularly in modules where the questions are objective and structured, such as Statistics and Logistics Management. The initial error in Statistics 102 highlighted the importance of a well-designed memorandum. The study concludes that AI tools can effectively reduce the burden on educators but should be integrated into a hybrid model in which human markers and AI systems work in tandem to achieve fairness, accuracy, and quality in assessments. This paper contributes to ongoing debates about the future of AI in education by emphasizing the importance of a well-structured memorandum and human discretion in achieving balanced and effective grading solutions.
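The abstract reports average mark differences (24% and 4%) without stating how they were calculated. As a minimal sketch, assuming the figure is a mean absolute difference in percentage points between two sets of marks for the same scripts, it could be computed as follows. The function name and the sample marks are hypothetical and are not taken from the study.

```python
from statistics import mean


def mean_mark_difference(marks_a, marks_b):
    """Mean absolute difference between two graders' marks.

    Assumption: both lists hold percentage marks for the same scripts,
    in the same order. This is one plausible reading of "average mark
    difference", not necessarily the calculation used in the paper.
    """
    if len(marks_a) != len(marks_b):
        raise ValueError("both graders must mark the same scripts")
    if not marks_a:
        raise ValueError("need at least one script")
    return mean(abs(a - b) for a, b in zip(marks_a, marks_b))


# Hypothetical marks (percent) for four scripts; not data from the study.
human_marks = [60, 72, 45, 80]
gpt4_marks = [62, 70, 50, 79]
print(mean_mark_difference(human_marks, gpt4_marks))
```

An absolute difference is used so that over-marking and under-marking do not cancel out; a signed mean would instead show whether one grader is systematically more generous.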
Innovare Academic Sciences Pvt Ltd

