Performance Analysis of Transformer Based Models for Automatic Short Answer Grading
Automatic Short Answer Grading (ASAG) has gained increasing importance in educational technology, where accurate and scalable assessment solutions are needed. Recent advances in Natural Language Processing (NLP) have introduced powerful Transformer-based models, such as Bidirectional Encoder Representations from Transformers (BERT), Text-to-Text Transfer Transformer (T5), and Generative Pre-trained Transformer 3 (GPT-3), which have demonstrated state-of-the-art performance across various text-based tasks. This paper presents a comparative study of these three models in the context of ASAG, evaluating their effectiveness, accuracy, and efficiency. BERT’s bidirectional encoding, T5’s text-to-text framework, and GPT-3’s autoregressive generation are explored in depth to assess their ability to understand, grade, and generate feedback on short answers. We utilize standard ASAG datasets and multiple evaluation metrics, including accuracy, precision, recall, and F1-score, to measure their performance. The comparative analysis reveals that while all three models exhibit strong capabilities, they vary in handling complex language and ambiguous student responses, with trade-offs in computational cost and scalability. This study highlights the strengths and weaknesses of each model in ASAG and offers insights into their practical applications in educational settings.
Introduction: The automation of grading has become a focal point in modern education systems, driven by the increasing demand for scalable and efficient assessment solutions (Sahu & Bhowmick, 2015). With the proliferation of online learning platforms, digital classrooms, and remote education, the ability to automatically grade short-answer questions has gained significant importance (Gomaa & Fahmy, 2020). Automatic Short Answer Grading (ASAG) seeks to evaluate student responses by comparing them to model answers, often assessing the content’s correctness, relevance, and linguistic features—critical components for evaluating students’ understanding and knowledge retention (Busatta & Brancher, 2018).
Traditional ASAG approaches typically employed rule-based systems, statistical models, and early machine learning algorithms that relied heavily on predefined keywords, templates, or handcrafted features (Tulu et al., 2021). While effective for straightforward, fact-based questions, these systems struggled to capture the complexity and variability of natural language, resulting in reduced grading accuracy—especially for creative or ambiguous responses (Sychev et al., 2019). Consequently, such methods often required significant manual intervention, limiting their scalability and applicability in dynamic educational settings (Muftah & Aziz, 2013).
The advent of deep learning, particularly in the field of Natural Language Processing (NLP), has marked a transformative shift in ASAG (Gaddipati et al., 2020). Neural network-based models have demonstrated a remarkable capacity to learn and generalize from large datasets, enabling a more nuanced understanding of language (Wang et al., 2019). This has led to the development of more robust ASAG systems capable of handling a broader spectrum of student responses, ranging from factual answers to complex explanations (Roy et al., 2016).
A pivotal advancement in NLP is the introduction of the Transformer architecture, which has revolutionized how language models are designed and trained (Vaswani et al., 2017). Transformers excel in processing sequential data through self-attention mechanisms that capture long-range dependencies and contextual relationships within text. This architectural innovation has significantly enhanced performance across a variety of NLP tasks, such as machine translation, sentiment analysis, and question answering (Peters et al., 2018), making Transformer-based models particularly suitable for enhancing ASAG systems (Raffel et al., 2020).
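The self-attention mechanism described above can be illustrated with a minimal sketch. This is not the authors' implementation — it is a generic scaled dot-product self-attention pass over random token embeddings, with hypothetical dimensions (5 tokens, 8-dimensional vectors) chosen only for illustration:

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """Scaled dot-product self-attention over a sequence of token vectors X."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)            # pairwise token affinities
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ V                          # each token mixes in context from all others

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 8))                     # 5 tokens, 8-dim embeddings
Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))
out = self_attention(X, Wq, Wk, Wv)
print(out.shape)  # (5, 8)
```

Because every output row is a weighted mixture over all input tokens, the mechanism captures the long-range dependencies that recurrent models handle only sequentially — which is why the paper considers this architecture well suited to comparing student answers against model answers.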
In this paper, we focus on three prominent Transformer-based models—BERT, T5, and GPT-3—each representing a distinct approach to language understanding and processing. These models have set new benchmarks across numerous NLP tasks, and their potential application in ASAG is substantial.
Objectives: The goal of this study is to conduct a comparative analysis of these three Transformer models—BERT, T5, and GPT-3—in the context of ASAG. We evaluate their performance on standard ASAG datasets using multiple evaluation metrics, such as accuracy, precision, recall, and F1-score. Additionally, we analyze the computational efficiency and scalability of these models to determine their practicality for deployment in large-scale educational environments.
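The evaluation metrics named above can be made concrete with a short sketch. The label vectors below are hypothetical, assuming a binary correct/incorrect grading setup (1 = graded correct); the paper's actual datasets and label schemes may differ:

```python
def grading_metrics(y_true, y_pred):
    """Accuracy, precision, recall, and F1 for binary correct/incorrect grading."""
    tp = sum(t == p == 1 for t, p in zip(y_true, y_pred))          # true positives
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))    # false positives
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))    # false negatives
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return accuracy, precision, recall, f1

# hypothetical gold labels vs. model predictions
acc, prec, rec, f1 = grading_metrics([1, 0, 1, 1, 0], [1, 0, 0, 1, 1])
print(round(acc, 3), round(prec, 3), round(rec, 3), round(f1, 3))  # 0.6 0.667 0.667 0.667
```

Reporting precision and recall alongside accuracy matters for ASAG because grading labels are often imbalanced; F1 summarizes the trade-off between marking too many answers correct and missing genuinely correct ones.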
Methods: By providing a comprehensive comparison, this study seeks to shed light on the strengths and weaknesses of each model and their suitability for different types of ASAG tasks. Moreover, we aim to offer insights that can guide future research and development in this area, ultimately contributing to the creation of more effective and reliable automated grading systems.
Results: The results of our comparative analysis of BERT, T5, and GPT-3 in the context of Automatic Short Answer Grading (ASAG) reveal important insights into the strengths and limitations of these Transformer models. This section discusses the implications of our findings and the practical considerations for deploying these models in educational settings, and identifies potential avenues for future research.
Conclusions: This study provides a comprehensive comparative analysis of BERT, T5, and GPT-3 for ASAG, highlighting their strengths, limitations, and practical considerations. The insights gained from this research contribute to the ongoing development and refinement of automated grading systems, with the potential to enhance educational assessment and support in diverse learning environments.
Science Research Society