Javascript must be enabled to continue!

SteadyEval: Robust LLM Exam Graders via Adversarial Training and Distillation

Large language models (LLMs) are increasingly used as rubric-guided graders for short-answer exams, but their decisions can be unstable across prompts and vulnerable to answer-side prompt injection. In this paper, we study SteadyEval, a guardrailed exam-grading pipeline in which an adversarially trained LoRA filter (SteadyEval-7B-deep) preprocesses student answers to remove answer-side prompt injection, after which the original Mistral-7B-Instruct rubric-guided grader assigns the final score. We build two exam-grading pipelines on top of Mistral-7B-Instruct: a baseline pipeline that scores student answers directly, and a guardrailed pipeline in which a LoRA-based filter (SteadyEval-7B-deep) first removes injection content from the answer and a downstream grader then assigns the final score. Using two rubric-guided short-answer datasets in machine learning and computer networking, we generate grouped families of clean answers and four classes of answer-side attacks, and we evaluate the impact of these attacks on score shifts, attack success rates, stability across prompt variants, and alignment with human graders. On the pooled dataset, answer-side attacks inflate grades in the unguarded baseline by an average of about +1.2 points on a 1–10 scale, and substantially increase score dispersion across prompt variants. The guardrailed pipeline largely removes this systematic grade inflation and reduces instability for many items, especially in the machine-learning exam, while keeping mean absolute error with respect to human reference scores in a similar range to the unguarded baseline on clean answers, with a conservative shift in networking that motivates per-course calibration. Chief-panel comparisons further show that the guardrailed pipeline tracks human grading more closely on machine-learning items, but tends to under-score networking answers. These findings are best interpreted as a proof-of-concept guardrail and require per-course validation and calibration before operational use.

MDPI AG

Catalin Anghel Marian Viorel Craciun Adina Cocu Andreea Alexandra Anghel Adrian Istrate

Computers

2026

Title: SteadyEval: Robust LLM Exam Graders via Adversarial Training and Distillation

Description:

Large language models (LLMs) are increasingly used as rubric-guided graders for short-answer exams, but their decisions can be unstable across prompts and vulnerable to answer-side prompt injection.

In this paper, we study SteadyEval, a guardrailed exam-grading pipeline in which an adversarially trained LoRA filter (SteadyEval-7B-deep) preprocesses student answers to remove answer-side prompt injection, after which the original Mistral-7B-Instruct rubric-guided grader assigns the final score.

We build two exam-grading pipelines on top of Mistral-7B-Instruct: a baseline pipeline that scores student answers directly, and a guardrailed pipeline in which a LoRA-based filter (SteadyEval-7B-deep) first removes injection content from the answer and a downstream grader then assigns the final score.

Using two rubric-guided short-answer datasets in machine learning and computer networking, we generate grouped families of clean answers and four classes of answer-side attacks, and we evaluate the impact of these attacks on score shifts, attack success rates, stability across prompt variants, and alignment with human graders.

On the pooled dataset, answer-side attacks inflate grades in the unguarded baseline by an average of about +1.

2 points on a 1–10 scale, and substantially increase score dispersion across prompt variants.

The guardrailed pipeline largely removes this systematic grade inflation and reduces instability for many items, especially in the machine-learning exam, while keeping mean absolute error with respect to human reference scores in a similar range to the unguarded baseline on clean answers, with a conservative shift in networking that motivates per-course calibration.

Chief-panel comparisons further show that the guardrailed pipeline tracks human grading more closely on machine-learning items, but tends to under-score networking answers.

These findings are best interpreted as a proof-of-concept guardrail and require per-course validation and calibration before operational use.

Back

Abstract Introduction The exact manner in which large language models (LLMs) will be integrated into pathology is not yet fully comprehended. This study examines the accuracy, bene...

A Comprehensive Review of Distillation in the Pharmaceutical Industry

Distillation processes play a pivotal role in the pharmaceutical industry for the purification of active pharmaceutical ingredients (APIs), intermediates, and solvent recovery. Thi...

Crude Distillation Unit (CDU)

The chapter considers the technology of the crude distillation unit in general. The crude distillation unit is at the front-end of the oil refinery. The desalting process and disti...

Human-AI Collaboration in Clinical Reasoning: A UK Replication and Interaction Analysis

Abstract Objective A paper from Goh et al found that a large language model (LLM) working alone outperformed American clinicians assisted...

ProDef-MDS: A Proactive Defense Mechanism Protecting Malware Detection Systems from Adversarial Attacks

Malware threatens cybersecurity by enabling data theft, unauthorized access, and extortion. Traditional malware detection systems (MDS) struggle with the increasing volume and comp...

Improving Adversarial Robustness via Finding Flat Minimum of the Weight Loss Landscape

<p>Recent studies have shown that robust overfitting and robust generalization gap are a major trouble in adversarial training of deep neural networks. These interesting prob...

Unraveling the landscape of large language models: a systematic review and future perspectives

PurposeThe rapid rise of large language models (LLMs) has propelled them to the forefront of applications in natural language processing (NLP). This paper aims to present a compreh...

Automating Information Retrieval from Biodiversity Literature Using Large Language Models: A Case Study

Recently, Large Language Models (LLMs) have transformed information retrieval, becoming widely adopted across various domains due to their ability to process extensive textual data...

Email:
Password:

Email:

SteadyEval: Robust LLM Exam Graders via Adversarial Training and Distillation

Related Results