Javascript must be enabled to continue!

Enhancing the Robustness of Zero-Shot LLMs Against Adversarial Prompts

Zero-shot large language models (LLMs) have proven highly effective in performing a wide range of tasks without the need for task-specific training, making them versatile tools in natural language processing. However, their susceptibility to adversarial prompts—inputs crafted to exploit inherent weaknesses—raises critical concerns about their reliability and safety in real-world applications. This paper focuses on evaluating the robustness of zero-shot LLMs when exposed to adversarial scenarios. A detailed evaluation framework was developed to systematically identify common vulnerabilities in the models' responses. The study explores mitigation techniques such as adversarial training to improve model resilience, refined prompt engineering to guide the models toward desired outcomes, and logical consistency checks to ensure coherent and ethical responses. Experimental findings reveal substantial gaps in robustness, particularly in handling ambiguous, misleading, or harmful prompts. These results underscore the importance of targeted interventions to address these vulnerabilities. The research provides actionable insights into improving zero-shot LLMs by enhancing their robustness and ensuring ethical adherence. These contributions align with the broader goal of creating safe, reliable, and responsible AI systems that can withstand adversarial manipulation while maintaining their high performance across diverse applications.

International Journal for Research in Applied Science and Engineering Technology (IJRASET)

Rambarki Sai Akshit

International Journal for Research in Applied Science and Engineering Technology

2025

Title: Enhancing the Robustness of Zero-Shot LLMs Against Adversarial Prompts

Description:

However, their susceptibility to adversarial prompts—inputs crafted to exploit inherent weaknesses—raises critical concerns about their reliability and safety in real-world applications.

This paper focuses on evaluating the robustness of zero-shot LLMs when exposed to adversarial scenarios.

A detailed evaluation framework was developed to systematically identify common vulnerabilities in the models' responses.

The study explores mitigation techniques such as adversarial training to improve model resilience, refined prompt engineering to guide the models toward desired outcomes, and logical consistency checks to ensure coherent and ethical responses.

Experimental findings reveal substantial gaps in robustness, particularly in handling ambiguous, misleading, or harmful prompts.

These results underscore the importance of targeted interventions to address these vulnerabilities.

The research provides actionable insights into improving zero-shot LLMs by enhancing their robustness and ensuring ethical adherence.

These contributions align with the broader goal of creating safe, reliable, and responsible AI systems that can withstand adversarial manipulation while maintaining their high performance across diverse applications.

Back

Abstract Introduction The exact manner in which large language models (LLMs) will be integrated into pathology is not yet fully comprehended. This study examines the accuracy, bene...

Perspectives and Experiences With Large Language Models in Health Care: Survey Study

Background Large language models (LLMs) are transforming how data is used, including within the health care sector. However, frameworks including the Unified Theory of ...

Perspectives and Experiences With Large Language Models in Health Care: Survey Study (Preprint)

BACKGROUND Large language models (LLMs) are transforming how data is used, including within the health care sector. However, frameworks including the Unifie...

LLMs and AI: Understanding Its Reach and Impact

Large Language Models (LLMs) have revolutionized the field of Artificial Intelligence with their ability to understand and generate natural language discourse. This has led to the ...

Applied with Caution: Extreme-Scenario Testing Reveals Significant Risks in Using LLMs for Humanities and Social Sciences Paper Evaluation

The deployment of large language models (LLMs) in academic paper evaluation is increasingly widespread, yet their trustworthiness remains debated; to expose fundamental flaws often...

Evaluation of Prompting Strategies for Cyberbullying Detection Using Various Large Language Models

Sentiment analysis detects toxic language for safer online spaces and helps businesses refine strategies through customer feedback analysis [1, 2]. Advancements in Large Language M...

Enhancing Adversarial Robustness through Stable Adversarial Training

Deep neural network models are vulnerable to attacks from adversarial methods, such as gradient attacks. Evening small perturbations can cause significant differences in their pred...

When LLMs meet cybersecurity: a systematic literature review

Abstract The rapid development of large language models (LLMs) has opened new avenues across various fields, including cybersecurity, which faces an evolving threat lands...

Email:
Password:

Email:

Enhancing the Robustness of Zero-Shot LLMs Against Adversarial Prompts

Related Results