Robust Explainable AI via Adversarial Latent Diffusion Models: Mitigating Gradient Obfuscation with Interpretable Feature Attribution

This study introduces the Adversarial Latent Diffusion Explanations (ALDE) framework, a novel approach aimed at improving the robustness and interpretability of explainable AI (XAI) methods under adversarial conditions. An experimental research design was used to integrate diffusion models with adversarial training, focusing on deep image classification tasks. The framework was tested using two popular datasets—ImageNet and CIFAR-10—and two pre-trained deep learning models, ResNet-50 and WideResNet-28-10. The ALDE framework combines a Denoising Diffusion Probabilistic Model (DDPM) for input purification with Projected Gradient Descent (PGD) for adversarial training. For explanation generation, Integrated Gradients was employed to produce interpretable feature attributions. The models were evaluated based on adversarial robustness, explanation stability (measured by Structural Similarity Index Measure, SSIM), and interpretability (using Intersection over Union, IoU, with saliency maps). Results show that ALDE significantly outperforms existing XAI methods like SHAP and LIME. On ImageNet, ResNet-50’s adversarial accuracy increased from 41.2% (SHAP) to 55.3% with ALDE. Similarly, SSIM improved from 0.56 to 0.82, and IoU from 0.47 to 0.63. WideResNet models saw similar gains. These improvements confirm ALDE’s effectiveness in enhancing model defense while producing more stable and semantically accurate explanations. In summary, ALDE demonstrates a strong ability to defend against gradient-based adversarial attacks and deliver reliable, interpretable attributions. This research contributes toward building trustworthy AI systems by addressing the key challenge of explanation degradation under adversarial influence.
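The attack, attribution, and overlap steps named in the abstract can be illustrated on a toy problem. The sketch below is not the authors' implementation: it assumes a hypothetical logistic-regression "classifier" in place of ResNet-50, omits the DDPM purification stage and SSIM entirely, and uses a top-k overlap of attribution magnitudes as a stand-in for the paper's IoU metric. All function names and parameter values (`eps`, `alpha`, `k`, the zero baseline) are illustrative assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def predict(x, w, b):
    """Toy differentiable classifier (assumption): P(class 1 | x)."""
    return sigmoid(x @ w + b)

def grad_input(x, w, b):
    """Closed-form gradient of predict(x) with respect to x for this toy model."""
    p = predict(x, w, b)
    return p * (1.0 - p) * w

def pgd_attack(x, y, w, b, eps=0.1, alpha=0.02, steps=20):
    """L-infinity PGD: ascend the cross-entropy loss, then project onto the eps-ball."""
    x_adv = x.copy()
    for _ in range(steps):
        # For a sigmoid output with binary cross-entropy, d(loss)/dx = (p - y) * w.
        g = (predict(x_adv, w, b) - y) * w
        x_adv = x_adv + alpha * np.sign(g)
        x_adv = np.clip(x_adv, x - eps, x + eps)  # projection step
        x_adv = np.clip(x_adv, 0.0, 1.0)          # keep a valid pixel range
    return x_adv

def integrated_gradients(x, baseline, w, b, steps=200):
    """Integrated Gradients via a midpoint Riemann sum along the straight path."""
    alphas = (np.arange(steps) + 0.5) / steps
    path = baseline + alphas[:, None] * (x - baseline)
    grads = np.stack([grad_input(p, w, b) for p in path])
    return (x - baseline) * grads.mean(axis=0)

def topk_iou(attr_a, attr_b, k):
    """IoU between the index sets of the k largest-magnitude attributions."""
    top_a = set(np.argsort(-np.abs(attr_a))[:k].tolist())
    top_b = set(np.argsort(-np.abs(attr_b))[:k].tolist())
    return len(top_a & top_b) / len(top_a | top_b)

# Hypothetical 64-feature input standing in for an image.
rng = np.random.default_rng(0)
d = 64
w = rng.normal(size=d)
b = 0.0
x = rng.uniform(0.2, 0.8, size=d)
y = 1.0 if predict(x, w, b) >= 0.5 else 0.0  # use the model's own label

x_adv = pgd_attack(x, y, w, b)
baseline = np.zeros(d)
attr_clean = integrated_gradients(x, baseline, w, b)
attr_adv = integrated_gradients(x_adv, baseline, w, b)
iou = topk_iou(attr_clean, attr_adv, k=16)
print("confidence clean/adv:", predict(x, w, b), predict(x_adv, w, b))
print("top-16 attribution IoU:", iou)
```

A low overlap between `attr_clean` and `attr_adv` is the kind of explanation degradation the abstract refers to; in the described framework a purification step would be applied to the input before attribution, which this sketch leaves out.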

Science Research Society

Tejaskumar Dattatray Pujari

Journal of Information Systems Engineering and Management

2025

Related Results

The focus of this dissertation is the safeguarding of analog circuits against IP piracy attacks, which includes the development of a novel method to secure analog IP, the assessmen...

iOLLVM: Enhanced Version of OLLVM

Code obfuscation increases the difficulty of understanding programs, improves software security, and, in particular, OLLVM offers the possibility of cross-platform code obfuscation...

ProDef-MDS: A Proactive Defense Mechanism Protecting Malware Detection Systems from Adversarial Attacks

Malware threatens cybersecurity by enabling data theft, unauthorized access, and extortion. Traditional malware detection systems (MDS) struggle with the increasing volume and comp...

Epidemiological, diagnostic and medical-social aspects of latent syphilis

Objective: to study epidemiological, clinical and medical-social aspects of latent syphilis in Ukraine over the past 40 years. Materials and methods. Data of patients with latent ...

Enhancing Adversarial Robustness through Stable Adversarial Training

Deep neural network models are vulnerable to attacks from adversarial methods, such as gradient attacks. Even small perturbations can cause significant differences in their pred...

Improving Diversity and Quality of Adversarial Examples in Adversarial Transformation Network

This paper proposes a method to mitigate two major issues of Adversarial Transformation Networks (ATN) including the low diversity and the low quality of adversari...

Comment on: Macroscopic water vapor diffusion is not enhanced in snow

The central thesis of the authors’ paper is that macroscopic water vapor diffusion is not enhanced in snow compared to diffusion through humid air alone. Further, mass di...

Universal Adversarial Purification with DDIM Metric Loss for Stable Diffusion

Stable Diffusion (SD) often produces degraded outputs when the training dataset contains adversarial noise. Adversarial purification offers a promising solution by removing adversa...