Robust Explainable AI via Adversarial Latent Diffusion Models: Mitigating Gradient Obfuscation with Interpretable Feature Attribution
This study introduces Adversarial Latent Diffusion Explanations (ALDE), a framework for improving the robustness and interpretability of explainable AI (XAI) methods under adversarial conditions. The experimental design integrates diffusion models with adversarial training for deep image classification. The framework was tested on two widely used datasets, ImageNet and CIFAR-10, with two pre-trained deep learning models, ResNet-50 and WideResNet-28-10.
The ALDE framework combines a Denoising Diffusion Probabilistic Model (DDPM) for input purification with Projected Gradient Descent (PGD) for adversarial training. For explanation generation, Integrated Gradients was employed to produce interpretable feature attributions. The models were evaluated based on adversarial robustness, explanation stability (measured by Structural Similarity Index Measure, SSIM), and interpretability (using Intersection over Union, IoU, with saliency maps).
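To make the explanation step concrete, here is a minimal sketch of Integrated Gradients on a toy logistic model. It is not the ALDE implementation: the model, the all-zeros baseline, the midpoint Riemann sum, and every function name below are assumptions chosen for illustration, and the study itself applies the method to ResNet-50 and WideResNet-28-10 image classifiers.

```python
import math
from typing import Callable, List, Tuple

Vector = List[float]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def make_logistic(weights: Vector, bias: float) -> Tuple[Callable, Callable]:
    """Return a toy model f(x) = sigmoid(w . x + b) and its analytic gradient.

    Hypothetical stand-in for a deep classifier's class score.
    """
    def f(x: Vector) -> float:
        return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)

    def grad(x: Vector) -> Vector:
        p = f(x)
        return [p * (1.0 - p) * w for w in weights]

    return f, grad


def integrated_gradients(grad_fn: Callable[[Vector], Vector],
                         x: Vector,
                         baseline: Vector,
                         steps: int = 200) -> Vector:
    """Approximate Integrated Gradients with a midpoint Riemann sum.

    IG_i = (x_i - baseline_i) * integral over alpha in [0, 1] of
           dF/dx_i evaluated at baseline + alpha * (x - baseline).
    """
    totals = [0.0] * len(x)
    for k in range(steps):
        alpha = (k + 0.5) / steps
        point = [b + alpha * (xi - b) for xi, b in zip(x, baseline)]
        for i, g in enumerate(grad_fn(point)):
            totals[i] += g
    return [(xi - b) * t / steps for xi, b, t in zip(x, baseline, totals)]


# Illustrative values only; none of these come from the study.
f, grad = make_logistic(weights=[2.0, -1.0, 0.5], bias=0.1)
x = [1.0, 0.5, -2.0]
baseline = [0.0, 0.0, 0.0]
attributions = integrated_gradients(grad, x, baseline)

# Completeness property: attributions sum to f(x) - f(baseline).
completeness_gap = abs(sum(attributions) - (f(x) - f(baseline)))
```

The completeness gap gives a quick sanity check on the step count: if it is not close to zero, the path integral is under-sampled and the attributions should not be trusted.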
Results show that ALDE outperforms the existing XAI methods SHAP and LIME. On ImageNet with ResNet-50, adversarial accuracy increased from 41.2% (SHAP) to 55.3% (ALDE), SSIM improved from 0.56 to 0.82, and IoU from 0.47 to 0.63. WideResNet models saw similar gains. These improvements indicate that ALDE strengthens model defense while producing more stable and semantically accurate explanations.
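For readers unfamiliar with the IoU metric reported here, the sketch below shows one way to score a saliency map against a reference region. The abstract does not say how saliency maps are binarized, so the top-fraction threshold, the 4x4 toy map, and all names are assumptions for illustration, not the study's protocol.

```python
from typing import List

Grid = List[List[float]]
Mask = List[List[bool]]


def top_fraction_mask(saliency: Grid, fraction: float = 0.25) -> Mask:
    """Keep the most salient `fraction` of pixels (assumed binarization rule)."""
    flat = sorted((v for row in saliency for v in row), reverse=True)
    k = max(1, int(round(fraction * len(flat))))
    threshold = flat[k - 1]
    return [[v >= threshold for v in row] for row in saliency]


def iou(mask_a: Mask, mask_b: Mask) -> float:
    """Intersection over Union of two equally sized boolean masks."""
    intersection = 0
    union = 0
    for row_a, row_b in zip(mask_a, mask_b):
        for a, b in zip(row_a, row_b):
            if a and b:
                intersection += 1
            if a or b:
                union += 1
    # Two empty masks are treated as a perfect match.
    return intersection / union if union else 1.0


# Toy 4x4 saliency map and reference object mask (illustrative values only).
saliency = [
    [0.9, 0.8, 0.1, 0.0],
    [0.7, 0.6, 0.1, 0.0],
    [0.1, 0.1, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
]
reference = [
    [True, True, True, False],
    [True, False, False, False],
    [False, False, False, False],
    [False, False, False, False],
]

predicted = top_fraction_mask(saliency, fraction=0.25)
score = iou(predicted, reference)  # 3 shared pixels out of 5 in the union
```

A score of 1.0 means the thresholded explanation covers exactly the reference region; the choice of threshold strongly affects the score, so it should be fixed before comparing methods.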
In summary, ALDE demonstrates a strong ability to defend against gradient-based adversarial attacks and deliver reliable, interpretable attributions. This research contributes toward building trustworthy AI systems by addressing the key challenge of explanation degradation under adversarial influence.
Science Research Society