
True Vision AI: Deepfake Video Detection Using a Hybrid Ensemble of Xception and Video Vision Transformer (ViViT)

Deepfakes have become one of the most pressing issues of our time: what once took a team of visual effects experts weeks can now be done in minutes with freely available software, and the results are increasingly indistinguishable from reality. We present True Vision AI, a deepfake video detection system built on a two-stream ensemble that combines spatial and temporal understanding. The system pairs a fine-tuned Xception network (pre-trained on ImageNet), which detects subtle visual inconsistencies in individual frames, with a Video Vision Transformer (ViViT-B/16x2, pre-trained on Kinetics-400), which detects motion-level anomalies across frames. Features from both networks are merged into a unified 2,816-dimensional vector and fed into a compact classifier that determines whether a video is real or fake. Trained and tested on the Celeb-DF dataset (890 genuine videos and 808 deepfakes), the Xception model achieves 88.5% validation accuracy, ViViT achieves 87.0%, and the ensemble achieves 88.3%. The final system is deployed as a lightweight Flask API that returns a determination, a confidence score, and a frame-level breakdown of where deception is likely occurring.
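
The feature-merging step can be illustrated with a minimal sketch. The abstract gives only the fused size (2,816); the split into 2,048 and 768 dimensions below is an assumption, chosen because it matches the usual width of Xception's pooled features and the ViViT-B hidden size. The names `fuse_features` and `classify` are hypothetical, and the single logistic unit with random weights is only a stand-in for the "compact classifier", whose actual architecture is not described.

```python
import math
import random

XCEPTION_DIM = 2048  # assumption: width of Xception's pooled feature vector
VIVIT_DIM = 768      # assumption: ViViT-B hidden size
FUSED_DIM = XCEPTION_DIM + VIVIT_DIM  # 2816, the size stated in the abstract


def fuse_features(spatial, temporal):
    """Concatenate per-video spatial and temporal features into one vector."""
    if len(spatial) != XCEPTION_DIM or len(temporal) != VIVIT_DIM:
        raise ValueError("unexpected feature dimensions")
    return list(spatial) + list(temporal)


def classify(fused, weights, bias):
    """Hypothetical stand-in for the compact classifier: one logistic unit.

    Returns the estimated probability that the video is fake.
    """
    logit = sum(w * x for w, x in zip(weights, fused)) + bias
    return 1.0 / (1.0 + math.exp(-logit))


# Random placeholders instead of real network outputs and trained weights.
rng = random.Random(0)
spatial = [rng.gauss(0.0, 1.0) for _ in range(XCEPTION_DIM)]
temporal = [rng.gauss(0.0, 1.0) for _ in range(VIVIT_DIM)]
weights = [rng.gauss(0.0, 0.01) for _ in range(FUSED_DIM)]

fused = fuse_features(spatial, temporal)
p_fake = classify(fused, weights, bias=0.0)
print(len(fused), "fake" if p_fake >= 0.5 else "real")
```

In a real pipeline the two input vectors would come from the backbone networks, and per-frame Xception features would first need to be pooled into one vector per video; how the authors do that is not stated.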
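
As a rough illustration of the three outputs the API is said to provide (a determination, a confidence score, and a frame-level breakdown), the sketch below assembles a JSON-serializable result. Every field name, the `build_response` helper, the 0.5 threshold, and the sample scores are hypothetical; the source does not describe the actual response schema.

```python
import json


def build_response(p_fake, frame_scores, threshold=0.5):
    """Assemble a JSON-serializable result (all field names are hypothetical)."""
    is_fake = p_fake >= threshold
    return {
        "verdict": "fake" if is_fake else "real",
        # Confidence is reported for whichever label was chosen.
        "confidence": round(p_fake if is_fake else 1.0 - p_fake, 3),
        "frames": [
            {"index": i, "p_fake": round(s, 3), "suspicious": s >= threshold}
            for i, s in enumerate(frame_scores)
        ],
    }


# Made-up scores for illustration only.
response = build_response(0.82, [0.12, 0.91, 0.77, 0.40])
print(json.dumps(response, indent=2))
```

In a Flask application, a dictionary like this could be returned directly from a view function, since Flask serializes returned dictionaries to JSON.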

Related Results

KONTESTASI TASAWUF SUNNÎ DAN TASAWUF FALSAFÎ DI NUSANTARA
This article scrutinizes the history of Islamic development in Nusantara between 15th to 18th centuries, which has been colored from theological mysticism thought. Uniquel...
A Touch of Space Weather - Outreach project for visually impaired students
'A Touch of Space Weather' is a project that brings space weather science into...
Evaluating the Threshold of Authenticity in Deepfake Audio and Its Implications Within Criminal Justice
Deepfake technology has come a long way in recent years and the world has already seen cases where it has been used maliciously. After a deepfake of UK independent financial adviso...
Automatic Load Sharing of Transformer
Transformer plays a major role in the power system. It works 24 hours a day and provides power to the load. The transformer is excessive full, its windings are overheated which lea...
Deepfake Detection using Deep Learning with InceptionV3
Deepfake technology has rapidly evolved, making it increasingly difficult to distinguish between real and manipulated videos. This poses serious risks, including misinformation, id...
Comparison of CNN, ResNet50, and Xception for Deepfake Image Detection
This study compares the performance of three deep learning architectures (Convolutional Neural Network, ResNet50, and Xception) for frame-based deepfake image detection and identifi...
Deepfake attack prevention using steganography GANs
Background Deepfakes are fake images or videos generated by deep learning algorithms. Ongoing progress in deep learning techniques like auto-encoders and generative adversarial net...
