
True Vision AI: Deepfake Video Detection Using a Hybrid Ensemble of Xception and Video Vision Transformer (ViViT)

Deepfakes have become one of the most pressing issues of our time: what once took a team of visual-effects experts weeks to produce can now be done in minutes with freely available software, and the results are increasingly indistinguishable from reality. We present True Vision AI, a deepfake video detection system built on a two-stream ensemble that combines spatial and temporal understanding. The system pairs a fine-tuned Xception network (pre-trained on ImageNet), which detects subtle visual inconsistencies within individual frames, with a Video Vision Transformer (ViViT-B/16x2, pre-trained on Kinetics-400), which detects motion-level anomalies across frames. Features from both networks are combined into a single 2,816-dimensional vector that is fed to a compact classifier to decide whether a video is real or fake. Trained and evaluated on the Celeb-DF dataset (890 genuine videos and 808 deepfakes), the Xception model reaches 88.5% validation accuracy, ViViT reaches 87.0%, and the ensemble reaches 88.3%. The final system is deployed as a lightweight Flask API that returns a verdict, a confidence score, and a frame-level breakdown of where manipulation is most likely occurring.
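The fusion and decision steps described in the abstract can be illustrated with a short, dependency-free Python sketch. It rests on assumptions that the abstract does not state: that the 2,816-dimensional vector is a plain concatenation of Xception's 2,048-dimensional globally pooled features and ViViT-B's 768-dimensional embedding, that the "compact classifier" can be approximated by a single logistic unit, and that the frame-level breakdown comes from thresholding hypothetical per-frame fake scores at 0.5. The names `fuse_features`, `classify` and `build_report` are hypothetical, and the demo weights are random rather than trained.

```python
"""Hypothetical sketch of a two-stream fusion head (not the authors' code).

Assumptions: the 2,816-d vector is a concatenation of a 2,048-d Xception
embedding and a 768-d ViViT-B embedding; the classifier is one logistic unit;
per-frame fake scores are thresholded to produce the frame-level breakdown.
"""
import math
import random
from typing import Dict, List, Sequence

XCEPTION_DIM = 2048                   # Xception global-average-pooled feature size
VIVIT_DIM = 768                       # ViViT-B hidden size
FUSED_DIM = XCEPTION_DIM + VIVIT_DIM  # 2,816


def fuse_features(spatial: Sequence[float], temporal: Sequence[float]) -> List[float]:
    """Concatenate the spatial (Xception) and temporal (ViViT) embeddings."""
    if len(spatial) != XCEPTION_DIM or len(temporal) != VIVIT_DIM:
        raise ValueError("unexpected embedding size")
    return list(spatial) + list(temporal)


def sigmoid(z: float) -> float:
    """Numerically stable logistic function (avoids overflow in math.exp)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def classify(fused: Sequence[float], weights: Sequence[float], bias: float) -> float:
    """Hypothetical logistic head: returns P(fake) for one fused vector."""
    z = sum(w * x for w, x in zip(weights, fused)) + bias
    return sigmoid(z)


def build_report(p_fake: float, frame_scores: Sequence[float],
                 threshold: float = 0.5) -> Dict:
    """Assemble a verdict, a confidence score and a per-frame breakdown."""
    verdict = "FAKE" if p_fake >= threshold else "REAL"
    # Confidence is the probability assigned to the chosen verdict.
    confidence = p_fake if verdict == "FAKE" else 1.0 - p_fake
    # Indices of frames whose (hypothetical) per-frame score crosses the threshold.
    suspicious = [i for i, s in enumerate(frame_scores) if s >= threshold]
    return {
        "verdict": verdict,
        "confidence": round(confidence, 4),
        "suspicious_frames": suspicious,
    }


if __name__ == "__main__":
    rng = random.Random(0)
    spatial = [rng.gauss(0, 1) for _ in range(XCEPTION_DIM)]
    temporal = [rng.gauss(0, 1) for _ in range(VIVIT_DIM)]
    fused = fuse_features(spatial, temporal)
    weights = [rng.gauss(0, 0.01) for _ in range(FUSED_DIM)]  # untrained, illustration only
    p_fake = classify(fused, weights, bias=0.0)
    print(len(fused))
    print(build_report(p_fake, frame_scores=[0.1, 0.7, 0.9, 0.3]))
```

Concatenation keeps both streams' features intact and leaves it to the classifier to learn how much weight each stream deserves; a trained system would replace the random weights with learned parameters, and the dictionary returned by `build_report` is one plausible shape for the API's JSON response.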

RSP Science Hub

Gautham Rishab S, Gowtham S, Mrs. Sakthi P

International Research Journal on Advanced Engineering and Management (IRJAEM)

2026



