How Frequency and Harmonic Profiling of a ‘Voice’ Can Inform Authentication of Deepfake Audio: An Efficiency Investigation

As life in the digital era becomes more complex, the scope for criminal activity within the digital realm grows ever wider. More recently, the development of deepfake media generation powered by Artificial Intelligence has pushed audio and video content into a realm of doubt, misinformation, and misrepresentation. Instances of deepfake media are numerous, with infamous cases ranging from manufactured graphic images of the musician Taylor Swift to the loss of $25 million transferred after a faked video call. Deepfakes are of increasing concern to the general public when such material is submitted as evidence in a court case, especially a criminal trial. The current methods of authenticating evidence against such deepfake threats are insufficient. When considering speech within audio forensics, there is sufficient ‘individuality’ in one’s own voice to enable comparison for identification. When authenticating audio suspected of containing deepfake speech, the same comparative approach can be used to identify rogue or inconsistent harmonic and formant patterns within the speech. The presence of deepfake media in illegal activity demands appropriate legal enforcement, which in turn requires robust detection methods. The work presented in this paper proposes a robust technique for identifying such AI-synthesized speech using a quantifiable method that can be justified within court proceedings. Furthermore, it presents the correlation between the harmonic content of human speech patterns and that of the AI-generated clones derived from them. This paper details which spectrographic audio characteristics were found that may prove helpful for authenticating speech for forensic purposes in the future. The results demonstrate that comparing specific frequency ranges against a known audio sample of a person’s speech indicates the presence of deepfake media through differences in harmonic structure.

KEYWORDS: Artificial Intelligence, Digital Forensics, Speech Processing, Speech Analysis.
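
The abstract does not give the authors' procedure, so the following is only a minimal sketch of the general idea of comparing energy in chosen frequency ranges between a known reference recording and a questioned one. The band edges, the function names (`band_profile`, `profile_distance`, `harmonic_tone`) and the synthetic test tones are all assumptions made for illustration; they are not taken from the paper.

```python
import numpy as np

# Hypothetical frequency bands in Hz (assumed for illustration, not from the paper).
BANDS = [(80, 300), (300, 1000), (1000, 3000), (3000, 8000)]


def band_profile(signal, sample_rate, bands=BANDS):
    """Return the fraction of spectral energy that falls in each band."""
    windowed = signal * np.hanning(len(signal))  # reduce spectral leakage
    power = np.abs(np.fft.rfft(windowed)) ** 2
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sample_rate)
    energies = np.array(
        [power[(freqs >= lo) & (freqs < hi)].sum() for lo, hi in bands]
    )
    total = energies.sum()
    return energies / total if total > 0 else energies


def profile_distance(reference, questioned, sample_rate, bands=BANDS):
    """Euclidean distance between the two normalised band profiles."""
    ref = band_profile(reference, sample_rate, bands)
    que = band_profile(questioned, sample_rate, bands)
    return float(np.linalg.norm(ref - que))


def harmonic_tone(f0, amplitudes, sample_rate, duration=0.5):
    """Synthetic stand-in for voiced speech: a sum of harmonics of f0."""
    t = np.arange(int(sample_rate * duration)) / sample_rate
    return sum(
        a * np.sin(2 * np.pi * f0 * (k + 1) * t)
        for k, a in enumerate(amplitudes)
    )


# Two tones with the same pitch but different harmonic weighting.
sample_rate = 16000
reference = harmonic_tone(120.0, [1.0, 0.5, 0.25, 0.125], sample_rate)
questioned = harmonic_tone(120.0, [1.0, 0.1, 0.6, 0.4], sample_rate)
distance = profile_distance(reference, questioned, sample_rate)
```

A larger distance means the two recordings distribute their energy differently across the chosen bands. Any decision threshold would have to be established empirically on real speech, which this sketch does not attempt.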

Sri Lanka Institute of Information Technology

Emily L. Williams, Karl O. Jones, Colin Robinson, Sebastian Chandler Crnigoj, Helen Burrell, Suzzanne McColl

Journal of Advances in Engineering and Technology

2025
