Fusion of Cochleogram and Mel Spectrogram Features for Deep Learning Based Speaker Recognition
Abstract
Speaker recognition has crucial applications in forensic science, finance, access control, surveillance and law enforcement. The performance of speaker recognition degrades with noise and with physical and behavioral changes of speakers. Fusion of Mel Frequency Cepstral Coefficient (MFCC) and Gammatone Frequency Cepstral Coefficient (GFCC) features has been used to improve the performance of machine learning based speaker recognition systems in noisy conditions. Deep learning models, especially the Convolutional Neural Network (CNN) and its hybrid approaches, outperform machine learning approaches in speaker recognition. Previous CNN based speaker recognition models have used Mel Spectrogram features as input. Although Mel Spectrogram features perform better than handcrafted features, their performance degrades with noise and behavioral changes of the speaker. In this work, a CNN based speaker recognition model is developed using the fusion of Mel Spectrogram and Cochleogram features as input. The speaker recognition performance of the fused Mel Spectrogram and Cochleogram features is compared with the performance of Mel Spectrogram and Cochleogram features used individually. The train-clean-100 part of the LibriSpeech dataset, which consists of 251 speakers (126 male and 125 female) and 28,539 utterances, is used for the experiments on the proposed model. The CNN model is trained and evaluated for 20 epochs using the training and validation data, respectively. The proposed speaker recognition model, which uses the fusion of Mel Spectrogram and Cochleogram features as input to the CNN, achieves an accuracy of 99.56%. The accuracy of CNN based speaker recognition is 98.15% with Mel Spectrogram features and 97.43% with Cochleogram features. The results show that the fusion of Mel Spectrogram and Cochleogram features improves the performance of speaker recognition.
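
The abstract does not specify how the two feature maps are fused, so the following is only a minimal Python sketch of one plausible pipeline: the Mel spectrogram is computed with librosa, the cochleogram is approximated with SciPy's 4th-order gammatone filters on ERB-spaced center frequencies, and the two time-frequency maps are stacked as channels of a CNN input. The frame length, hop length, band count, and channel-wise stacking are illustrative assumptions, not the authors' exact configuration.

# Minimal sketch (assumed pipeline, not the authors' exact implementation):
# fuse a Mel spectrogram and a gammatone-based cochleogram as a 2-channel CNN input.
import numpy as np
import librosa
from scipy.signal import gammatone, lfilter

def erb_space(f_min, f_max, n_bands):
    """ERB-rate spaced center frequencies between f_min and f_max (Hz)."""
    erb = lambda f: 21.4 * np.log10(4.37e-3 * f + 1.0)
    inv_erb = lambda e: (10.0 ** (e / 21.4) - 1.0) / 4.37e-3
    return inv_erb(np.linspace(erb(f_min), erb(f_max), n_bands))

def cochleogram(y, sr, n_bands=64, frame_len=1024, hop_len=256, f_min=50.0):
    """Log-energy cochleogram from a 4th-order gammatone filterbank."""
    centers = erb_space(f_min, 0.9 * sr / 2.0, n_bands)
    n_frames = 1 + (len(y) - frame_len) // hop_len
    coch = np.empty((n_bands, n_frames))
    for i, fc in enumerate(centers):
        b, a = gammatone(fc, "iir", fs=sr)          # per-band gammatone filter
        band = lfilter(b, a, y)
        for t in range(n_frames):
            frame = band[t * hop_len : t * hop_len + frame_len]
            coch[i, t] = np.sum(frame ** 2)         # frame energy per band
    return librosa.power_to_db(coch, ref=np.max)

def fused_features(path, sr=16000, n_bands=64, frame_len=1024, hop_len=256):
    """Return a (2, n_bands, T) array: Mel spectrogram and cochleogram channels."""
    y, _ = librosa.load(path, sr=sr)
    mel = librosa.feature.melspectrogram(
        y=y, sr=sr, n_fft=frame_len, hop_length=hop_len, n_mels=n_bands)
    mel_db = librosa.power_to_db(mel, ref=np.max)
    coch_db = cochleogram(y, sr, n_bands, frame_len, hop_len)
    T = min(mel_db.shape[1], coch_db.shape[1])      # align frame counts
    return np.stack([mel_db[:, :T], coch_db[:, :T]])  # channel-wise fusion

Channel-wise stacking is only one way to combine the features; concatenating the two maps along the frequency axis before the CNN would be an equally simple alternative under the same assumptions.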