Metaheuristic adapted convolutional neural network for Telugu speaker diarization

Speaker diarization plays a pivotal role in speech technology. It partitions an input audio stream into homogeneous segments according to speaker identity. Diarization improves the readability of automatic transcriptions because it structures the audio stream into speaker turns and often provides the true speaker identity. This research work introduces a novel speaker diarization approach with three major phases: feature extraction, speech activity detection (SAD), and speaker segmentation and clustering. First, Mel-frequency cepstral coefficient (MFCC) features are extracted from the collected Telugu-language input audio stream. Next, SAD removes the music and silence signals. The remaining speech signals are then segmented by individual speaker. Finally, the segmented signals are passed to the speaker clustering process, which uses an optimized convolutional neural network (CNN). To improve the clustering, the weights and activation function of the CNN are fine-tuned by a new Self Adaptive Sea Lion Algorithm (SA-SLnO). A comparative analysis demonstrates the superiority of the proposed approach: its accuracy is 0.8073, which is 5.255, 2.45%, and 0.075 better than the existing works.
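
The abstract does not say how the SAD phase decides which frames to discard. As a rough illustration only, the sketch below flags frames by short-time energy and merges the surviving frames into speech spans. The energy threshold, the frame and hop sizes, and the function names `frame_energies`, `speech_frames`, and `speech_segments` are all assumptions made for this example, not the authors' method, and the sketch handles silence only, not music.

```python
import math


def frame_energies(samples, frame_len, hop):
    """Return the mean squared amplitude of each full frame."""
    energies = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = samples[start:start + frame_len]
        energies.append(sum(x * x for x in frame) / frame_len)
    return energies


def speech_frames(samples, frame_len=400, hop=160, threshold=1e-3):
    """Flag each frame as speech (True) or silence (False).

    The fixed energy threshold is an assumed stand-in for a real
    speech activity detector.
    """
    return [e > threshold for e in frame_energies(samples, frame_len, hop)]


def speech_segments(flags, frame_len=400, hop=160):
    """Merge runs of speech frames into (start_sample, end_sample) spans."""
    segments = []
    start = None
    for i, is_speech in enumerate(flags):
        if is_speech and start is None:
            start = i  # a run of speech frames begins here
        elif not is_speech and start is not None:
            # the run ended at frame i - 1
            segments.append((start * hop, (i - 1) * hop + frame_len))
            start = None
    if start is not None:
        # the signal ended while still inside a speech run
        segments.append((start * hop, (len(flags) - 1) * hop + frame_len))
    return segments


if __name__ == "__main__":
    # Synthetic input: silence, a 440 Hz tone standing in for speech, silence.
    sample_rate = 16000
    tone = [0.5 * math.sin(2 * math.pi * 440 * n / sample_rate)
            for n in range(1600)]
    signal = [0.0] * 1600 + tone + [0.0] * 1600
    print(speech_segments(speech_frames(signal)))
```

In a pipeline like the one the abstract describes, the spans returned here would be the input to speaker segmentation; feature extraction and clustering are outside the scope of this sketch.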

SAGE Publications

Sethuram V Ande Prasad R. Rajeswara Rao

Intelligent Decision Technologies

2021

Related Results

Telugu Dependency Treebank

We discuss Telugu Language and Treebanks briefly in this work. Initially, we'll go over the Telugu language briefly. The paninian grammatical model utilized for Telugu dependency r...

Quarantine Powers, Biodefense, and Andrew Speaker

In January 2007, Andrew Speaker (Speaker) underwent a chest X-ray and CT scan, which revealed an abnormality in his lungs. However, tests results indicated that he did not ha...

Fusion of Cochleogram and Mel Spectrogram Features for Deep Learning Based Speaker Recognition

Speaker recognition has crucial application in forensic science, financial areas, access control, surveillance and law enforcement. The performance of speaker reco...

Target sample mining with modified activation residual network for speaker verification

In the domain of speaker verification, Softmax can be used as a backend for multi-classification, but traditional Softmax methods have some limitations that limit performance. Duri...

Penerapan Metode Convolutional Neural Network untuk Diagnosa Penyakit Alzheimer

Alzheimer's disease is a neurodegenerative disease that develops gradually, and is associated with cardiovascular and cerebrovascular problems. Alzheimer's is a serious d...

Role of Digital Marketing Data Analytics in Film Industry: Telugu Cinema into Pan India Magnum Opus

The Indian film industry is at average growth rate of 11.5% year-on-year basis. After the Hindi film industry, regional cinema like the Telugu industry (popularly known as Tollywoo...

Fuzzy Chaotic Neural Networks

An understanding of the human brain’s local function has improved in recent years. But the cognition of human brain’s working process as a whole is still obscure. Both fuzzy logic ...

Analyzing Noise Robustness of Cochleogram and Mel Spectrogram Features in Deep Learning Based Speaker Recognition

The performance of speaker recognition is very well in a clean dataset or without mismatch between training and test set. However, the performance is degraded with...
