
Metaheuristic adapted convolutional neural network for Telugu speaker diarization

Speaker diarization plays a pivotal role in speech technology.
In general, speaker diarization is the process of partitioning an input audio stream into homogeneous segments according to speaker identity.
Speaker diarization can improve the readability of automatic transcriptions, because it splits the audio stream into speaker turns and often recovers the true identity of each speaker.
In this research work, a novel speaker diarization approach is introduced that consists of three major phases: feature extraction, Speech Activity Detection (SAD), and speaker segmentation and clustering.
First, Mel Frequency Cepstral Coefficient (MFCC) features are extracted from the collected input audio stream (Telugu speech).
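
The MFCC front end named above can be sketched as follows. This is a minimal, dependency-free Python sketch, not the paper's implementation; it assumes common textbook settings (25 ms Hamming frames with a 10 ms hop at 16 kHz, a 512-point FFT, 26 triangular mel filters, 13 cepstral coefficients), since the abstract gives no parameters. Pre-emphasis, liftering and delta features are omitted, and a small radix-2 FFT is included only to keep the block self-contained; in practice a library routine such as librosa's `feature.mfcc` would normally be used.

```python
import cmath
import math


def fft(x):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even, odd = fft(x[0::2]), fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def hz_to_mel(f):
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters spaced evenly on the mel scale, one row per filter."""
    n_bins = n_fft // 2 + 1
    low, high = hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0)
    mel_points = [low + i * (high - low) / (n_filters + 1) for i in range(n_filters + 2)]
    bins = [int((n_fft + 1) * mel_to_hz(m) / sample_rate) for m in mel_points]
    filters = []
    for j in range(1, n_filters + 1):
        left, center, right = bins[j - 1], bins[j], bins[j + 1]
        row = [0.0] * n_bins
        for k in range(left, center):  # rising edge
            row[k] = (k - left) / max(center - left, 1)
        for k in range(center, right):  # falling edge
            row[k] = (right - k) / max(right - center, 1)
        filters.append(row)
    return filters


def mfcc(signal, sample_rate, frame_len=400, hop=160, n_fft=512, n_filters=26, n_ceps=13):
    """Return one list of n_ceps MFCCs per frame (assumes frame_len <= n_fft)."""
    window = [0.54 - 0.46 * math.cos(2 * math.pi * i / (frame_len - 1)) for i in range(frame_len)]
    fbank = mel_filterbank(n_filters, n_fft, sample_rate)
    features = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = [signal[start + i] * window[i] for i in range(frame_len)]
        frame += [0.0] * (n_fft - frame_len)  # zero-pad to the FFT size
        spectrum = fft(frame)[: n_fft // 2 + 1]
        power = [abs(c) ** 2 / n_fft for c in spectrum]
        # Log mel-band energies; the floor avoids log(0) on silent frames.
        log_energies = [
            math.log(max(sum(w * p for w, p in zip(row, power)), 1e-10)) for row in fbank
        ]
        # Unnormalized DCT-II decorrelates the log energies; keep the first n_ceps.
        ceps = [
            sum(e * math.cos(math.pi * q * (m + 0.5) / n_filters) for m, e in enumerate(log_energies))
            for q in range(n_ceps)
        ]
        features.append(ceps)
    return features
```

With the defaults, a quarter second of 16 kHz audio (4000 samples) yields 23 frames of 13 coefficients each.
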
Next, Speech Activity Detection (SAD) removes music and silence from the signal.
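
The abstract does not specify how SAD is performed. As a hedged illustration of the silence-removal part only, the sketch below uses short-time energy: a frame counts as speech when its log energy lies within an assumed `margin_db` of the loudest frame, and consecutive speech frames are merged into sample ranges. The function names and the `frame_len`, `hop` and `margin_db` values are hypothetical. Energy alone cannot separate music from speech, so the music-removal step described above would need a classifier that this sketch does not attempt.

```python
import math


def frame_energies_db(signal, frame_len=400, hop=160):
    """Short-time mean energy of each frame, in dB."""
    energies = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len]
        energy = sum(x * x for x in frame) / frame_len
        energies.append(10.0 * math.log10(energy + 1e-12))  # offset avoids log10(0)
    return energies


def energy_sad(signal, frame_len=400, hop=160, margin_db=30.0):
    """Flag each frame as speech if it is within margin_db of the loudest frame."""
    energies = frame_energies_db(signal, frame_len, hop)
    if not energies:
        return []
    threshold = max(energies) - margin_db
    return [e >= threshold for e in energies]


def speech_regions(flags, hop=160, frame_len=400):
    """Merge runs of speech frames into (start_sample, end_sample) regions."""
    regions, start = [], None
    for i, is_speech in enumerate(flags):
        if is_speech and start is None:
            start = i * hop
        elif not is_speech and start is not None:
            regions.append((start, (i - 1) * hop + frame_len))
            start = None
    if start is not None:
        regions.append((start, (len(flags) - 1) * hop + frame_len))
    return regions
```

A fixed margin relative to the loudest frame keeps the sketch short; an adaptive noise-floor estimate would be more robust on real recordings.
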
The remaining speech is then segmented by individual speaker.
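
The abstract does not say how segmentation is done. One widely used technique for speaker change detection is the Bayesian Information Criterion (BIC): a window of N feature frames is modelled either by one Gaussian or by two Gaussians split at frame i, and the split is accepted when ΔBIC(i) = N/2 log|Σ| − N1/2 log|Σ1| − N2/2 log|Σ2| − λP is positive, where P counts the extra model parameters times ½ log N. The sketch below is a swapped-in illustration, not the paper's method, and simplifies to diagonal covariances (the classic formulation uses full ones); `penalty_weight` (λ) and `min_seg` are hypothetical parameters.

```python
import math


def log_det_diag(frames):
    """Log-determinant of a diagonal Gaussian fitted to frames (equal-length vectors)."""
    n, dim = len(frames), len(frames[0])
    total = 0.0
    for k in range(dim):
        mean = sum(f[k] for f in frames) / n
        var = sum((f[k] - mean) ** 2 for f in frames) / n
        total += math.log(var + 1e-8)  # floor keeps the log finite for constant dimensions
    return total


def delta_bic(frames, split, penalty_weight=1.0):
    """Delta-BIC for splitting frames at index split; positive suggests a speaker change."""
    n, dim = len(frames), len(frames[0])
    left, right = frames[:split], frames[split:]
    gain = 0.5 * (n * log_det_diag(frames)
                  - len(left) * log_det_diag(left)
                  - len(right) * log_det_diag(right))
    # The second diagonal Gaussian adds 2*dim parameters: 0.5 * 2*dim * log(n).
    penalty = penalty_weight * dim * math.log(n)
    return gain - penalty


def find_change_point(frames, min_seg=10, penalty_weight=1.0):
    """Return the split index with the largest positive Delta-BIC, or None."""
    best_idx, best_score = None, 0.0
    for split in range(min_seg, len(frames) - min_seg + 1):
        score = delta_bic(frames, split, penalty_weight)
        if score > best_score:
            best_idx, best_score = split, score
    return best_idx
```

Applied over a sliding window of MFCC frames, each detected change point closes one speaker-homogeneous segment.
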
Finally, the segments are grouped by speaker in a clustering step that uses an optimized Convolutional Neural Network (CNN).
To improve the clustering, the weights and activation function of the CNN are fine-tuned by a new Self-Adaptive Sea Lion Algorithm (SA-SLnO).
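
The abstract states only that SA-SLnO fine-tunes the CNN's weights and activation function; its self-adaptive rule is not given here and is not reproduced. For orientation, the sketch below follows the basic Sea Lion Optimization (SLnO) update pattern as it is usually presented: a coefficient `c` falls linearly from 2 to 0, agents encircle a random agent while `c >= 1` (exploration) and the best solution afterwards, and otherwise perform a circular "attack" around the best solution. Two simplifications are assumptions of this sketch: the branch is chosen by a fair coin instead of the original vocalization-speed term, and a toy objective stands in for the CNN's validation loss. In the paper's setting each agent would encode the CNN parameters being tuned; that encoding, and every name below, is hypothetical.

```python
import math
import random


def sea_lion_optimize(objective, dim, lower, upper, n_agents=20, n_iters=200, seed=0):
    """Minimise objective over [lower, upper]^dim with simplified SLnO-style updates."""
    rng = random.Random(seed)

    def clip(value):
        # Keep every coordinate inside the search bounds.
        return min(max(value, lower), upper)

    agents = [[rng.uniform(lower, upper) for _ in range(dim)] for _ in range(n_agents)]
    best = list(min(agents, key=objective))
    best_score = objective(best)

    for t in range(n_iters):
        c = 2.0 * (1.0 - t / n_iters)  # falls linearly from 2 towards 0
        for i in range(n_agents):
            agent = agents[i]
            if rng.random() < 0.5:  # simplification: coin flip picks the branch
                # Encircling: approach a random agent while c >= 1, else the best solution.
                target = best if c < 1.0 else agents[rng.randrange(n_agents)]
                new = [
                    target[k] - abs(2.0 * rng.random() * target[k] - agent[k]) * c
                    for k in range(dim)
                ]
            else:
                # Attacking (circle updating) around the best solution, m drawn from [-1, 1].
                m = rng.uniform(-1.0, 1.0)
                new = [
                    abs(best[k] - agent[k]) * math.cos(2.0 * math.pi * m) + best[k]
                    for k in range(dim)
                ]
            new = [clip(v) for v in new]
            agents[i] = new
            score = objective(new)
            if score < best_score:  # keep the best solution seen so far
                best, best_score = list(new), score
    return best, best_score
```

For example, `sea_lion_optimize(lambda x: sum(v * v for v in x), 3, -5.0, 5.0)` returns the best point found on a 3-dimensional sphere function together with its objective value.
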
A comparative analysis is then carried out to demonstrate the superiority of the proposed speaker diarization approach.
The proposed method achieves an accuracy of 0.8073, exceeding the existing works by 5.255, 2.45%, and 0.075.

Related Results

Telugu Dependency Treebank
We discuss Telugu Language and Treebanks briefly in this work. Initially, we'll go over the Telugu language briefly. The paninian grammatical model utilized for Telugu dependency r...
Quarantine Powers, Biodefense, and Andrew Speaker
In January 2007, “Andrew Speaker (“Speaker”) underwent a chest X-ray and CT scan, which revealed an abnormality in his lungs.” However, tests results indicated that he did not ha...
Fusion of Cochleogram and Mel Spectrogram Features for Deep Learning Based Speaker Recognition
Abstract Speaker recognition has crucial application in forensic science, financial areas, access control, surveillance and law enforcement. The performance of speaker reco...
Target sample mining with modified activation residual network for speaker verification
In the domain of speaker verification, Softmax can be used as a backend for multi-classification, but traditional Softmax methods have some limitations that limit performance. Duri...
Penerapan Metode Convolutional Neural Network untuk Diagnosa Penyakit Alzheimer (Application of the Convolutional Neural Network Method for Diagnosing Alzheimer's Disease)
Abstract— Alzheimer's disease is a neurodegenerative disease that develops gradually, and is associated with cardiovascular and cerebrovascular problems. Alzheimer's is a serious d...
Role of Digital Marketing Data Analytics in Film Industry: Telugu Cinema into Pan India Magnum Opus
The Indian film industry is at average growth rate of 11.5% year-on-year basis. After the Hindi film industry, regional cinema like the Telugu industry (popularly known as Tollywoo...
Fuzzy Chaotic Neural Networks
An understanding of the human brain’s local function has improved in recent years. But the cognition of human brain’s working process as a whole is still obscure. Both fuzzy logic ...
Analyzing Noise Robustness of Cochleogram and Mel Spectrogram Features in Deep Learning Based Speaker Recogntion
Abstract The performance of speaker recognition is very well in a clean dataset or without mismatch between training and test set. However, the performance is degraded with...
