Exploring Speech Emotion Recognition in Tribal Language with Deep Learning Techniques
Emotion is fundamental to interpersonal interaction because it supports mutual understanding. Human-computer interaction, and the digital products built on it, depend heavily on emotion recognition, so deep learning models for speech emotion recognition are an essential area of research. Most speech emotion recognition systems have been deployed only for European and a few Asian languages; for a low-resource tribal language like KUI, no dataset was available. We therefore created a KUI speech dataset and applied augmentation techniques to increase its size. This study performs speech emotion recognition on this low-resource KUI dataset and compares the results with and without augmentation. The dataset was created using a studio platform for better-quality speech data. The recordings are labeled with six perceived emotions: ସଡାଙ୍ଗି (angry), େରହା (happy), ଆଜି (fear), ବିକାଲି (sad), ବିଜାରି (disgust), and େଡ଼କ୍ (surprise). Mel-frequency cepstral coefficients (MFCCs) are used for feature extraction. Deep learning serves as an alternative to traditional methods of speech emotion recognition: the study uses a hybrid architecture of Long Short-Term Memory (LSTM) and Convolutional Neural Networks (CNNs) as the classifier. The results were compared with existing benchmark models, and the experiments show that the proposed hybrid model achieved an accuracy of 96% without augmentation and 97% with augmentation.
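The abstract does not say which augmentation techniques were applied, so the following is only a minimal sketch of how a small labeled speech dataset might be enlarged. The three transforms (additive Gaussian noise, circular time shift, and gain scaling), the function names, and all parameter values are assumptions chosen for illustration, not the authors' method. Clips are represented as plain lists of float samples in the range [-1.0, 1.0].

```python
import random

def add_noise(samples, noise_level=0.005, rng=None):
    """Return a copy with zero-mean Gaussian noise added to every sample."""
    rng = rng or random.Random(0)
    return [s + rng.gauss(0.0, noise_level) for s in samples]

def time_shift(samples, shift):
    """Circularly shift the signal by `shift` samples (positive = later)."""
    if not samples:
        return []
    k = shift % len(samples)
    return samples[-k:] + samples[:-k] if k else list(samples)

def scale_gain(samples, gain):
    """Multiply the amplitude by `gain`, clipping to [-1.0, 1.0]."""
    return [max(-1.0, min(1.0, s * gain)) for s in samples]

def augment(dataset, seed=0):
    """Expand (samples, label) pairs: each clip yields itself plus 3 variants.

    The choice of transforms and their parameters is hypothetical.
    """
    rng = random.Random(seed)
    out = []
    for samples, label in dataset:
        out.append((list(samples), label))                       # original
        out.append((add_noise(samples, rng=rng), label))         # noisy copy
        out.append((time_shift(samples, len(samples) // 10), label))  # shifted
        out.append((scale_gain(samples, 0.8), label))            # quieter copy
    return out
```

Calling `augment([(clip, "angry")])` on one labeled clip returns four labeled clips of the same length. Every variant keeps the original emotion label, which is the property that makes a with/without-augmentation comparison like the one in the abstract meaningful.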
Faculty of Electrical Engineering, Computer Science and Information Technology Osijek
Related Results
The Relationship Between Eating Behavior Patterns and the Incidence of Childhood Obesity (Hubungan Perilaku Pola Makan dengan Kejadian Anak Obesitas)
Validation of two preoxygenation techniques, 3 min tidal volume breath and eight vital capacity breath techniques in tribal and non-tribal population of Eastern India
Background: Preoxygenation during anesthesia can be done by 3 min tidal volume breath and eight vital capacity breath in 1 min, conventionally. Population of our country is not hom...
What about males? Exploring sex differences in the relationship between emotion difficulties and eating disorders
Abstract
Objective: While eating disorders (ED) are more commonly diagnosed in females, there is growing awareness that men also experience ED and may do so in a different ...
Deep Learning-Based Feature Extraction for Speech Emotion Recognition
Emotion recognition from speech signals is an important and challenging component of Human-Computer Interaction. In the field of speech emotion recognition (SER), many techniques h...
Robust speech recognition based on deep learning for sports game review
Abstract
To verify the feasibility of robust speech recognition based on deep learning in sports game review. In this paper, a robust speech recognition model is bui...
KNOWLEDGE LEVEL OF TRIBAL AND NON-TRIBAL FARMERS ABOUT IMPROVED PRODUCTION TECHNOLOGY OF AJWAIN
The present study which was conducted to find out the difference in the knowledge levels of the tribal and non tribal farmers of Rajasthan about improved production technology of A...
Enhancing Non-Formal Learning Certificate Classification with Text Augmentation: A Comparison of Character, Token, and Semantic Approaches
Aim/Purpose: The purpose of this paper is to address the gap in the recognition of prior learning (RPL) by automating the classification of non-formal learning certificates using d...

