
A Deep Learning Approach for Speech Emotion Recognition Optimization Using Meta-Learning

Speech emotion recognition (SER) is widely applicable today, benefiting areas such as entertainment, robotics, and healthcare. This emotional understanding enhances user-machine interaction, making systems more responsive and providing more natural experiences. In robotics, SER is useful in home assistance devices, eldercare, and special education, facilitating effective communication. Additionally, in healthcare settings, it can monitor patients’ emotional well-being. However, achieving high levels of accuracy is challenging and complicated by the need to select the best combination of machine learning algorithms, hyperparameters, datasets, data augmentation, and feature extraction methods. Therefore, this study aims to develop a deep learning approach for optimal SER configurations. It delves into the domains of optimizer settings, learning rates, data augmentation techniques, feature extraction methods, and neural architectures for the RAVDESS, TESS, SAVEE, and R+T+S (RAVDESS+TESS+SAVEE) datasets. After finding the best SER configurations, meta-learning is carried out, transferring the best configurations to two additional datasets, CREMA-D and R+T+S+C (RAVDESS+TESS+SAVEE+CREMA-D). The developed approach proved effective in finding the best configurations, achieving an accuracy of 97.01% for RAVDESS, 100% for TESS, 90.62% for SAVEE, and 97.37% for R+T+S. Furthermore, using meta-learning, the CREMA-D and R+T+S+C datasets achieved accuracies of 83.28% and 90.94%, respectively.
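The configuration search described above can be pictured as an exhaustive sweep over a small search space. The following is a minimal sketch under stated assumptions, not the authors' method: the option values in `SEARCH_SPACE`, the helper names `iter_configurations` and `select_best`, and the scoring stub `dummy_evaluate` are all hypothetical. In practice, the scoring function would train a model on one dataset and return its validation accuracy.

```python
from itertools import product

# Hypothetical search space. The five dimensions come from the abstract;
# the option values are illustrative assumptions, not the study's choices.
SEARCH_SPACE = {
    "optimizer": ["adam", "sgd", "rmsprop"],
    "learning_rate": [1e-2, 1e-3, 1e-4],
    "augmentation": ["none", "noise", "pitch_shift"],
    "features": ["mfcc", "mel_spectrogram"],
    "architecture": ["cnn", "lstm"],
}


def iter_configurations(space):
    """Yield every combination of options as a dict (a plain grid sweep)."""
    keys = list(space)
    for values in product(*(space[key] for key in keys)):
        yield dict(zip(keys, values))


def select_best(space, evaluate):
    """Return (best_config, best_score) for a caller-supplied scoring function."""
    best_config, best_score = None, float("-inf")
    for config in iter_configurations(space):
        score = evaluate(config)
        if score > best_score:  # strict comparison keeps the first best match
            best_config, best_score = config, score
    return best_config, best_score


def dummy_evaluate(config):
    """Placeholder score so the sketch runs; a real version would train and validate a model."""
    return (
        (config["optimizer"] == "adam")
        + (config["learning_rate"] == 1e-3)
        + (config["features"] == "mfcc")
    )


if __name__ == "__main__":
    best_config, best_score = select_best(SEARCH_SPACE, dummy_evaluate)
    print(best_config, best_score)
```

Under this reading, the transfer step would reuse the returned `best_config` as the starting configuration for a new dataset such as CREMA-D instead of repeating the full sweep. That interpretation is an assumption based on the abstract's wording.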

Related Results

Robust speech recognition based on deep learning for sports game review
To verify the feasibility of robust speech recognition based on deep learning in sports game review. In this paper, a robust speech recognition model is bui...
What about males? Exploring sex differences in the relationship between emotion difficulties and eating disorders
Objective: While eating disorders (ED) are more commonly diagnosed in females, there is growing awareness that men also experience ED and may do so in a different ...
Deep Learning-Based Feature Extraction for Speech Emotion Recognition
Emotion recognition from speech signals is an important and challenging component of Human-Computer Interaction. In the field of speech emotion recognition (SER), many techniques h...
Introduction: Autonomic Psychophysiology
The autonomic psychophysiology of emotion has a long thought tradition in philosophy but a short empirical tradition in psychological research. Yet the past...
Exploring Speech Emotion Recognition in Tribal Language with Deep Learning Techniques
Emotion is fundamental to interpersonal interactions since it assists mutual understanding. Developing human-computer interactions and a related digital product depends heavily on ...
Identifying Links Between Latent Memory and Speech Recognition Factors
Objectives: The link between memory ability and speech recognition accuracy is often examined by correlating summary measures of performance across various tasks, but i...
Meta-Representations as Representations of Processes
In this study, we explore how the notion of meta-representations in Higher-Order Theories (HOT) of consciousness can be implemented in computational models. HOT suggests that consc...
