Effects of Data Augmentations on Speech Emotion Recognition
Data augmentation techniques have recently gained wider adoption in speech processing, including speech emotion recognition. Although more data tends to help, there may be a trade-off beyond which more data does not yield a better model. This paper reports experiments investigating the effects of data augmentation on speech emotion recognition. The investigation aims to find the most useful type of data augmentation and the most useful number of augmentations for speech emotion recognition. The experiments are conducted on the Japanese Twitter-based emotional speech corpus. The results show that for speaker-independent data, combining two data augmentations, glottal source extraction and silence removal, performed best, even compared with configurations that used more augmentation techniques. For text-independent data (including the speaker- and text-independent condition), more data augmentations tend to improve speech emotion recognition performance. The results highlight the trade-off between the number of data augmentations and speech emotion recognition performance, showing the need to choose a data augmentation technique suited to the specific application.
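As a rough illustration of one of the augmentations named above, the sketch below shows a simple energy-based form of silence removal and how the processed copies could be appended to a training set. It is not the paper's implementation: the function names `remove_silence` and `augment_with_silence_removal`, the 400-sample frame length, and the -40 dB threshold are all assumptions chosen for the example, and glottal source extraction is not covered.

```python
import numpy as np

def remove_silence(signal, frame_len=400, threshold_db=-40.0):
    """Drop non-overlapping frames whose RMS level is more than
    |threshold_db| below the loudest frame (hypothetical settings)."""
    signal = np.asarray(signal, dtype=np.float64)
    n_frames = len(signal) // frame_len
    if n_frames == 0:
        return signal.copy()  # too short to frame: return unchanged
    # Trailing samples that do not fill a whole frame are discarded.
    frames = signal[: n_frames * frame_len].reshape(n_frames, frame_len)
    rms = np.sqrt(np.mean(frames ** 2, axis=1))
    peak = rms.max()
    if peak == 0:
        return signal.copy()  # entirely silent: nothing to compare against
    rms_db = 20.0 * np.log10(np.maximum(rms, 1e-12) / peak)
    keep = rms_db > threshold_db
    return frames[keep].reshape(-1)

def augment_with_silence_removal(utterances, labels):
    """Return the original utterances plus silence-removed copies,
    each copy keeping the emotion label of its source utterance."""
    augmented = [remove_silence(u) for u in utterances]
    return list(utterances) + augmented, list(labels) + list(labels)

# Usage: a 0.05 s tone padded with 0.025 s of silence on each side (16 kHz).
sr = 16000
tone = 0.5 * np.sin(2 * np.pi * 440 * np.arange(800) / sr)
padded = np.concatenate([np.zeros(400), tone, np.zeros(400)])
trimmed = remove_silence(padded)
data, tags = augment_with_silence_removal([padded], ["happy"])
```

Each augmentation applied this way doubles the number of training utterances, which is the quantity the abstract's trade-off refers to: whether adding further augmented copies keeps helping depends on the evaluation condition.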
Related Results
Multimodal Emotion Recognition and Human Computer Interaction for AI-Driven Mental Health Support (Preprint)
BACKGROUND
Mental health has become one of the most urgent global health issues of the twenty-first century. The World Health Organization (WHO) reports tha...
The impact of binge drinking on emotion recognition
Binge drinking or heavy episodic drinking is variously defined but according to the World Health Organisation (WHO) it is the consumption of at least 60 grams or more of pure alcoh...
Studies on visual emotion understanding
As information explodes nowadays, visual data has become a crucial information carrier in various fields: social networks, e-commerce, online entertainment, etc. Visual emotion ana...
What about males? Exploring sex differences in the relationship between emotion difficulties and eating disorders
Abstract
Objective: While eating disorders (ED) are more commonly diagnosed in females, there is growing awareness that men also experience ED and may do so in a different ...
AI-Based Emotion Recognition in Education: Progress, Applications, and Open Challenges
AI-based emotion recognition has emerged as a critical component of affect-aware educational technologies, particularly in online, large-scale, and technology-mediated learning env...
Introduction: Autonomic Psychophysiology
Abstract
The autonomic psychophysiology of emotion has a long thought tradition in philosophy but a short empirical tradition in psychological research. Yet the past...
Curriculum Multi-Negative Augmentation for Debiased Video Grounding
Video Grounding (VG) aims to locate the desired segment from a video given a sentence query. Recent studies have found that current VG models are prone to over-rely the groundtruth...

