Comparative Analysis of Spectrogram and MFCC Representations for Speech Emotion Recognition Using Machine Learning
Emotion recognition is a key research area within human-computer interaction, addressing the growing need for systems that can respond to human emotional states. While the field has advanced, challenges remain in selecting appropriate datasets, identifying effective audio features, and optimizing classification models. This study examines how two audio feature representations, Mel-Frequency Cepstral Coefficients (MFCC) and spectrograms, influence emotion classification accuracy. Features were extracted from the Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS) and classified with Random Forest (RF) and Support Vector Machine (SVM) models, allowing a comparison of each feature-classifier pairing. Both RF and SVM achieved 50% accuracy with MFCC features, while spectrogram features yielded 45% accuracy with RF and 54% with SVM. These findings suggest that simple models, paired with appropriate features, can deliver promising performance, supporting more responsive and adaptive human-computer interaction applications.
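The MFCC pipeline the abstract relies on (framing, windowing, power spectrum, mel filterbank, log, discrete cosine transform, then time-averaging to a fixed-length vector for a classifier) can be sketched in plain NumPy. This is a minimal illustration, not the study's implementation: the authors' exact parameters (frame size, hop, filter count) are not given in the abstract, so the values below are assumptions, and in practice a library such as librosa would compute MFCCs and scikit-learn would provide the RF/SVM classifiers.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sr):
    # Triangular filters spaced evenly on the mel scale
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):
            fb[i - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):
            fb[i - 1, k] = (right - k) / max(right - center, 1)
    return fb

def mfcc(signal, sr=16000, frame_len=400, hop=160, n_fft=512,
         n_mels=26, n_ceps=13):
    # Frame the signal and apply a Hann window
    n_frames = 1 + (len(signal) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hanning(frame_len)
    # Per-frame power spectrum
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2 / n_fft
    # Mel filterbank energies -> log -> DCT-II gives cepstral coefficients
    log_mel = np.log(power @ mel_filterbank(n_mels, n_fft, sr).T + 1e-10)
    n = np.arange(n_mels)
    dct = np.cos(np.pi * np.arange(n_ceps)[:, None] * (2 * n + 1) / (2 * n_mels))
    ceps = log_mel @ dct.T          # shape: (n_frames, n_ceps)
    # Average over time to get one fixed-length vector per clip,
    # suitable as input to an RF or SVM classifier
    return ceps.mean(axis=0)

# A 1-second synthetic tone stands in for a RAVDESS clip
sr = 16000
t = np.arange(sr) / sr
clip = np.sin(2 * np.pi * 220.0 * t)
features = mfcc(clip, sr)
print(features.shape)  # (13,)
```

The resulting 13-dimensional vectors, one per utterance, would then be fed to `RandomForestClassifier` or `SVC` from scikit-learn with the emotion labels as targets; the spectrogram variant in the study would instead flatten or summarize the power-spectrum frames before classification.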
Centre for Research and Innovation