
Speech enhancement augmentation for robust speech recognition in noisy environments

Abstract. Data augmentation has become an important tool for improving the performance of speech recognition systems. To make recognition work effectively in noisy conditions, augmentation is usually used to simulate background noise. However, recognition quality does not improve on samples pre-processed by noise reduction models. This paper proposes a new approach to speech data augmentation for training ASR systems that are intended to be used jointly with speech enhancement models. The approach is based on creating several additional training samples that contain speech processed by the enhancement model. It was tested on the E-Branchformer neural network model using data from the LibriSpeech corpus. The quality of speech samples was assessed using the DNSMOS metric. On a 100-hour set of clean speech, the proposed augmentation improved WER by more than 4% absolute compared with the generally accepted approach of adding noisy speech samples. Experiments on 960 hours of data showed that the approach remains robust as the training set size increases.
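The augmentation idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names `mix_at_snr` and `augment_with_enhancement`, the choice of one clean, one noisy and one enhanced copy per utterance, the fixed 10 dB SNR, and the `enhance` callable (a stand-in for any speech enhancement model) are all assumptions made for illustration.

```python
import numpy as np


def mix_at_snr(speech, noise, snr_db):
    """Add noise to speech, scaled so the mixture has the requested SNR (dB)."""
    # Repeat or trim the noise so it matches the speech length.
    noise = np.resize(noise, speech.shape)
    speech_power = np.mean(speech ** 2)
    noise_power = np.mean(noise ** 2) + 1e-12  # avoid division by zero
    scale = np.sqrt(speech_power / (noise_power * 10 ** (snr_db / 10)))
    return speech + scale * noise


def augment_with_enhancement(utterances, noises, enhance, snr_db=10.0, seed=0):
    """Return clean, noisy and enhanced copies of every (waveform, text) pair.

    `enhance` is a hypothetical callable wrapping a speech enhancement model;
    which variants the paper actually generates is an assumption here.
    """
    rng = np.random.default_rng(seed)
    augmented = []
    for speech, text in utterances:
        noise = noises[rng.integers(len(noises))]
        noisy = mix_at_snr(speech, noise, snr_db)
        augmented.append((speech, text))          # original clean sample
        augmented.append((noisy, text))           # conventional noise augmentation
        augmented.append((enhance(noisy), text))  # sample processed by the enhancer
    return augmented
```

In this sketch the transcript is shared by all three copies, so an ASR model trained on the result sees the artifacts introduced by the enhancement model alongside clean and noisy speech.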

EDP Sciences

Rauf Nasretdinov, Andrey Lependin, Ilya Ilyashenko

ITM Web of Conferences

2024

