An ensemble technique for speech recognition in noisy environments
Automatic speech recognition (ASR) is a technology that allows computers and mobile devices to recognize spoken language and transcribe it into text. ASR systems often achieve poor accuracy on noisy speech signals. This research therefore proposes an ensemble technique that, rather than relying on a single filter to remove noise perfectly, combines information from multiple noise reduction filters to improve the final ASR accuracy. The core of the technique is the generation of K copies of the speech signal using three noise reduction filters. The speech features of these copies differ slightly, so the ASR system extracts different texts from them, and the best of these texts can then be selected as the final ASR output. The ensemble technique was compared with three related existing noise reduction techniques in terms of character error rate (CER) and word error rate (WER). The results were encouraging, showing relative reductions of 16.61% in CER and 11.54% in WER compared with the best existing technique. This contribution can help the ASR field improve the recognition accuracy of human speech in the presence of background noise.
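The abstract describes the pipeline only at a high level: K filtered copies of the signal are transcribed, the "best" transcript is elected, and results are scored by CER and WER. The following Python sketch illustrates the last two steps under stated assumptions. It assumes the K candidate transcripts already exist (the three noise reduction filters and the ASR engine are not modelled), and it uses a consensus rule, picking the candidate with the smallest total word-level edit distance to the others, as a hypothetical stand-in for the paper's unspecified selection criterion. All function names are illustrative; `wer` and `cer` follow the standard Levenshtein-based definitions, and `relative_reduction` assumes "relative decrease" means (baseline - proposed) / baseline.

```python
"""Hypothetical sketch: ensemble transcript selection and CER/WER scoring.

ASSUMPTIONS (not from the paper): the K filtered copies have already been
transcribed, and the "best" transcript is the consensus (medoid) hypothesis.
"""
from typing import List, Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance: minimum substitutions, deletions and insertions."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # delete r from the reference
                curr[j - 1] + 1,         # insert h into the reference
                prev[j - 1] + (r != h),  # substitute (free when tokens match)
            )
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance / number of reference words."""
    ref_words = reference.split()
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: character-level edit distance / reference length."""
    return edit_distance(list(reference), list(hypothesis)) / len(reference)


def select_consensus(hypotheses: List[str]) -> str:
    """HYPOTHETICAL selection rule: return the hypothesis with the smallest
    total word-level edit distance to all K hypotheses (ties -> first one)."""
    def total_distance(h: str) -> int:
        return sum(edit_distance(h.split(), other.split()) for other in hypotheses)
    return min(hypotheses, key=total_distance)


def relative_reduction(baseline: float, proposed: float) -> float:
    """Relative decrease of an error rate, in percent (assumed definition)."""
    return 100.0 * (baseline - proposed) / baseline


if __name__ == "__main__":
    # Made-up transcripts of K = 3 filtered copies (illustrative only).
    candidates = [
        "turn on the kitchen light",
        "turn on the kitchen lights",
        "turn of the kitten light",
    ]
    reference = "turn on the kitchen light"
    best = select_consensus(candidates)  # picks "turn on the kitchen light"
    print(best, wer(reference, best), cer(reference, best))
```

A consensus rule needs no reference text at inference time, which makes it one plausible way to choose among unlabeled ASR outputs; confidence scores or language-model rescoring would be alternative selection criteria.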
Institute of Advanced Engineering and Science
Related Results
Robust speech recognition based on deep learning for sports game review
Abstract
To verify the feasibility of robust speech recognition based on deep learning in sports game review. In this paper, a robust speech recognition model is bui...
Reference-Based Speech Enhancement via Feature Alignment and Fusion Network
Speech enhancement aims at recovering a clean speech from a noisy input, which can be classified into single speech enhancement and personalized speech enhancement. Personalized sp...
Identifying Links Between Latent Memory and Speech Recognition Factors
Objectives:
The link between memory ability and speech recognition accuracy is often examined by correlating summary measures of performance across various tasks, but i...
The Neural Mechanisms of Private Speech in Second Language Learners’ Oral Production: An fNIRS Study
Background: According to Vygotsky’s sociocultural theory, private speech functions both as a tool for thought regulation and as a transitional form between outer and inner speech. ...
Speech, communication, and neuroimaging in Parkinson's disease : characterisation and intervention outcomes
Most individuals with Parkinson's disease (PD) experience changes in speech, voice or communication. Speech changes often manifest as hypokinetic dysarthria, a m...
Speech enhancement augmentation for robust speech recognition in noisy environments
Abstract. The use of augmentations as a data enrichment method has become an important element in improving the performance of speech recognition systems. To work effectively in no...

