DNN-based Multilingual Acoustic Modeling for Four Ethiopian Languages
In this paper, we present the results of experiments on multilingual acoustic modeling for the development of an Automatic Speech Recognition (ASR) system, using speech data from four phonetically closely related Ethiopian languages (Amharic, Tigrigna, Oromo and Wolaytta) with multilingual (ML) mix and multitask approaches. Using speech data from only phonetically closely related languages improved on the results reported in previous work that used 26 languages (including these four). The largest Word Error Rate (WER) reduction, from 25.03% (in the previous work) to 21.52%, was achieved for Wolaytta, a relative WER reduction of 14.02%. Compared with a monolingual ASR system, multilingual acoustic modeling achieved a relative WER reduction of up to 7.36% (a WER reduction from 23.23% to 21.52%). Compared with the ML mix, the multitask approach brought a larger performance improvement (a relative WER reduction of up to 5.9%). Experiments were also conducted with the languages in pairs: Amharic with Tigrigna, and Oromo with Wolaytta. The results showed that the languages with relatively better resources for lexical and language modeling (Amharic and Tigrigna) benefited from the use of speech data from only two languages. Overall, the findings show that combining speech corpora of phonetically related languages with the multitask multilingual modeling approach is a promising solution for developing ASR systems for less-resourced languages.
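The relative WER reductions quoted above follow from the absolute WER figures by the usual formula, (baseline WER - new WER) / baseline WER. The following is a minimal sketch of that arithmetic, not the authors' code; the function name `relative_wer_reduction` and the variable names are hypothetical and chosen only for illustration.

```python
def relative_wer_reduction(baseline_wer: float, new_wer: float) -> float:
    """Return the relative WER reduction, in percent, of new_wer against baseline_wer.

    Both arguments are WER values expressed in percent (e.g. 25.03 for 25.03%).
    """
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return (baseline_wer - new_wer) / baseline_wer * 100.0


# WER figures quoted in the abstract (in percent).
vs_previous_work = relative_wer_reduction(25.03, 21.52)  # 26-language system vs. four-language system
vs_monolingual = relative_wer_reduction(23.23, 21.52)    # monolingual system vs. multilingual system

print(f"Relative reduction vs. previous work: {vs_previous_work:.2f}%")  # 14.02%
print(f"Relative reduction vs. monolingual:   {vs_monolingual:.2f}%")    # 7.36%
```

Both values agree with the 14.02% and 7.36% reported in the abstract.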
African Journals Online (AJOL)

