
AmhEn: Amharic-English Large Parallel Corpus for Machine Translation

Abstract: Recently, deep neural networks for machine translation (MT) have received great attention. For these networks to learn abstract representations of the input and store them as continuous vectors, they need large amounts of data. However, very few studies have addressed low-resource languages such as Amharic. Progress on Amharic-English machine translation in both directions is held back by the lack of clean, easily accessible, and up-to-date parallel corpora. This paper presents the first relatively large-scale Amharic-English parallel corpus (above 1.1 million) for machine translation tasks. We ran experiments with recurrent neural networks (RNNs) and Transformer models under various hyperparameter settings to assess the usability of the dataset. Additionally, we explore the effect of Amharic homophone character normalization on machine translation. We have released the dataset in both unnormalized and normalized forms, split into training, validation, and test files.
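The abstract mentions Amharic homophone character normalization without spelling out the mapping. In the Ge'ez script, several consonant families are pronounced identically in modern Amharic (for example ሀ, ሐ, and ኀ all represent /h/), so normalization folds them into one canonical family. The Python sketch below assumes a commonly used mapping (ሐ and ኀ to ሀ, ሠ to ሰ, ዐ to አ, ፀ to ጸ) applied to the seven basic vowel orders of each family. It is not necessarily the exact table used for AmhEn, and the names `HOMOPHONE_FAMILIES`, `build_table`, and `normalize_amharic` are hypothetical.

```python
# Minimal sketch of Amharic homophone character normalization.
# ASSUMPTION: the family mapping below is a common convention in Amharic
# NLP preprocessing; it is not necessarily the table used to build AmhEn.

# Each Ge'ez consonant family occupies consecutive Unicode code points,
# one per vowel order, starting at the family's first-order character.
HOMOPHONE_FAMILIES = {
    0x1210: 0x1200,  # ሐ family -> ሀ family (/h/)
    0x1280: 0x1200,  # ኀ family -> ሀ family (/h/)
    0x1220: 0x1230,  # ሠ family -> ሰ family (/s/)
    0x12D0: 0x12A0,  # ዐ family -> አ family (glottal onset)
    0x1340: 0x1338,  # ፀ family -> ጸ family (/ts'/)
}

VOWEL_ORDERS = 7  # only the seven basic vowel orders are remapped


def build_table() -> dict[int, int]:
    """Expand family-level pairs into a code point table for str.translate."""
    table: dict[int, int] = {}
    for src_base, dst_base in HOMOPHONE_FAMILIES.items():
        for order in range(VOWEL_ORDERS):
            table[src_base + order] = dst_base + order
    return table


NORMALIZATION_TABLE = build_table()


def normalize_amharic(text: str) -> str:
    """Replace homophone characters with their canonical family member."""
    return text.translate(NORMALIZATION_TABLE)


if __name__ == "__main__":
    print(normalize_amharic("ሐሳብ"))  # ሀሳብ
```

For instance, `normalize_amharic("ሐሳብ")` returns `"ሀሳብ"`, while Latin-script text passes through unchanged. Because each family's vowel orders sit on consecutive code points in the Unicode Ethiopic block, the table is built from base offsets rather than listing all 35 characters by hand, which keeps the mapping easy to audit.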

Related Results

Genre analysis of maritime legal texts and the realization of translation universals in their translations from English
Genre implies formal and stylistic conventions of a particular text type, which inevitably affects the translation process. This „force of genre bias“ (Prieto Ramos, 2014) has been...
Aviation English - A global perspective: analysis, teaching, assessment
This e-book brings together 13 chapters written by aviation English researchers and practitioners settled in six different countries, representing institutions and universities fro...
Amharic Adhoc Information Retrieval System Based on Morphological Features
Information retrieval (IR) is one of the most important research and development areas due to the explosion of digital data and the need of accessing relevant information from huge...
PRACTICALITY OF ALTERNATIVE ASSESSMENTS: FROM AMHARIC LANGUAGE INSTRUCTORS’ VIEW POINTS
The purpose of this study was to examine the practicality of Alternative Assessment in the Ethiopian higher-education Amharic language context. The study also endeavors to ...
Syntax of Amharic ideophones
This study is on Amharic ideophones, a subject that has not been described well in the syntax of Amharic. The data used for the analysis are collected from natural settings of the ...
Evaluation of an Amharic-Language translation of Continuity of Care Satisfaction Tool among Postnatal Mothers in Ethiopia
Abstract Background: Beginning in the 1990s, women’s dissatisfaction with maternity services has been widely reported in the literature. However, there is a lack of consist...
English Majors’ Perceptions of Chinese-English Translation Learning and Translation Competence
Translation is an indispensable language activity and means of communication. Translation from Chinese into foreign languages, especially English, is important for telling Chinese storie...
LSTM-Based Attentional Embedding for English Machine Translation
In order to reduce the workload of manual grading and improve the efficiency of grading, a computerized intelligent grading system for English translation based on natural language...
