Developing an audio search engine for Amharic speech web resources
Abstract
While general-purpose search engines primarily serve English-language content, the web has seen enormous growth in content in under-resourced languages such as Amharic. As a morphologically complex language, Amharic presents significant challenges for information retrieval, particularly for speech content. Although Amharic web resources are expanding across text, speech, and video formats, speech retrieval demands specialized solutions because of three key challenges: (1) the absence of explicit word boundaries in speech, which makes accurate automatic speech recognition a prerequisite, (2) the lack of visual context compared to video content, and (3) the compounding effect of Amharic's rich morphology on transcription accuracy. These challenges are exacerbated by the proliferation of online radio broadcasts, speech reports, and news content in Amharic. This study presents a dedicated audio search engine for Amharic speech web resources, addressing these challenges through four key innovations: (1) an enhanced web crawler optimized for Amharic speech content, (2) robust speech transcription pipelines, (3) efficient indexing of transcribed content, and (4) language-specific query preprocessing components. The system builds on open-source technologies, including JSpider for crawling, Sphinx for speech recognition, and Datafari for indexing and retrieval, to create an integrated solution tailored to Amharic's linguistic characteristics. In evaluation, the system achieved 80% precision in the top-10 results and 92% recall compared to baseline retrieval methods. These results indicate that the solution can handle Amharic's specific challenges while providing practical retrieval performance. The study contributes both a technical framework for Amharic speech search and insights applicable to other resource-constrained languages facing similar retrieval challenges.
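The two reported metrics, precision in the top-10 results and recall, can be illustrated with a minimal Python sketch. This is not the authors' evaluation code: the function names, the clip identifiers, and the relevance judgments below are hypothetical, and the sketch assumes the common convention of dividing by k even when fewer than k results are returned.

```python
from typing import Iterable, Sequence, Set


def precision_at_k(ranked: Sequence[str], relevant: Set[str], k: int = 10) -> float:
    """Fraction of the top-k ranked results that are judged relevant.

    Assumption: the denominator is k itself, so a result list shorter
    than k is penalized for the missing positions.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    hits = sum(1 for doc_id in ranked[:k] if doc_id in relevant)
    return hits / k


def recall(retrieved: Iterable[str], relevant: Set[str]) -> float:
    """Fraction of all relevant items that appear among the retrieved items."""
    if not relevant:
        return 0.0  # no relevant items exist; recall is reported as 0 here
    return len(set(retrieved) & relevant) / len(relevant)


# Hypothetical toy data: ranked audio clips returned for one query,
# and the set of clips a human judge marked as relevant.
ranked_results = ["clip_03", "clip_07", "clip_01", "clip_09", "clip_04"]
relevant_clips = {"clip_03", "clip_01", "clip_04", "clip_08"}

p_at_5 = precision_at_k(ranked_results, relevant_clips, k=5)
r = recall(ranked_results, relevant_clips)
print(f"P@5 = {p_at_5:.2f}, recall = {r:.2f}")
```

In this toy case three of the five returned clips are relevant and three of the four relevant clips are retrieved. A reported figure such as 80% precision in the top 10 would correspond to eight relevant items among the first ten results, typically averaged over a set of test queries.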
Springer Science and Business Media LLC
Related Results
Amharic Adhoc Information Retrieval System Based on Morphological Features
Information retrieval (IR) is one of the most important research and development areas due to the explosion of digital data and the need to access relevant information from huge...
Developing Amharic Sign Language Recognition Model for Amharic Characters Using Deep Learning Approach
Hearing-impaired people use Sign Language to communicate with each other as well as with other communities. Usually, they are unable to communicate with normal peo...
Translation, reliability, and validity of Amharic versions of the Pelvic Floor Distress Inventory (PFDI-20) and Pelvic Floor Impact Questionnaire (PFIQ-7)
Purpose
Pelvic Floor Disorders (PFDs) affect many women and have a significant impact on their quality of life. Pelvic Floor Impact Questionnaire (PFIQ-7) and ...
Feature selection for multimodal: acoustic event detection
The detection of the Acoustic Events (AEs) naturally produced in a meeting room may help to describe the human and social activity. The automatic description of interactions betwee...
PRACTICALITY OF ALTERNATIVE ASSESSMENTS: FROM AMHARIC LANGUAGE INSTRUCTORS’ VIEW POINTS
The purpose of this study was to examine the practicality of alternative assessment in Amharic language education in Ethiopian higher education. The study also endeavors to ...
Coreference Resolution for Amharic Text using Bidirectional Encoder Representation from Transformer (BERT)
Coreference resolution is the process of finding expressions that refer to the same entity in a text. In coreference resolution similar entities are men...
Bilingual Hate Speech Detection on Social Media : Amharic and Afaan Oromo
Due to significant increases in internet penetration and the development of smartphone technology during the preceding couple of decades, many people have started ...

