
Developing an audio search engine for Amharic speech web resources

Abstract: While general-purpose search engines primarily serve English-language content, the web has seen enormous growth in content for low-resource languages such as Amharic. As a morphologically complex language with unique linguistic characteristics, Amharic presents significant challenges for information retrieval, particularly for speech content. Although Amharic web resources are expanding across text, speech, and video formats, speech retrieval demands specialized solutions because of three key challenges: (1) the absence of explicit word boundaries, which requires accurate automatic speech recognition, (2) the lack of visual context compared to video content, and (3) the compounding effects of Amharic's rich morphology on transcription accuracy. These challenges are exacerbated by the proliferation of online radio broadcasts, speech reports, and news content in Amharic. This study presents a dedicated Audio Search Engine for Amharic speech web resources that addresses these challenges through four key components: (1) an enhanced web crawler optimized for Amharic speech content, (2) robust speech transcription pipelines, (3) efficient indexing of transcribed content, and (4) language-specific query preprocessing. The system builds on open-source technologies, including JSpider for crawling, Sphinx for speech recognition, and Datafari for indexing and retrieval, to create an integrated solution tailored to Amharic's linguistic characteristics. In evaluation, the system achieved 80% precision in top-10 results and 92% recall compared to baseline retrieval methods. These results indicate that the solution can handle Amharic's unique challenges while providing practical retrieval performance. The study contributes both a technical framework for Amharic speech search and insights applicable to other resource-constrained languages facing similar retrieval challenges.
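
The abstract reports precision in the top-10 results and recall, but does not describe the evaluation protocol. As a minimal sketch of how these two metrics are conventionally computed, the Python below assumes binary relevance judgments per query. The function names and the clip identifiers are hypothetical and are not taken from the study.

```python
from typing import Sequence, Set


def precision_at_k(ranked: Sequence[str], relevant: Set[str], k: int = 10) -> float:
    """Fraction of the top-k ranked results that are judged relevant.

    Divides by k (not by the number of results returned), which is the
    conventional definition of precision at rank k.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    hits = sum(1 for doc_id in ranked[:k] if doc_id in relevant)
    return hits / k


def recall(retrieved: Sequence[str], relevant: Set[str]) -> float:
    """Fraction of all relevant items that appear among the retrieved items."""
    if not relevant:
        return 0.0  # assumption: a query with no relevant items scores 0
    return len(set(retrieved) & relevant) / len(relevant)


# Hypothetical judgments for one query (illustrative only, not study data).
ranked_clips = ["clip_03", "clip_11", "clip_07", "clip_20", "clip_02"]
relevant_clips = {"clip_03", "clip_07", "clip_02", "clip_15"}

p_at_5 = precision_at_k(ranked_clips, relevant_clips, k=5)
r = recall(ranked_clips, relevant_clips)
print(f"precision@5 = {p_at_5:.2f}, recall = {r:.2f}")
```

In practice both values would be averaged over a set of test queries. Whether the study averaged per query or pooled results is not stated in the abstract.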

Related Results

Amharic Adhoc Information Retrieval System Based on Morphological Features
Information retrieval (IR) is one of the most important research and development areas due to the explosion of digital data and the need of accessing relevant information from huge...
Developing Amharic Sign Language Recognition Model for Amharic Characters Using Deep Learning Approach
Abstract: Hearing-impaired people use Sign Language to communicate with each other as well as with other communities. Usually, they are unable to communicate with normal peo...
Feature selection for multimodal: acoustic event detection
The detection of the Acoustic Events (AEs) naturally produced in a meeting room may help to describe the human and social activity. The automatic description of interactions betwee...
PRACTICALITY OF ALTERNATIVE ASSESSMENTS: FROM AMHARIC LANGUAGE INSTRUCTORS’ VIEW POINTS
The purpose of this study was to examine the practicality of Alternative Assessment in the Amharic Language educational context of Ethiopian higher education. The study also endeavors to ...
Coreference Resolution for Amharic Text using Bidirectional Encoder Representation from Transformer (BERT)
Abstract: Coreference resolution is the process of finding expressions that refer to the same entity in a text. In coreference resolution similar entities are men...
Bilingual Hate Speech Detection on Social Media : Amharic and Afaan Oromo
Abstract: Due to significant increases in internet penetration and the development of smartphone technology during the preceding couple of decades, many people have started ...
