Amharic Adhoc Information Retrieval System Based on Morphological Features

Information retrieval (IR) is one of the most important research and development areas due to the explosion of digital data and the need to access relevant information from huge corpora. Although IR systems function well for technologically advanced languages such as English, this is not the case for morphologically complex, under-resourced and less-studied languages such as Amharic. Amharic is a Semitic language characterized by a complex morphology in which thousands of words are generated from a single root form through inflection and derivation. This has made the development of Amharic natural language processing (NLP) tools a challenging task. Amharic adhoc retrieval also faces challenges due to the scarcity of linguistic resources, tools and standard evaluation corpora. In this research work, we investigate the impact of morphological features on the representation of Amharic documents and queries for adhoc retrieval. We also analyze the effects of stem-based and root-based text representation, and propose a new Amharic IR system architecture. Moreover, we present the resources and corpora we constructed for the evaluation of Amharic IR systems and other NLP tools. We conduct various experiments with a TREC-like approach on an Amharic IR test collection using a standard evaluation framework and measures. Our findings show that root-based text representation outperforms the conventional stem-based representation for Amharic IR.
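The contrast between stem-based and root-based representation can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the transliterated surface forms, the `STEM` and `ROOT` lookup tables (stand-ins for a real stemmer and root extractor), the three-document collection, and the choice of average precision as the evaluation measure. The tables are constructed so that the root conflates forms the stems keep apart, so the sketch shows the mechanism only and says nothing about the results reported in the work.

```python
from collections import defaultdict

# Hypothetical lookup tables; the surface forms and their stems/roots are
# made up for illustration and are not the output of any real analyzer.
STEM = {"sebere": "seber", "sebari": "sebar", "mesber": "sber"}
ROOT = {"sebere": "sbr", "sebari": "sbr", "mesber": "sbr"}


def build_index(docs, normalize):
    """Build an inverted index mapping each normalized term to document ids."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in text.split():
            index[normalize(token)].add(doc_id)
    return index


def search(index, query, normalize):
    """Return ids of documents sharing at least one normalized query term."""
    hits = set()
    for token in query.split():
        hits |= index.get(normalize(token), set())
    return sorted(hits)


def average_precision(ranked, relevant):
    """Average precision of a ranked list against a set of relevant ids."""
    hits, total = 0, 0.0
    for rank, doc_id in enumerate(ranked, start=1):
        if doc_id in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0


def by_stem(token):
    return STEM.get(token, token)


def by_root(token):
    return ROOT.get(token, token)


# Toy collection: each document holds one surface form of the same root.
docs = {"d1": "sebere", "d2": "sebari", "d3": "mesber"}
relevant = {"d1", "d2", "d3"}

ap_stem = average_precision(
    search(build_index(docs, by_stem), "sebere", by_stem), relevant)
ap_root = average_precision(
    search(build_index(docs, by_root), "sebere", by_root), relevant)
print(f"stem-based AP: {ap_stem:.3f}, root-based AP: {ap_root:.3f}")
```

In this toy setup the stem-based index matches only the document whose stem equals the query's stem, while the root-based index matches all three, which is why the root-based average precision comes out higher here.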

Related Results

Developing an audio search engine for Amharic speech web resources
Abstract While general-purpose search engines primarily serve English-language content, the web has seen enormous growth in non-resource-rich languages like Amhar...
Developing Amharic Sign Language Recognition Model for Amharic Characters Using Deep Learning Approach
Abstract Hearing-impaired people use Sign Language to communicate with each other as well as with other communities. Usually, they are unable to communicate with normal peo...
Coreference Resolution for Amharic Text using Bidirectional Encoder Representation from Transformer (BERT)
Abstract Coreference resolution is the process of finding an entity which refers to the same entity in a text. In coreference resolution similar entities are men...
PRACTICALITY OF ALTERNATIVE ASSESSMENTS: FROM AMHARIC LANGUAGE INSTRUCTORS’ VIEW POINTS
The purpose of this study was to examine the practicality of Alternative Assessment in the Ethiopian higher education Amharic Language educational context. The study also endeavors to ...
syntax of Amharic ideophones
This study is on Amharic ideophones, a subject that has not been described well in the syntax of Amharic. The data used for the analysis are collected from natural settings of the ...