Amharic Adhoc Information Retrieval System Based on Morphological Features
Information retrieval (IR) is one of the most important research and development areas, owing to the explosion of digital data and the need to access relevant information from huge corpora. Although IR systems function well for technologically advanced languages such as English, this is not the case for morphologically complex, under-resourced and less-studied languages such as Amharic. Amharic is a Semitic language characterized by a complex morphology in which thousands of words are generated from a single root form through inflection and derivation. This has made the development of Amharic natural language processing (NLP) tools a challenging task. Amharic adhoc retrieval also faces challenges due to the scarcity of linguistic resources, tools and standard evaluation corpora. In this research work, we investigate the impact of morphological features on the representation of Amharic documents and queries for adhoc retrieval. We also analyze the effects of stem-based and root-based text representation, and propose a new Amharic IR system architecture. Moreover, we present the resources and corpora we constructed for the evaluation of Amharic IR systems and other NLP tools. We conduct various experiments on a TREC-like Amharic IR test collection using a standard evaluation framework and measures. Our findings show that root-based text representation outperforms the conventional stem-based representation for Amharic IR.
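The abstract does not name the evaluation measures it uses. As an illustration only, the sketch below computes mean average precision (MAP), one measure commonly reported in TREC-style evaluations. The function names, document ids and relevance judgments are hypothetical toy values and are not taken from the paper or its test collection.

```python
from typing import Dict, List, Set


def average_precision(ranked: List[str], relevant: Set[str]) -> float:
    """Average precision of one ranked list against the relevant doc ids."""
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, doc_id in enumerate(ranked, start=1):
        if doc_id in relevant:
            hits += 1
            # Precision at this rank, counted only where a relevant doc appears.
            precision_sum += hits / rank
    return precision_sum / len(relevant)


def mean_average_precision(
    run: Dict[str, List[str]], qrels: Dict[str, Set[str]]
) -> float:
    """Mean of per-query average precision over every query in qrels."""
    if not qrels:
        return 0.0
    total = sum(
        average_precision(run.get(query_id, []), relevant)
        for query_id, relevant in qrels.items()
    )
    return total / len(qrels)


# Hypothetical toy judgments and system output (not data from the paper).
qrels = {"q1": {"d1", "d3"}, "q2": {"d2"}}
run = {"q1": ["d1", "d2", "d3"], "q2": ["d1", "d2"]}

map_score = mean_average_precision(run, qrels)
print(f"MAP = {map_score:.4f}")
```

Under this kind of setup, two text representations (for example stem-based and root-based indexing) would each produce their own `run`, and the two MAP values would be compared against the same relevance judgments.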
Related Results
PRACTICALITY OF ALTERNATIVE ASSESSMENTS: FROM AMHARIC LANGUAGE INSTRUCTORS’ VIEW POINTS
The purpose of this study was to examine the practicality of alternative assessment in the Amharic language context of Ethiopian higher education. The study also endeavors to ...
syntax of Amharic ideophones
This study is on Amharic ideophones, a subject that has not been described well in the syntax of Amharic. The data used for the analysis are collected from natural settings of the ...
Learned Text Representation for Amharic Information Retrieval and Natural Language Processing
Over the past few years, word embeddings and bidirectional encoder representations from transformers (BERT) models have brought better solutions to learning text representations fo...
Evaluation of an Amharic-Language translation of Continuity of Care Satisfaction Tool among Postnatal Mothers in Ethiopia
Background: Beginning in the 1990s, women’s dissatisfaction with maternity services has been widely reported in the literature. However, there is a lack of consist...
A New Remote Sensing Image Retrieval Method Based on CNN and YOLO
Retrieving remote sensing images plays a key role in RS fields, which motivates researchers to design a highly effective method for extracting high-level image features. How...
Graph-based Interactive Bibliographic Information Retrieval Systems
In the big data era, we have witnessed the explosion of scholarly literature. This explosion has imposed challenges to the retrieval of bibliographic information. Retrieval of inte...
Improving Sentence Retrieval Using Sequence Similarity
Sentence retrieval is an information retrieval technique that aims to find sentences corresponding to an information need. It is used for tasks like question answering (QA) or nove...
New Research Progress in Image Retrieval
Image retrieval is generally divided into two categories: text-based image retrieval and content-based image retrieval. Early image retrieval technology is mainly ba...

