
Coreference Resolution for Amharic Text using Bidirectional Encoder Representation from Transformer (BERT)

Abstract: Coreference resolution is the task of finding all expressions in a text that refer to the same entity. These expressions are called mentions, and the task amounts to clustering the mentions that refer to the same entity, each identified by the index of its word in the text. Coreference resolution is used in several NLP applications, such as machine translation, information extraction, named entity recognition and question answering, to increase their effectiveness. In this work, we propose coreference resolution for Amharic text using Bidirectional Encoder Representations from Transformers (BERT). BERT is a contextual language model that generates semantic vectors dynamically according to the context of each word. The proposed system has a training phase and a testing phase. The training phase includes preprocessing (cleaning, tokenization and sentence segmentation), word embedding, feature extraction and the coreference model. The testing phase likewise has its own steps: preprocessing (cleaning, tokenization and sentence segmentation), coreference resolution, and output of the predicted Amharic mentions. Word embedding in the proposed model represents each word as a low-dimensional vector. It is a feature learning technique used to obtain new features across domains for coreference resolution in Amharic text. The necessary information is extracted from the word embeddings, the processed data and the Amharic characters. After extracting the important features from the training data, we build a coreference model. In the model, BERT is used to obtain basic features from the embedding layer by extracting information from both the left and right context of a given word. To evaluate the proposed model, we conduct experiments on an Amharic dataset prepared for this study from various reliable sources.
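The preprocessing steps named in the abstract (cleaning, tokenization and sentence segmentation) might be sketched roughly as follows. This is an illustrative sketch only, not the authors' implementation: the function names, the choice of characters to keep, and the sample sentence are assumptions made here, and the abstract does not specify any of these details.

```python
import re

# Assumed punctuation sets (not taken from the paper).
SENTENCE_END = "\u1362\u1367?!"                      # Ethiopic full stop, Ethiopic question mark, ? and !
PUNCT = "\u1362\u1363\u1364\u1365\u1366\u1367?!"     # punctuation emitted as separate tokens
WORDSPACE = "\u1361"                                 # Ethiopic wordspace, treated as a separator


def clean(text: str) -> str:
    """Keep Ethiopic-block characters, digits, ? and !; collapse whitespace."""
    text = re.sub(r"[^\u1200-\u137F0-9?!\s]", " ", text)
    return re.sub(r"\s+", " ", text).strip()


def split_sentences(text: str) -> list[str]:
    """Split after each sentence-final mark, with or without following whitespace."""
    parts = re.split(f"(?<=[{SENTENCE_END}])\\s*", text)
    return [p.strip() for p in parts if p.strip()]


def tokenize(sentence: str) -> list[str]:
    """Split on whitespace and the Ethiopic wordspace; punctuation becomes its own token."""
    spaced = re.sub(f"([{PUNCT}])", r" \1 ", sentence.replace(WORDSPACE, " "))
    return spaced.split()


if __name__ == "__main__":
    # Made-up sample text, used only to exercise the functions.
    sample = "አበበ በሶ በላ። እሱ ደስተኛ ነው።"
    for sentence in split_sentences(clean(sample)):
        print(tokenize(sentence))
```

Each token's position in the resulting list can serve as the word index on which mentions are clustered; a real system would also need the subword tokenizer that belongs to the BERT model in use.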
The evaluation metrics commonly used for the coreference resolution task are MUC, B³, CEAF-m, CEAF-e and BLANC. Experimental results demonstrate that the proposed model outperforms the state-of-the-art Amharic model, achieving F-measure values of 80%, 85.71%, 90.9%, 88.86% and 81.7%, respectively, on the Amharic dataset.
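As an illustration of how one of these metrics works, the sketch below computes the B³ (B-cubed) precision, recall and F-measure for a pair of clusterings. It is a minimal sketch under stated assumptions: the function name `b_cubed` is hypothetical, mentions are represented by integer ids (for example word indices), and the gold and predicted clusterings are assumed to cover the same mentions. It is not the evaluation script used in the paper, and the clusters in the usage example are invented.

```python
def b_cubed(key: list[set[int]], response: list[set[int]]) -> tuple[float, float, float]:
    """Return B-cubed (precision, recall, F1) for gold `key` and predicted `response` clusters."""
    key_of = {m: cluster for cluster in key for m in cluster}
    resp_of = {m: cluster for cluster in response for m in cluster}
    # Assumption: score only mentions present in both clusterings.
    mentions = key_of.keys() & resp_of.keys()
    if not mentions:
        return 0.0, 0.0, 0.0

    # Per-mention precision: share of the predicted cluster that is truly coreferent.
    precision = sum(len(key_of[m] & resp_of[m]) / len(resp_of[m]) for m in mentions) / len(mentions)
    # Per-mention recall: share of the gold cluster recovered by the predicted cluster.
    recall = sum(len(key_of[m] & resp_of[m]) / len(key_of[m]) for m in mentions) / len(mentions)

    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    gold = [{1, 2, 3}, {4, 5}]        # invented gold clusters
    predicted = [{1, 2}, {3, 4, 5}]   # invented system output
    print(b_cubed(gold, predicted))
```

MUC, CEAF and BLANC score the same pair of clusterings from different angles (links, cluster alignment, and coreferent versus non-coreferent pairs), which is why results are usually reported on all of them together.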
Springer Science and Business Media LLC

Related Results

Analyse en corpus de chaînes de coréférence : la coréférence non-stricte à l'épreuve de la linguistique outillée
A coreference chain is the set of linguistic expressions that refer to the same entity. The coreference relation between the "links" of a chain implies that...
Within-Document Arabic Event Coreference: Challenges, Datasets, Approaches and Future Direction
Event coreference resolution is a crucial component in Natural Language Processing (NLP) applications as it directly affects text summarization, machine translation, classification...
Sleep Habits and Occurrence of Lowback Pain among Craftsmen
Amharic Adhoc Information Retrieval System Based on Morphological Features
Information retrieval (IR) is one of the most important research and development areas due to the explosion of digital data and the need of accessing relevant information from huge...
Automatic Load Sharing of Transformer
The transformer plays a major role in the power system. It works 24 hours a day and provides power to the load. When the transformer is overloaded, its windings overheat, which lea...
Developing an audio search engine for Amharic speech web resources
While general-purpose search engines primarily serve English-language content, the web has seen enormous growth in non-resource-rich languages like Amhar...
Learned Text Representation for Amharic Information Retrieval and Natural Language Processing
Over the past few years, word embeddings and bidirectional encoder representations from transformers (BERT) models have brought better solutions to learning text representations fo...
