Coreference Resolution for Amharic Text using Bidirectional Encoder Representation from Transformer (BERT)
Abstract
Coreference resolution is the task of identifying all expressions in a text that refer to the same entity.
Expressions that refer to the same entity are called mentions.
The task therefore amounts to clustering all coreferent mentions in a text, with each mention identified by its word indices.
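Concretely, the clustering step can be pictured as grouping together any mentions that a pairwise model judges coreferent. Below is a minimal Python sketch under that assumption; the `coreferent` predicate and the `Mention` index pairs are illustrative stand-ins, not the paper's actual components.

```python
# Minimal sketch: cluster mentions (indexed by word position) into
# entity clusters with union-find, given a pairwise coreference
# decision. `coreferent` is a stand-in for the model's pairwise scorer.

from typing import Callable, List, Tuple

Mention = Tuple[int, int]  # (start, end) word indices in the document


def cluster_mentions(mentions: List[Mention],
                     coreferent: Callable[[Mention, Mention], bool]) -> List[List[Mention]]:
    parent = list(range(len(mentions)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    def union(i: int, j: int) -> None:
        parent[find(i)] = find(j)

    # Link every mention to any coreferent predecessor (antecedent).
    for j in range(len(mentions)):
        for i in range(j):
            if coreferent(mentions[i], mentions[j]):
                union(i, j)

    clusters = {}
    for idx, m in enumerate(mentions):
        clusters.setdefault(find(idx), []).append(m)
    return list(clusters.values())
```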
Coreference resolution is used to increase the effectiveness of several NLP applications, such as machine translation, information extraction, named entity recognition, and question answering.
In this work, we propose coreference resolution for Amharic text using Bidirectional Encoder Representations from Transformers (BERT).
BERT is a contextual language model that generates semantic vectors dynamically according to the context in which each word appears.
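As an illustration of what "contextual" means here, the following sketch shows how per-token vectors could be obtained with the Hugging Face transformers library; the checkpoint name is a hypothetical placeholder, since the abstract does not name the exact Amharic BERT model used.

```python
# Sketch of obtaining contextual vectors from a BERT encoder, assuming
# a checkpoint pre-trained on Amharic text.

import torch
from transformers import AutoModel, AutoTokenizer

MODEL_NAME = "path/to/amharic-bert"  # hypothetical Amharic BERT checkpoint

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModel.from_pretrained(MODEL_NAME)

sentence = "አበበ ትምህርት ቤት ሄደ።"  # "Abebe went to school."
inputs = tokenizer(sentence, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)

# One vector per subword token; the same surface form receives different
# vectors in different contexts, unlike static embeddings.
token_vectors = outputs.last_hidden_state.squeeze(0)  # (num_tokens, hidden_size)
```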
The proposed system has a training phase and a testing phase.
The training phase comprises preprocessing (cleaning, tokenization, and sentence segmentation), word embedding, feature extraction, and construction of the coreference model.
The testing phase likewise comprises preprocessing (cleaning, tokenization, and sentence segmentation), followed by coreference resolution that outputs the predicted Amharic mentions.
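For concreteness, here is a minimal sketch of the preprocessing stage under simple assumptions: cleaning collapses whitespace, sentences end at Ethiopic sentence-final punctuation, and tokens split on whitespace and Ethiopic separators. The actual pipeline in the paper may handle many more cases.

```python
# Preprocessing sketch: cleaning, sentence segmentation at the Ethiopic
# full stop (።) and question mark (፧), and whitespace/punctuation
# tokenization. Simplified assumptions, not the paper's exact rules.

import re


def clean(text: str) -> str:
    # Collapse runs of whitespace and trim the ends.
    return re.sub(r"\s+", " ", text).strip()


def segment_sentences(text: str) -> list[str]:
    # Split on Ethiopic sentence-final punctuation.
    parts = re.split(r"[።፧!?]", text)
    return [p.strip() for p in parts if p.strip()]


def tokenize(sentence: str) -> list[str]:
    # Whitespace tokenization, with Ethiopic separators (፣ ፤ ፥ ፦) removed.
    return [t for t in re.split(r"[\s፣፤፥፦]+", sentence) if t]


# Note the pronoun እሱ ("he") in the second sentence corefers with አበበ.
doc = "አበበ ትምህርት ቤት ሄደ። እሱ መጽሐፍ አነበበ።"
for s in segment_sentences(clean(doc)):
    print(tokenize(s))
```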
Word embedding is used in the proposed model to represent each word as a low-dimensional vector.
It is a feature-learning technique that yields new, transferable features across domains for coreference resolution in Amharic text.
The necessary information is extracted from the word embeddings, the preprocessed data, and Amharic character features.
After extracting these features from the training data, we build the coreference model.
Moreover, the model uses Bidirectional Encoder Representations from Transformers to obtain its basic features from the embedding layer, extracting information from both the left and right context of a given word.
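One common way to turn such token vectors into mention-level features, which may or may not match the paper's exact feature extraction, is to concatenate the vectors at a span's boundaries so that both left and right context are represented:

```python
# Sketch: build a mention (span) feature from per-token BERT vectors by
# concatenating the boundary token vectors. A common neural-coreference
# choice, shown here as an assumption rather than the paper's method.

import torch


def span_representation(token_vectors: torch.Tensor, start: int, end: int) -> torch.Tensor:
    """token_vectors: (num_tokens, hidden_size); start/end are inclusive indices."""
    return torch.cat([token_vectors[start], token_vectors[end]], dim=-1)


# Example with a stand-in (num_tokens, hidden_size) tensor.
vectors = torch.randn(10, 768)
mention = span_representation(vectors, 2, 4)  # shape: (1536,)
```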
To evaluate the proposed model, we conducted experiments on an Amharic dataset prepared from various reliable sources for this study.
The commonly used evaluation metrics for the coreference resolution task are MUC, B3, CEAF-m, CEAF-e, and BLANC.
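As a reference point for how one of these metrics behaves, the sketch below implements B3 (B-cubed), whose per-mention precision and recall come from the overlap between each mention's predicted cluster and its gold cluster; the toy clusters at the end are illustrative only.

```python
# B3 (B-cubed) sketch: for each mention, precision is the fraction of
# its predicted cluster that is correct, recall the fraction of its
# gold cluster that is recovered; both are averaged over mentions.

from typing import Hashable, List, Set

Cluster = Set[Hashable]


def b_cubed(gold: List[Cluster], predicted: List[Cluster]):
    gold_of = {m: c for c in gold for m in c}
    pred_of = {m: c for c in predicted for m in c}
    mentions = [m for m in gold_of if m in pred_of]

    precision = sum(len(gold_of[m] & pred_of[m]) / len(pred_of[m])
                    for m in mentions) / len(mentions)
    recall = sum(len(gold_of[m] & pred_of[m]) / len(gold_of[m])
                 for m in mentions) / len(mentions)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Toy example: one gold entity of three mentions, split in two by the system.
gold = [{"m1", "m2", "m3"}]
pred = [{"m1", "m2"}, {"m3"}]
print(b_cubed(gold, pred))  # precision 1.0, recall ~0.56, F1 ~0.71
```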
Experimental results demonstrate that the proposed model outperforms the state-of-the-art Amharic model, achieving F-measure values of 80%, 85.71%, 90.9%, 88.86%, and 81.7% on these metrics, respectively, on the Amharic dataset.