Active learning for Named Entity Recognition in Kannada
The Named Entity Recognition (NER) task aims to automatically recognise and classify named entities in natural language input. Most NER studies focus on English. NER in Indian languages is challenging because of their complex grammar and morphology and the scarcity of good-quality labelled data. The gap between the literary and spoken forms of these languages is another major obstacle to the usability of NER models. Kannada is one such Indian language, and NER for it is still an active area of research. Deep learning, and especially transfer learning with Transformer models, has substantially improved performance on many NLP tasks at scale. However, transfer learning still requires a considerable amount of labelled data for the target task. For a low-resource language like Kannada, very few labelled datasets are publicly available, and creating one from scratch is expensive in time and labour. Active learning (AL) tackles this data-acquisition problem by having the learning model cooperate with an oracle (typically a human annotator): the model repeatedly chooses which unlabelled examples the oracle should label next, so that a well-chosen and sufficiently large labelled dataset is built iteratively. This study applies AL to NER in Kannada, a low-resource language. Results show that AL can boost the performance of a multilingual model fine-tuned for NER. We also try to mitigate the gap between formal and colloquial dialects of Kannada in NER datasets.
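The abstract describes the model/oracle cooperation only at a high level and does not say which query strategy the study used. The sketch below assumes the common pool-based setup: train on the labelled set, score every unlabelled sentence by a length-normalised least-confidence uncertainty, send the k most uncertain sentences to the oracle, and repeat. All names here (`uncertainty`, `select_batch`, `active_learning`, the memorising `toy_train` "model" and the `GOLD` lookup standing in for the oracle) are hypothetical illustrations, not the study's code; a real setup would replace `toy_train` with fine-tuning a multilingual Transformer that returns per-token tag probabilities.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

Sentence = List[str]                  # tokens of one sentence
Tags = List[str]                      # one NER tag per token, e.g. "B-LOC", "O"
TokenProbs = List[Dict[str, float]]   # per-token tag distribution from the model
Predictor = Callable[[Sentence], TokenProbs]


def uncertainty(token_probs: TokenProbs) -> float:
    """Length-normalised least confidence: 1 minus the geometric mean of each
    token's top tag probability. Higher means the model is less sure."""
    if not token_probs:
        return 0.0
    mean_log = sum(math.log(max(p.values())) for p in token_probs) / len(token_probs)
    return 1.0 - math.exp(mean_log)


def select_batch(pool: Sequence[Sentence], predict: Predictor, k: int) -> List[int]:
    """Indices of the k pool sentences the current model is least sure about."""
    scores = [uncertainty(predict(s)) for s in pool]
    # sorted() is stable, so ties keep pool order even with reverse=True.
    return sorted(range(len(pool)), key=lambda i: scores[i], reverse=True)[:k]


def active_learning(
    labeled: List[Tuple[Sentence, Tags]],
    pool: List[Sentence],
    train: Callable[[List[Tuple[Sentence, Tags]]], Predictor],
    oracle: Callable[[Sentence], Tags],
    rounds: int,
    k: int,
):
    """Pool-based loop: train, query the oracle on the k most uncertain
    sentences, move them from the pool to the labelled set, repeat."""
    for _ in range(rounds):
        if not pool:
            break
        predict = train(labeled)
        picked = set(select_batch(pool, predict, k))
        labeled = labeled + [(pool[i], oracle(pool[i])) for i in sorted(picked)]
        pool = [s for i, s in enumerate(pool) if i not in picked]
    return train(labeled), labeled, pool


# --- Toy usage (hypothetical stand-ins, not a real NER model) ---
def toy_train(labeled):
    """'Trains' by memorising token -> tag; confident (0.9) on seen tokens only."""
    seen = {tok: tag for sent, tags in labeled for tok, tag in zip(sent, tags)}

    def predict(sentence):
        return [
            {seen[t]: 0.9, "OTHER": 0.1} if t in seen else {"O": 0.5, "B-LOC": 0.5}
            for t in sentence
        ]

    return predict


GOLD = {
    ("city", "Bengaluru"): ["O", "B-LOC"],
    ("Mysuru", "city"): ["B-LOC", "O"],
    ("Kuvempu", "Mysuru"): ["B-PER", "B-LOC"],
}
seed = [(["Bengaluru", "city"], ["B-LOC", "O"])]
pool = [["city", "Bengaluru"], ["Mysuru", "city"], ["Kuvempu", "Mysuru"]]

print(select_batch(pool, toy_train(seed), k=2))  # [2, 1]
model, labeled, remaining = active_learning(
    seed, pool, toy_train, lambda s: GOLD[tuple(s)], rounds=1, k=1
)
print(len(labeled), remaining)  # 2 [['city', 'Bengaluru'], ['Mysuru', 'city']]
```

The sentence with two unseen tokens is queried first, then the one with a single unseen token, while the fully familiar sentence is left for later. Length normalisation is one design choice among several (total least confidence or token entropy are common alternatives); it keeps long sentences from being favoured merely because they have more tokens.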
Related Results
CREATING LEARNING MEDIA IN TEACHING ENGLISH AT SMP MUHAMMADIYAH 2 PAGELARAN ACADEMIC YEAR 2020/2021
The pandemic Covid-19 currently demands teachers to be able to use technology in teaching and learning process. But in reality there are still many teachers who have not been able ...
Efficacy of an Extended Half-Life GlycoPEGylated rFVIII (N8-GP): Pooled Analysis of ABR (Results from Two Clinical Trials)
Abstract
Introduction
The short half-life of standard factor VIII (FVIII) products means that frequent injections (3 to 4 times/week) are needed for e...
Pre-Trained Deep Learning Models for Detecting Strikeouts in Kannada Handwritten Documents
In the digital document analysis, handwritten character recognition is a challenging area. Various methods are proposed in the literature to identify the strike-outs in various la...
A Phase 1b, Dose-Finding Study Of Ruxolitinib Plus Panobinostat In Patients With Primary Myelofibrosis (PMF), Post–Polycythemia Vera MF (PPV-MF), Or Post–Essential Thrombocythemia MF (PET-MF): Identification Of The Recommended Phase 2 Dose
Abstract
Background
Myelofibrosis (MF) is a myeloproliferative neoplasm associated with progressive, debilitating symptoms that ...
Kannada translation and cross-cultural adaptation of the SarQol®: sarcopenia specific quality of life questionnaire
Abstract
Background
Sarcopenia Quality of life questionnaire (SarQol) in Indian vernacular language is limited to Hindi, Marathi, and Bengali.
Objective
To translate and ...
Unsupervised entity linking using graph-based semantic similarity
Nowadays, the human textual data constitutes a great proportion of the shared information resources such as World Wide Web (WWW). Social networks, news and learning resources as we...
Dynamics of Mutations in Patients with ET Treated with Imetelstat
Abstract
Background: Imetelstat, a first in class specific telomerase inhibitor, induced hematologic responses in all patients (pts) with essential thrombocythemia (...

