Urdu-NERD: Urdu named entity recognition with BiGRU-based deep learning architecture
Named Entity Recognition (NER) is a fundamental task in Natural Language Processing (NLP) that identifies and extracts entities such as names, locations, organizations, and other specific labels from unstructured text. It plays a crucial role in NLP applications including information retrieval, question answering, and sentiment analysis. While NER systems have been extensively developed for English, adapting them to languages like Urdu poses unique challenges due to linguistic differences and the scarcity of annotated data. In this research, we enhance data diversity and accessibility for Urdu NER by introducing the ZUNERA corpus, the most extensive Urdu NER dataset to date, comprising 1,189,614 tokens and 89,804 named entities, which we classify into twenty-three named-entity types. We annotate the corpus following clear guidelines and use the Kappa coefficient to verify annotation quality. Furthermore, we propose the Urdu-Named Entity Recognition with BiGRU-based Deep Learning Architecture (NERD) framework, which facilitates efficient entity recognition in Urdu text. The proposed framework achieves an F1-score of 94.6%. Comparing ZUNERA with the MK-PUCIT dataset underscores its robustness in accurately recognizing entities. Although this study centers on Urdu, the proposed NER framework and annotation pipeline are designed to be language-agnostic. They can be extended to other morphologically rich or low-resource languages, providing a replicable foundation for future cross-lingual research. Overall, our contributions advance Urdu NER research by providing a comprehensive dataset, evaluating state-of-the-art techniques, and introducing a novel framework for efficient Urdu entity recognition.
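As an illustration of the agreement check mentioned in the abstract, the sketch below computes a kappa coefficient over token-level labels from two annotators. It assumes Cohen's kappa (the abstract does not say which variant was used), and the function name `cohen_kappa`, the tag names, and the sample labels are hypothetical, not taken from the ZUNERA corpus.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same tokens.

    Assumption: two annotators, one label per token. The source only
    mentions "the Kappa coefficient" without naming the variant.
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label sequences must be non-empty and of equal length")
    n = len(labels_a)
    # Observed agreement: fraction of tokens given the same label.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of each annotator's label frequencies.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    expected = sum(freq_a[tag] * freq_b[tag] for tag in freq_a) / (n * n)
    if expected == 1.0:
        # Both annotators used a single identical label everywhere.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Hypothetical token labels from two annotators (not real corpus data).
annotator_1 = ["PER", "O", "LOC", "O", "ORG", "O"]
annotator_2 = ["PER", "O", "LOC", "O", "O", "O"]
kappa = cohen_kappa(annotator_1, annotator_2)
print(f"kappa = {kappa:.3f}")
```

Values close to 1 indicate agreement well above chance. For more than two annotators, a multi-rater statistic such as Fleiss' kappa would be the usual substitute.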

