Incremental Learning Static Word Embeddings for Low-Resource NLP
Developing Natural Language Processing (NLP) for Low-Resource Languages (LRLs) remains challenging because of limited data availability, linguistic diversity, and computational constraints. Many NLP solutions rely on complex models and high-volume, high-quality data, which makes them difficult to apply in low-resource settings. Motivated by the challenges and insights reported in previous work, this paper proposes an Incremental Learning (IL) Static Word Embedding (SWE) system, an underexplored approach, for the low-resource case of Indonesia’s local languages. Using basic models and hyperparameter sweeps, the models are tested in a scenario in which 10 different local languages are incorporated incrementally. The simulations indicate that this type of model resists Catastrophic Forgetting (CF) very well and delivers competitive performance on the downstream task of sentiment analysis. In terms of F1 score, the proposed model exceeds the other baseline models and even rivals heavy Transformer models. The proposed model can therefore be considered a prospective holistic solution for low-resource NLP. Future work could explore the model’s behavior on finer-grained NLP tasks, under different IL settings, or with more advanced models.
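The evaluation described in the abstract (F1 on sentiment analysis, tracked while languages are added one at a time) can be illustrated with a small sketch. This is not the authors' implementation. The function names `macro_f1` and `average_forgetting`, the choice of macro averaging, and the forgetting measure (best earlier F1 minus final F1, averaged over previously learned languages) are assumptions borrowed from common continual-learning practice. The language keys and scores in the usage example are made-up placeholders, not results from the paper.

```python
from typing import Dict, List, Sequence


def macro_f1(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Unweighted mean of per-class F1 over all labels in y_true or y_pred."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never occurs.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores) if scores else 0.0


def average_forgetting(history: List[Dict[str, float]]) -> float:
    """history[k][lang] is the F1 on `lang` after the k-th incremental step.

    For each language evaluated before the final step, forgetting is the best
    F1 it reached at any earlier step minus its F1 after the final step.
    A value near zero (or negative) suggests little catastrophic forgetting.
    """
    final = history[-1]
    drops = []
    for lang in final:
        earlier = [step[lang] for step in history[:-1] if lang in step]
        if earlier:  # languages first seen at the final step are skipped
            drops.append(max(earlier) - final[lang])
    return sum(drops) / len(drops) if drops else 0.0


if __name__ == "__main__":
    # Placeholder labels and scores for illustration only (hypothetical).
    print(macro_f1(["pos", "neg", "pos", "neg"], ["pos", "neg", "neg", "neg"]))
    toy_history = [
        {"lang_a": 0.80},
        {"lang_a": 0.78, "lang_b": 0.70},
        {"lang_a": 0.75, "lang_b": 0.72, "lang_c": 0.60},
    ]
    print(average_forgetting(toy_history))
```

In this sketch each entry of `history` would be filled by re-evaluating every previously added language after a new one is incorporated, so a flat or rising score for earlier languages corresponds to the resistance to forgetting that the abstract reports.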
Science Research Society
Related Results
AI and Incidental Findings
Delayed and missed follow-up on incidental findings threatens patient health and is a major financial risk for healthcare systems. The hea...
Advancements in Word Embeddings: A Comprehensive Survey and Analysis
In recent years, the field of Natural Language Processing (NLP) has seen significant growth in the study of word representation, with word embeddings proving valuable for various N...
Learned Text Representation for Amharic Information Retrieval and Natural Language Processing
Over the past few years, word embeddings and bidirectional encoder representations from transformers (BERT) models have brought better solutions to learning text representations fo...
When Word Embeddings Become Endangered
Big languages such as English and Finnish have many natural language processing (NLP) resources and models, but this is not the case for low-resourced and endangered languages as s...
Exploring Word Embeddings for Text Classification: A Comparative Analysis
For language tasks like text classification and sequence labeling, word embeddings are essential for providing input characteristics in deep models. There have been many word embed...
Exploring the Privacy-Preserving Properties of Word Embeddings: Algorithmic Validation Study (Preprint)
Word embeddings are dense numeric vectors used to represent language in neural networks. Until recently, there had been no publicly released embe...
CREATING LEARNING MEDIA IN TEACHING ENGLISH AT SMP MUHAMMADIYAH 2 PAGELARAN ACADEMIC YEAR 2020/2021
The Covid-19 pandemic currently demands that teachers be able to use technology in the teaching and learning process. But in reality there are still many teachers who have not been able ...
Natural Language Processing Applications in Mechanical Engineering Education
NLP, or Natural Language Processing, is a branch of artificial intelligence, enabling machines to understand and respond to human language in both written a...

