Incremental Learning Static Word Embeddings for Low-Resource NLP
Natural Language Processing (NLP) development for Low-Resource Languages (LRLs) remains challenging due to limited data availability, linguistic diversity, and computational constraints. Many NLP solutions rely on complex models and large volumes of high-quality data, which makes them difficult to apply in low-resource settings. Motivated by the challenges and insights reported in previous work, this paper proposes an underexplored Incremental Learning (IL) Static Word Embedding (SWE) system for the low-resource NLP case of Indonesia’s local languages. Using basic-level models and hyperparameter sweeps, the system is tested in a scenario where 10 different local languages are incorporated incrementally. The simulations indicate that this type of model resists Catastrophic Forgetting (CF) well and delivers competitive performance on the downstream task of sentiment analysis. In terms of F1 score, the proposed model outperforms the baseline models and even rivals heavyweight Transformer models. The proposed model can therefore be considered a prospective holistic solution for low-resource NLP. Future work could explore the model’s behavior on finer-grained NLP tasks, in different IL settings, or with more advanced models.
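The incremental setting described above can be illustrated with a minimal, hypothetical sketch (this is not the paper's actual model or training procedure): each new language batch only adds vectors for previously unseen words, so vectors learned from earlier languages are never overwritten. This is one simple way a static word embedding store can resist catastrophic forgetting by construction. The `IncrementalSWE` class, the example sentences, and the random initialization are all illustrative assumptions.

```python
import random

class IncrementalSWE:
    """Toy static word-embedding store that grows as new languages arrive.

    Hypothetical sketch: vectors for known words are kept frozen when a new
    language is incorporated, so earlier languages cannot be forgotten."""

    def __init__(self, dim=8, seed=0):
        self.dim = dim
        self.rng = random.Random(seed)
        self.vectors = {}  # word -> list[float]

    def incorporate(self, corpus):
        """Add vectors for unseen words only; known words stay untouched."""
        for sentence in corpus:
            for word in sentence.split():
                if word not in self.vectors:
                    self.vectors[word] = [self.rng.uniform(-0.5, 0.5)
                                          for _ in range(self.dim)]

emb = IncrementalSWE()
emb.incorporate(["selamat pagi", "terima kasih"])        # first language batch
frozen = list(emb.vectors["pagi"])                       # snapshot before update
emb.incorporate(["sugeng enjing", "matur nuwun pagi"])   # second language batch
assert emb.vectors["pagi"] == frozen                     # old vector unchanged
```

In a real system the new vectors would of course be trained (e.g. with skip-gram updates over the new corpus) rather than left at random initialization; the sketch only shows the incremental-vocabulary mechanism that lets old and new languages coexist in one embedding table.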
Science Research Society
Related Results
Learned Text Representation for Amharic Information Retrieval and Natural Language Processing
Over the past few years, word embeddings and bidirectional encoder representations from transformers (BERT) models have brought better solutions to learning text representations fo...
When Word Embeddings Become Endangered
Big languages such as English and Finnish have many natural language processing (NLP) resources and models, but this is not the case for low-resourced and endangered languages as s...
Natural language processing applications for low-resource languages
Natural language processing (NLP) has significantly advanced our ability to model and interact with human language through technology. However, these advancements have disp...
A Technique for Constructing ...
To solve the problem of constructing the frequency responses (FR) of filters on switched capacitors, which belong to the class of electronic circuits with a periodically changing s...
Assessment of Android Network Positioning as an Alternate Source for Robust PNT
Android devices employ several methods to calculate their position. This paper’s focus is the Network Location Provider (NLP), which leverages Wi-Fi and cell tower signals via the ...
Successful Replacement Therapy After ...
Background. Vitamin D has recognized immunomodulatory, anti-proliferative, and differentiation-regulating effects primarily mediated through its genomic effects via the vitamin D r...
Natural Language Processing: Basics, Challenges, and Clustering Applications
Natural Language Processing (NLP) involves the use of algorithms, models, and various computational techniques to analyze, process, and generate natural language data, including ...
Exploratory AI-Assisted ML Screening ...
This technical note reports an exploratory, AI-assisted in silico proof of concept implementing a “signaling first, killing later” discovery paradigm: prioritizing compounds with h...

