Incremental Learning Static Word Embeddings for Low-Resource NLP

Natural Language Processing (NLP) development for Low-Resource Languages (LRL) remains challenging due to limited data availability, linguistic diversity, and computational constraints. Many NLP solutions rely on complex models and large volumes of high-quality data, which makes them difficult to apply in low-resource settings. Motivated by these challenges and by insights from prior work, this study proposes an underexplored Incremental Learning (IL) Static Word Embedding (SWE) system for the low-resource NLP case of Indonesia's local languages. Using basic-level models and hyperparameter sweeps, the system is evaluated in a scenario where 10 different local languages are incorporated into it incrementally. The simulations indicate that this type of model resists Catastrophic Forgetting (CF) very well and delivers competitive performance on the downstream task of sentiment analysis. In terms of F1 scores, the proposed model exceeds the other baseline models and even rivals heavy Transformer models. The proposed model can therefore be considered a prospective holistic solution for low-resource NLP. Future work could explore the model's behavior in finer-grained NLP tasks and different IL settings, or test more advanced models.
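This listing does not include the paper's implementation, but the core idea, incrementally folding each new language into one static word embedding model, can be illustrated with a short sketch. This is a minimal sketch assuming gensim 4.x Word2Vec as a stand-in for the paper's SWE system; the corpus file names and the load_corpus helper are hypothetical placeholders, and the paper's actual IL training procedure may differ.

```python
# Minimal sketch: incrementally incorporating new languages into one
# static word embedding model. Assumes gensim 4.x; paths are hypothetical.
from gensim.models import Word2Vec

def load_corpus(path):
    # Hypothetical helper: yields tokenized sentences from a text file.
    with open(path, encoding="utf-8") as f:
        for line in f:
            yield line.lower().split()

# Hypothetical corpora for a few of the 10 Indonesian local languages.
language_corpora = ["javanese.txt", "sundanese.txt", "minangkabau.txt"]

model = None
for path in language_corpora:
    sentences = list(load_corpus(path))
    if model is None:
        # First language: build the initial vocabulary and train from scratch.
        model = Word2Vec(sentences, vector_size=100, window=5, min_count=1)
    else:
        # Each later language: extend the vocabulary in place and keep
        # training, so earlier embeddings are updated rather than replaced.
        model.build_vocab(sentences, update=True)
        model.train(sentences, total_examples=len(sentences),
                    epochs=model.epochs)

model.save("il_swe.model")
```

Because the shared model is updated in place, re-testing on earlier languages after each step is what would reveal how well such a setup resists catastrophic forgetting.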
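The downstream evaluation the abstract reports, F1 on sentiment analysis, can likewise be sketched by averaging word vectors into sentence features and scoring a simple classifier. This is a hedged sketch assuming scikit-learn and a toy labelled set; the examples below are hypothetical placeholders, and the paper's actual classifier, data, and splits are not given on this page.

```python
# Hedged sketch: sentiment-analysis F1 using averaged static word embeddings.
import numpy as np
from gensim.models import Word2Vec
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

# Hypothetical labelled data: (tokenized sentence, sentiment label) pairs.
texts = [["film", "iki", "apik", "tenan"],
         ["ora", "seneng", "babar", "blas"]] * 50
labels = [1, 0] * 50

# Stand-in for the incrementally trained SWE model from the sketch above.
model = Word2Vec(texts, vector_size=50, min_count=1)

def sentence_vector(tokens):
    # Average the embeddings of in-vocabulary tokens (zeros if none match).
    vecs = [model.wv[t] for t in tokens if t in model.wv]
    return np.mean(vecs, axis=0) if vecs else np.zeros(model.vector_size)

X = np.stack([sentence_vector(t) for t in texts])
X_tr, X_te, y_tr, y_te = train_test_split(X, labels,
                                          test_size=0.2, random_state=0)

clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
print("F1:", f1_score(y_te, clf.predict(X_te)))
```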

Related Results

Learned Text Representation for Amharic Information Retrieval and Natural Language Processing
Over the past few years, word embeddings and bidirectional encoder representations from transformers (BERT) models have brought better solutions to learning text representations fo...
Natural language processing applications for low-resource languages
Natural language processing (NLP) has significantly advanced our ability to model and interact with human language through technology. However, these advancements have disp...
Natural Language Processing: Basics, Challenges, and Clustering Applications
Natural Language Processing (NLP) involves the use of algorithms and models and various computational techniques to analyze, process, and generate natural language data, including ...
Computational linguistics at the crossroads: A comprehensive review of NLP advancements
New NLP breakthroughs have put Computational Linguistics at a crossroads. NLP's past, present, and future are covered. This review explains computational linguistics' creation with...
Better Word Representation Vectors Using Syllabic Alphabet: A Case Study of Swahili
Deep learning has extensively been used in natural language processing with sub-word representation vectors playing a critical role. However, this cannot be said of Swahili, which ...
Development and research of a neural network alternate incremental learning algorithm
In this paper, the relevance of developing methods and algorithms for neural network incremental learning is shown. Families of incremental learning techniques are presented. A pos...
DILRS: Domain-Incremental Learning for Semantic Segmentation in Multi-Source Remote Sensing Data
With the exponential growth in the speed and volume of remote sensing data, deep learning models are expected to adapt and continually learn over time. Unfortunately, the domain sh...
Query-Based Retrieval Using Universal Sentence Encoder
In Natural language processing, various tasks can be implemented with the features provided by word embeddings. But for obtaining embeddings for larger chunks like sentences, the e...