Incremental Learning Static Word Embeddings for Low-Resource NLP

Natural Language Processing (NLP) development for Low-Resource Languages (LRL) remains challenging due to limited data availability, linguistic diversity, and computational constraints. Many NLP solutions rely on complex models and large volumes of high-quality data, which makes them difficult to apply in low-resource settings. Motivated by these challenges and by insights from prior work, this study proposes an underexplored Incremental Learning (IL) Static Word Embedding (SWE) system for the low-resource NLP case of Indonesia's local languages. Using basic-level models and hyperparameter sweeps, the system is evaluated in a scenario where 10 different local languages are incorporated into it incrementally. The simulations indicate that this type of model resists Catastrophic Forgetting (CF) very well and delivers competitive performance on the downstream task of sentiment analysis. In terms of F1 scores, the proposed model exceeds the other baseline models and even rivals heavy Transformer models. The proposed model can therefore be considered a prospective holistic solution for low-resource NLP. Future work could explore the model's behavior in finer-grained NLP tasks and different IL settings, or test more advanced models.
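This listing does not include the paper's implementation, but the core idea, incrementally folding each new language into one static word embedding model, can be illustrated with a short sketch. This is a minimal sketch assuming gensim 4.x Word2Vec as a stand-in for the paper's SWE system; the corpus file names and the load_corpus helper are hypothetical placeholders, and the paper's actual IL training procedure may differ.

```python
# Minimal sketch: incrementally incorporating new languages into one
# static word embedding model. Assumes gensim 4.x; paths are hypothetical.
from gensim.models import Word2Vec

def load_corpus(path):
    # Hypothetical helper: yields tokenized sentences from a text file.
    with open(path, encoding="utf-8") as f:
        for line in f:
            yield line.lower().split()

# Hypothetical corpora for a few of the 10 Indonesian local languages.
language_corpora = ["javanese.txt", "sundanese.txt", "minangkabau.txt"]

model = None
for path in language_corpora:
    sentences = list(load_corpus(path))
    if model is None:
        # First language: build the initial vocabulary and train from scratch.
        model = Word2Vec(sentences, vector_size=100, window=5, min_count=1)
    else:
        # Each later language: extend the vocabulary in place and keep
        # training, so earlier embeddings are updated rather than replaced.
        model.build_vocab(sentences, update=True)
        model.train(sentences, total_examples=len(sentences),
                    epochs=model.epochs)

model.save("il_swe.model")
```

Because the shared model is updated in place, re-testing on earlier languages after each step is what would reveal how well such a setup resists catastrophic forgetting.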
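The downstream evaluation the abstract reports, F1 on sentiment analysis, can likewise be sketched by averaging word vectors into sentence features and scoring a simple classifier. This is a hedged sketch assuming scikit-learn and a toy labelled set; the examples below are hypothetical placeholders, and the paper's actual classifier, data, and splits are not given on this page.

```python
# Hedged sketch: sentiment-analysis F1 using averaged static word embeddings.
import numpy as np
from gensim.models import Word2Vec
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

# Hypothetical labelled data: (tokenized sentence, sentiment label) pairs.
texts = [["film", "iki", "apik", "tenan"],
         ["ora", "seneng", "babar", "blas"]] * 50
labels = [1, 0] * 50

# Stand-in for the incrementally trained SWE model from the sketch above.
model = Word2Vec(texts, vector_size=50, min_count=1)

def sentence_vector(tokens):
    # Average the embeddings of in-vocabulary tokens (zeros if none match).
    vecs = [model.wv[t] for t in tokens if t in model.wv]
    return np.mean(vecs, axis=0) if vecs else np.zeros(model.vector_size)

X = np.stack([sentence_vector(t) for t in texts])
X_tr, X_te, y_tr, y_te = train_test_split(X, labels,
                                          test_size=0.2, random_state=0)

clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
print("F1:", f1_score(y_te, clf.predict(X_te)))
```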

Related Results

Learned Text Representation for Amharic Information Retrieval and Natural Language Processing
Over the past few years, word embeddings and bidirectional encoder representations from transformers (BERT) models have brought better solutions to learning text representations fo...
Natural language processing applications for low-resource languages
Natural language processing (NLP) has significantly advanced our ability to model and interact with human language through technology. However, these advancements have disp...
Natural Language Processing: Basics, Challenges, and Clustering Applications
Natural Language Processing (NLP) involves the use of algorithms and models and various computational techniques to analyze, process, and generate natural language data, including ...
Computational linguistics at the crossroads: A comprehensive review of NLP advancements
New NLP breakthroughs have put Computational Linguistics at a crossroads. NLP's past, present, and future are covered. This review explains computational linguistics' creation with...
Better Word Representation Vectors Using Syllabic Alphabet: A Case Study of Swahili
Deep learning has extensively been used in natural language processing with sub-word representation vectors playing a critical role. However, this cannot be said of Swahili, which ...
Development and research of a neural network alternate incremental learning algorithm
In this paper, the relevance of developing methods and algorithms for neural network incremental learning is shown. Families of incremental learning techniques are presented. A pos...
DILRS: Domain-Incremental Learning for Semantic Segmentation in Multi-Source Remote Sensing Data
With the exponential growth in the speed and volume of remote sensing data, deep learning models are expected to adapt and continually learn over time. Unfortunately, the domain sh...
Query-Based Retrieval Using Universal Sentence Encoder
In Natural language processing, various tasks can be implemented with the features provided by word embeddings. But for obtaining embeddings for larger chunks like sentences, the e...