
Zero-Shot Transfer Learning using Affix and Correlated Cross-Lingual Embeddings.

Learning morphologically supplemented embedding spaces with cross-lingual models has become an active area of research and has enabled advances in applications such as machine translation, named entity recognition, document classification, and natural language inference. However, these methods have not yet become common practice for low-resourced Southern African languages. In this paper, we present, evaluate, and benchmark a set of cross-lingual embeddings pairing English with Southern African languages on two classification tasks: News Headlines Classification (NHC) and Named Entity Recognition (NER). Our methodology considers four agglutinative languages from among the eleven official South African languages: isiXhosa, Sepedi, Sesotho, and Setswana. Canonical correlation analysis (CCA) and VecMap are the two cross-lingual alignment strategies adopted in this study. The monolingual embeddings used are GloVe (source language only) and FastText (source and target languages). Our results indicate that, given enough comparable corpora, strong joint representations can be learned between English and the Southern African languages considered. Specifically, the best zero-shot transfer result on the available Setswana NHC dataset was achieved using CCA-aligned embeddings with a multilayer perceptron (MLP) as the classifier (54.5% accuracy). Our best NER performance was achieved using CCA-aligned cross-lingual embeddings with Conditional Random Fields (CRF) as the model (96.4% F1 score). Collectively, the results are competitive with the existing benchmarks for the NHC and NER datasets explored, on both zero-shot tasks, while requiring very minimal resources.
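The abstract names CCA as one of the two alignment strategies but gives no implementation details. As a minimal sketch, assuming the standard linear CCA formulation fitted on a seed bilingual dictionary (row i of `X` and row i of `Y` are a translation pair), the shared space can be computed with NumPy as below. The helper names (`fit_cca`, `project`, `nearest_target`), the ridge term `reg`, the output dimension `k`, and the synthetic data are illustrative assumptions, not the authors' code; VecMap and the loading of real GloVe/FastText vectors are not shown.

```python
import numpy as np


def inv_sqrt(cov: np.ndarray) -> np.ndarray:
    """Inverse matrix square root of a symmetric positive-definite matrix."""
    vals, vecs = np.linalg.eigh(cov)
    return vecs @ np.diag(1.0 / np.sqrt(vals)) @ vecs.T


def fit_cca(X: np.ndarray, Y: np.ndarray, k: int, reg: float = 1e-6):
    """Fit linear CCA on row-aligned translation pairs (hypothetical helper).

    X: (n, d_src) source vectors; Y: (n, d_tgt) target vectors.
    Returns (mean_x, W_x, mean_y, W_y, canonical_correlations).
    """
    mx, my = X.mean(axis=0), Y.mean(axis=0)
    Xc, Yc = X - mx, Y - my
    n = X.shape[0]
    # Covariances; the small ridge term `reg` (an assumption) keeps them invertible.
    Cxx = Xc.T @ Xc / (n - 1) + reg * np.eye(X.shape[1])
    Cyy = Yc.T @ Yc / (n - 1) + reg * np.eye(Y.shape[1])
    Cxy = Xc.T @ Yc / (n - 1)
    Kx, Ky = inv_sqrt(Cxx), inv_sqrt(Cyy)
    # SVD of the whitened cross-covariance yields paired canonical directions,
    # with singular values equal to the canonical correlations (descending).
    U, S, Vt = np.linalg.svd(Kx @ Cxy @ Ky)
    Wx = Kx @ U[:, :k]
    Wy = Ky @ Vt.T[:, :k]
    return mx, Wx, my, Wy, S[:k]


def project(E: np.ndarray, mean: np.ndarray, W: np.ndarray) -> np.ndarray:
    """Map a full embedding matrix into the shared k-dimensional space."""
    return (E - mean) @ W


def nearest_target(Ps: np.ndarray, Pt: np.ndarray) -> np.ndarray:
    """Cosine nearest-neighbour retrieval from source rows to target rows."""
    Ps = Ps / np.linalg.norm(Ps, axis=1, keepdims=True)
    Pt = Pt / np.linalg.norm(Pt, axis=1, keepdims=True)
    return np.argmax(Ps @ Pt.T, axis=1)


# Synthetic demo (assumption: random vectors stand in for real embeddings).
rng = np.random.default_rng(0)
n, d = 500, 20
X = rng.normal(size=(n, d))                    # "English" dictionary vectors
Q, _ = np.linalg.qr(rng.normal(size=(d, d)))   # random orthogonal map
Y = X @ Q + 0.01 * rng.normal(size=(n, d))     # noisy "target-language" vectors
mx, Wx, my, Wy, corr = fit_cca(X, Y, k=10)
Px, Py = project(X, mx, Wx), project(Y, my, Wy)
hits = nearest_target(Px, Py)
print("canonical correlations:", np.round(corr, 4))
print("translation retrieval accuracy:", (hits == np.arange(n)).mean())
```

Once both vocabularies are projected into the shared space, a zero-shot setup of this kind would typically train a classifier (for example an MLP) on English vectors and apply it unchanged to target-language vectors; how the paper combines word vectors into headline features is not stated here.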

Related Results

Effects of Age and Gender during Three Lingual Tasks on Peak Lingual Pressures in Healthy Adults
Purpose: This study examined the effects of age and gender during three intra-oral lingual tasks (elevation, protrusion, and depression) on peak lingual pressure in healthy adults....
PERILAKU SATUAN LINGUAL -(N)ING DALAM BAHASA JAWA (LINGUAL UNIT BEHAVIOR -(N)ING IN JAVANESE LANGUAGE)
This study is entitled Perilaku Satuan Lingual (n)ing dalam Bahasa Jawa (Lingual Unit Behavior (n)ing in Javanese). The theories used in this study are word categories and constituent analysis. Data collection us...
Ciri Morfosemantik Afiks Derivasional {Ber-} dalam Konstruksi Verba Deajektival Bahasa Indonesia (Morphosemantic Features of the Derivational Affix {Ber-} in Indonesian Deadjectival Verb Constructions)
The Indonesian language has a unique grammatical construction for verbs, which is the product of a derivational process that results in a deadjectival verb (VDaj). These VDaj const...
RESEARCHING WRITTEN MONUMENTS IN THE CONTEXT OF CHANGING SCIENTIFIC PARADIGMS
The scientific paradigm of the 21st century has acquired anthropocentric drift. In modern linguistic studies, the anthropocentric approach also occupies a dominant position: the re...
Mental practice of lingual resistance and cortical plasticity in older adults: An exploratory fNIRS study
Purpose: Mental practice using motor imagery (MP) improves motor strength and coordination in the upper and lower extremities in clinical patient populations. Its ...
When Word Embeddings Become Endangered
Big languages such as English and Finnish have many natural language processing (NLP) resources and models, but this is not the case for low-resourced and endangered languages as s...
CREATING LEARNING MEDIA IN TEACHING ENGLISH AT SMP MUHAMMADIYAH 2 PAGELARAN ACADEMIC YEAR 2020/2021
The Covid-19 pandemic currently requires teachers to be able to use technology in the teaching and learning process. But in reality there are still many teachers who have not been able ...
Unsupervised Domain Adaptation of a Pretrained Cross-Lingual Language Model
Recent research indicates that pretraining cross-lingual language models on large-scale unlabeled texts yields significant performance improvements over various cross-lingual and l...
