
Zero-Shot Transfer Learning using Affix and Correlated Cross-Lingual Embeddings.

Learning morphologically supplemented embedding spaces with cross-lingual models has become an active area of research and has enabled many breakthroughs in applications such as machine translation, named entity recognition, document classification, and natural language inference. However, this approach has not yet become customary for low-resourced Southern African languages. In this paper, we present, evaluate, and benchmark a cohort of cross-lingual embeddings for English-Southern African language pairs on two classification tasks: News Headlines Classification (NHC) and Named Entity Recognition (NER). Our methodology considers four agglutinative languages from among the eleven official South African languages: isiXhosa, Sepedi, Sesotho, and Setswana. Canonical correlation analysis (CCA) and VecMap are the two cross-lingual alignment strategies adopted in this study. The monolingual embeddings used are GloVe (source language) and fastText (source and target languages). Our results indicate that, given enough comparable corpora, strong joint representations can be built between English and the Southern African languages considered. Specifically, the best zero-shot transfer result on the available Setswana NHC dataset was achieved using CCA-aligned embeddings with a multilayer perceptron (MLP) as the training model (54.5% accuracy). Our best NER performance was likewise achieved using CCA-aligned cross-lingual embeddings, with Conditional Random Fields (CRF) as the training model (96.4% F1 score). Overall, our results are competitive with the existing benchmarks for the NHC and NER datasets explored, in both zero-shot NHC and NER tasks, with the advantage of requiring only minimal resources.
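As a rough illustration of how CCA alignment supports zero-shot transfer, the sketch below fits a standard regularized CCA on a seed dictionary of translation pairs, projects both languages into the shared space, trains a toy classifier on source-language vectors only, and applies it unchanged to target-language vectors. It is not the authors' implementation. The synthetic embeddings, the seed-dictionary setup, the helper names `fit_cca` and `inv_sqrt`, the regularization value, and the nearest-centroid classifier (a stand-in for the MLP and CRF models reported above) are all assumptions made for illustration.

```python
import numpy as np


def inv_sqrt(cov: np.ndarray) -> np.ndarray:
    """Inverse matrix square root of a symmetric positive-definite matrix."""
    vals, vecs = np.linalg.eigh(cov)
    return vecs @ np.diag(1.0 / np.sqrt(vals)) @ vecs.T


def fit_cca(X: np.ndarray, Y: np.ndarray, k: int, reg: float = 1e-3):
    """Fit CCA on row-aligned seed pairs (X[i] is assumed to translate to Y[i]).

    Returns the centering means, projection matrices A and B, and the
    top-k canonical correlations. (Hypothetical helper, for illustration.)
    """
    mx, my = X.mean(axis=0), Y.mean(axis=0)
    Xc, Yc = X - mx, Y - my
    n = X.shape[0]
    # Regularized covariance and cross-covariance estimates.
    Cxx = Xc.T @ Xc / (n - 1) + reg * np.eye(X.shape[1])
    Cyy = Yc.T @ Yc / (n - 1) + reg * np.eye(Y.shape[1])
    Cxy = Xc.T @ Yc / (n - 1)
    Wx, Wy = inv_sqrt(Cxx), inv_sqrt(Cyy)
    # SVD of the whitened cross-covariance yields paired canonical directions.
    U, S, Vt = np.linalg.svd(Wx @ Cxy @ Wy, full_matrices=False)
    A = Wx @ U[:, :k]
    B = Wy @ Vt.T[:, :k]
    return mx, my, A, B, S[:k]


# Synthetic stand-ins for English (source) and target-language embeddings:
# both are noisy linear views of a shared latent "meaning" Z.
rng = np.random.default_rng(0)
n_words, latent, d_src, d_tgt = 700, 10, 50, 40
Z = rng.standard_normal((n_words, latent))
X_all = Z @ rng.standard_normal((latent, d_src)) + 0.1 * rng.standard_normal((n_words, d_src))
Y_all = Z @ rng.standard_normal((latent, d_tgt)) + 0.1 * rng.standard_normal((n_words, d_tgt))
labels = (Z[:, 0] > 0).astype(int)  # toy binary "topic" label

# Disjoint splits: seed dictionary, labeled source data, unlabeled target test data.
seed, src_train, tgt_test = slice(0, 500), slice(500, 600), slice(600, 700)
mx, my, A, B, corrs = fit_cca(X_all[seed], Y_all[seed], k=latent)

# Project both languages into the shared CCA space.
src_proj = (X_all[src_train] - mx) @ A
tgt_proj = (Y_all[tgt_test] - my) @ B

# Nearest-centroid classifier trained only on source-language projections.
centroids = np.stack([src_proj[labels[src_train] == c].mean(axis=0) for c in (0, 1)])
dists = ((tgt_proj[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
pred = dists.argmin(axis=1)
accuracy = (pred == labels[tgt_test]).mean()

print(f"top canonical correlation: {corrs[0]:.3f}")
print(f"zero-shot accuracy on target: {accuracy:.2f}")
```

Because the source and target projections of a translation pair are highly correlated in the shared space, a model fitted only on source-language projections can be applied directly to target-language projections. Real embeddings are far noisier than this toy data, so the accuracy seen here says nothing about the results reported in the paper.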

Wiley

Abiodun Modupe, Thapelo Sindane, Vukosi Marivate

2023

