Better Word Representation Vectors Using Syllabic Alphabet: A Case Study of Swahili
Deep learning has been used extensively in natural language processing, with sub-word representation vectors playing a critical role. However, the same cannot be said of Swahili, a low-resource language that is widely spoken in East and Central Africa. This study proposed novel word embeddings from syllable embeddings (WEFSE) for Swahili to address the problem of word representation for agglutinative and syllabic-based languages. Inspired by the way Swahili is taught in beginner classes, we encoded the syllables of each word, rather than its characters, character n-grams or morphemes, and generated quality word embeddings using a convolutional neural network. The quality of WEFSE was demonstrated by state-of-the-art results for the syllable-aware language model on both the small dataset (perplexity of 31.229) and the medium dataset (perplexity of 45.859), outperforming character-aware language models. We further evaluated the word embeddings using a word analogy task. To the best of our knowledge, syllabic alphabets have not previously been used to compose word representation vectors. Therefore, the main contributions of the study are a syllabic alphabet, WEFSE, a syllable-aware language model and a word analogy dataset for Swahili.
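The idea of composing a word vector from syllable embeddings with a convolutional layer can be illustrated with a minimal sketch. This is not the authors' implementation: the splitter `naive_syllables` (which simply closes a syllable after every vowel), the embedding size, the filter width, the random weights and all function names are assumptions made for illustration, and the paper's actual syllabic alphabet is not reproduced here.

```python
import math
import random

VOWELS = set("aeiou")

def naive_syllables(word):
    """Close a syllable after every vowel (hypothetical simplification,
    not the syllabic alphabet defined in the paper)."""
    syllables, current = [], ""
    for ch in word.lower():
        current += ch
        if ch in VOWELS:
            syllables.append(current)
            current = ""
    if current:  # trailing consonants with no following vowel
        if syllables:
            syllables[-1] += current
        else:
            syllables.append(current)
    return syllables

def conv_max_pool(vectors, filters, width):
    """1-D convolution over the syllable sequence, tanh, then max-over-time pooling.
    Each filter is a flat weight list of length width * dim."""
    dim = len(vectors[0])
    # Zero-pad so that words shorter than the filter width still give one window.
    padded = vectors + [[0.0] * dim] * max(0, width - len(vectors))
    windows = [
        [x for vec in padded[i:i + width] for x in vec]
        for i in range(len(padded) - width + 1)
    ]
    return [
        max(math.tanh(sum(w * x for w, x in zip(f, win))) for win in windows)
        for f in filters
    ]

def word_vector(word, emb, filters, width):
    """Look up each syllable embedding and pool them into one word vector."""
    return conv_max_pool([emb[s] for s in naive_syllables(word)], filters, width)

# Toy setup: random (untrained) syllable embeddings and filters.
rng = random.Random(0)
DIM, WIDTH, N_FILTERS = 4, 2, 6
words = ["kitabu", "kitanda", "watoto"]
vocab = sorted({s for w in words for s in naive_syllables(w)})
emb = {s: [rng.uniform(-1, 1) for _ in range(DIM)] for s in vocab}
filters = [[rng.uniform(-1, 1) for _ in range(WIDTH * DIM)] for _ in range(N_FILTERS)]

vec = word_vector("kitabu", emb, filters, WIDTH)
print(naive_syllables("kitabu"), len(vec))
```

In a trained model the embeddings and filters would be learned jointly with the language model; here they are random, so the sketch only shows how the shapes fit together: one output dimension per filter, regardless of how many syllables the word has.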

