
Language Independent and Multilingual Language Identification using Infinity Ngram Approach

Massive amounts of multilingual digital information are now generated, propagated, exchanged, stored, and accessed through the web every day across the world. This accumulation of multilingual digital data becomes an obstacle to information acquisition, and language identification is the first of the many steps needed to overcome it. Language identification is the process of labeling given text content with its corresponding language category. Research on language identification has been carried out over the past decades. However, several issues remain unsolved: multilingual language identification, discriminating between documents written in very closely related languages, and labeling the language of very short texts such as words or phrases. In this investigation, we propose an approach that addresses two of these unsolved issues (i.e., multilingual and very-short-text language identification) in a language-independent way. To attain this, we adopt an approach that uses all character n-gram features of a given text unit (e.g., a word or a phrase). Moreover, the proposed approach can identify the language of a text at any text unit (i.e., word, phrase, sentence, or document) in both monolingual and multilingual settings. This capability comes from adopting word-level features, in which every word is classified according to its language category. The infinity n-gram approach uses all character n-grams of a text unit together to label the language category of the given text at the word level. To observe the effectiveness of the proposed approach, four experimental techniques are conducted: pure infinity character n-gram, infinity n-gram with location feature, and infinity n-gram with sentence- and document-level reformulation.
The experimental results indicate that infinity n-gram with the location feature, together with sentence- and document-level reformulation, achieves promising results: an average F-measure of 100% at the word, phrase, sentence, and document levels in the monolingual setting. In the multilingual setting it also attains an average F-measure of 100% at both the sentence and document levels, while at the phrase level it achieves 84.33%, 88.95%, and 90.19% for Amharic, Geeze, and Tigrigna, respectively. At the word level it achieves 83.16%, 80.96%, and 85.85% for Amharic, Geeze, and Tigrigna, respectively.
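
As a rough illustration of what "all character n-grams of a text unit" means, the following minimal Python sketch enumerates every contiguous character n-gram of a word, for n from 1 up to the word length. It is not the authors' implementation. The function names `char_ngrams` and `ngrams_per_word`, the whitespace tokenization, and in particular the begin/middle/end/whole tagging used for the "location feature" are assumptions made for this example, since the abstract does not define how location is encoded.

```python
def char_ngrams(word, with_location=False):
    """Return every contiguous character n-gram of `word`, n = 1 .. len(word).

    With with_location=True each n-gram is paired with a coarse position tag.
    The tag scheme (begin/middle/end/whole) is an assumption for illustration;
    the abstract does not specify how the location feature is encoded.
    """
    length = len(word)
    grams = []
    for n in range(1, length + 1):            # n-gram size
        for start in range(length - n + 1):   # start offset inside the word
            gram = word[start:start + n]
            if not with_location:
                grams.append(gram)
                continue
            end = start + n
            if start == 0 and end == length:
                tag = "whole"    # the n-gram is the full word
            elif start == 0:
                tag = "begin"    # the n-gram is a proper prefix
            elif end == length:
                tag = "end"      # the n-gram is a proper suffix
            else:
                tag = "middle"   # the n-gram is strictly inside the word
            grams.append((gram, tag))
    return grams


def ngrams_per_word(text, with_location=False):
    """Split `text` on whitespace (an assumed tokenization) and extract
    the n-grams of each word, so that each word can be labeled separately."""
    return [(word, char_ngrams(word, with_location)) for word in text.split()]


if __name__ == "__main__":
    print(char_ngrams("abc"))
    print(char_ngrams("abc", with_location=True))
```

A word of k characters yields k(k+1)/2 n-grams under this scheme, so the feature set stays small for the very short text units the abstract targets. Working per word is what would allow a mixed-language sentence to receive a separate label for each word.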

