
Language Independent and Multilingual Language Identification using Infinity Ngram Approach

Massive amounts of multilingual digital information are now generated, propagated, exchanged, stored, and accessed through the web every day across the world. This accumulation of multilingual digital data becomes an obstacle to information acquisition. Language identification, the process of labeling given text content with its corresponding language category, is the first of the many steps used to tackle this difficulty. Research on language identification has been carried out over the past decades. However, several issues remain unsolved: multilingual language identification, discriminating between documents in very closely related languages, and labeling the language of very short texts such as words or phrases. In this investigation, we propose an approach that addresses these unsolved issues (i.e., multilingual and very short text language identification) independently of language. To attain this, we adopt an approach that uses all character ngram features of a given text unit (e.g., a word or phrase). Moreover, the proposed approach can identify the language of a text at any text unit (i.e., word, phrase, sentence, or document) in both monolingual and multilingual settings. This capability comes from adopting word-level features, under which every word is classified with regard to its language category. The infinity ngram approach uses all character ngrams of a text unit together to label the language category of the given text at the word level. To observe the effectiveness of the proposed approach, four experimental techniques are conducted: pure infinity character ngram, infinity ngram with location feature, and infinity ngram with sentence-level and document-level reformulation.
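As a rough illustration of what "all character ngrams of a text unit" can mean for a single word, the following minimal Python sketch enumerates every contiguous character ngram from length 1 up to the full word length. The function name `infinity_ngrams` and the `with_location` flag are hypothetical, and the prefix/infix/suffix/whole tagging is only one assumed way to encode a location feature; the abstract does not specify how the authors define it.

```python
def infinity_ngrams(word, with_location=False):
    """Return every contiguous character ngram of `word`, for n = 1 .. len(word).

    With with_location=True, each ngram is tagged with an ASSUMED location
    label (prefix / infix / suffix / whole). This encoding is illustrative
    only and is not taken from the paper.
    """
    grams = []
    length = len(word)
    for n in range(1, length + 1):              # ngram size
        for start in range(length - n + 1):     # start offset in the word
            gram = word[start:start + n]
            if not with_location:
                grams.append(gram)
                continue
            if n == length:
                location = "whole"              # the ngram is the entire word
            elif start == 0:
                location = "prefix"             # touches the word start
            elif start + n == length:
                location = "suffix"             # touches the word end
            else:
                location = "infix"              # strictly inside the word
            grams.append(f"{location}:{gram}")
    return grams


print(infinity_ngrams("abc"))
print(infinity_ngrams("abc", with_location=True))
```

A word of length L yields L(L+1)/2 ngrams under this enumeration, so the feature set grows quadratically with word length. Because Python strings are sequences of Unicode code points, the same function applies unchanged to Ethiopic-script words.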
The experimental results indicate that the infinity ngram with location feature, together with sentence and document level reformulation, achieves promising results: an average F-measure of 100% at the word, phrase, sentence, and document levels in the monolingual setting. In the multilingual setting it also attains an average F-measure of 100% at both the sentence and document levels, while at the phrase level it achieves 84.33%, 88.95%, and 90.19% for Amharic, Geeze, and Tigrigna respectively, and at the word level 83.16%, 80.96%, and 85.85% for Amharic, Geeze, and Tigrigna respectively.
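For readers unfamiliar with the metric, the per-language F-measure reported above is conventionally the harmonic mean of precision and recall computed for each language label. The sketch below shows that standard computation on made-up toy labels; the function name `f_measure_per_language` and the label codes are hypothetical, the numbers are not from the paper, and the paper's exact averaging scheme is not stated in the abstract.

```python
def f_measure_per_language(gold, predicted):
    """Compute the standard F1 score for each language label.

    `gold` and `predicted` are equal-length lists of labels, one per
    text unit (for example, one per word).
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")

    scores = {}
    for label in sorted(set(gold) | set(predicted)):
        pairs = list(zip(gold, predicted))
        tp = sum(1 for g, p in pairs if g == label and p == label)
        fp = sum(1 for g, p in pairs if g != label and p == label)
        fn = sum(1 for g, p in pairs if g == label and p != label)
        precision = tp / (tp + fp) if (tp + fp) else 0.0
        recall = tp / (tp + fn) if (tp + fn) else 0.0
        if precision + recall:
            scores[label] = 2 * precision * recall / (precision + recall)
        else:
            scores[label] = 0.0
    return scores


# Toy labels for illustration only (not data from the paper).
gold = ["am", "am", "ti", "ti"]
predicted = ["am", "ti", "ti", "ti"]
scores = f_measure_per_language(gold, predicted)
average = sum(scores.values()) / len(scores)  # unweighted (macro) average
print(scores, average)
```

The unweighted average used here treats every language equally regardless of how many text units it has, which is one common reading of "average F-measure".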

Technoscience Academy

Kidst Ergetie Andargie Tsegay Mullu Kassa

International Journal of Scientific Research in Computer Science, Engineering and Information Technology

2019


