Detection of Language from Roman Urdu and English Multilingual Corpus
PURPOSE: This study proposes and validates a model for identifying the languages in a mixed Roman Urdu and English multilingual corpus collected from social media sites.
BACKGROUND: Language identification, or language detection, is the problem of identifying the languages in a corpus of written text that contains two or more languages. Identifying the language of social media text is a prerequisite for many applications in natural language processing and computational linguistics, such as word embedding generation, emotion analysis, and part-of-speech tagging.
METHODOLOGY: The English and Roman Urdu corpus was obtained from social media websites and cross-media platforms such as Facebook, Twitter, Google+, Instagram, WhatsApp, and Messenger. A dictionary-based baseline with SVM and a Bi-Directional LSTM were used for language identification on the collected Roman Urdu and English multilingual corpus.
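As an illustration of what a dictionary-based baseline for word-level language identification can look like, the following is a minimal Python sketch. It is not the authors' implementation: the two toy word lists, the tag names (`en`, `ur`, `unk`), and the helper functions are all assumptions made for this example, and a real system would use much larger lexicons and handle the spelling variation typical of Roman Urdu.

```python
import re

# Hypothetical toy lexicons (assumptions for this sketch, not from the study).
ENGLISH_WORDS = {"i", "am", "is", "the", "very", "happy", "today", "good", "you", "are"}
ROMAN_URDU_WORDS = {"main", "aaj", "bohat", "khush", "hoon", "hai", "kya", "tum", "acha", "nahi"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z']+", text.lower())


def tag_token(token):
    """Tag one token by dictionary lookup.

    Returns "en" or "ur" when exactly one lexicon contains the token,
    and "unk" when neither or both lexicons contain it.
    """
    in_english = token in ENGLISH_WORDS
    in_urdu = token in ROMAN_URDU_WORDS
    if in_english and not in_urdu:
        return "en"
    if in_urdu and not in_english:
        return "ur"
    return "unk"


def tag_sentence(text):
    """Return a list of (token, tag) pairs for a sentence."""
    return [(token, tag_token(token)) for token in tokenize(text)]


def accuracy(predicted, gold):
    """Fraction of positions where the predicted tag equals the gold tag."""
    if len(predicted) != len(gold) or not gold:
        raise ValueError("predicted and gold must be non-empty and equal in length")
    correct = sum(p == g for p, g in zip(predicted, gold))
    return correct / len(gold)


if __name__ == "__main__":
    for token, tag in tag_sentence("Main aaj bohat happy hoon"):
        print(token, tag)
```

A lookup of this kind has no notion of context, so tokens missing from both lists, or present in both, stay unresolved. A trained classifier or sequence model assigns a label in those cases as well.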
RESULTS: Of the approaches evaluated, the Bi-Directional LSTM model performed better, with an accuracy of 97.98%.
CONCLUSION: Language recognition, or detection, is the problem of recognizing the language present in a given document or statement. The English and Roman Urdu corpus was collected from social media websites. The training text is fed to a Bi-Directional LSTM to determine whether the text is English or Roman Urdu. The word-level Bi-Directional LSTM showed improved results for word recognition on Roman Urdu and English text.

