Bilingual Code-Switching Using XLM-RoBERTa
This study examines the linguistic complexities of Tagalog-English code-switching (Taglish) for natural language processing (NLP) applications, focusing on text classification with the XLM-RoBERTa Large model. Code-switching presents challenges due to its informal syntax, frequent language alternation, and linguistic ambiguities. Using a custom SentencePiece tokenizer and a preprocessed dataset of ambiguity-tagged Taglish reviews, the model was evaluated on three key performance metrics: precision, F1-score, and ROC-AUC. Testing results showed that XLM-RoBERTa Large was the best-performing model, with a testing precision of 97.62%, an F1-score of 95.77%, and a ROC-AUC of 97.50%. These results indicate that the model copes well with linguistic difficulties such as syntactic ambiguities, mixed-language constructs, and informal expressions. A comparative analysis tested XLM-RoBERTa Large against its Base variant and the multilingual BERT model (mBERT) on bilingual text classification tasks. The key innovations of the study were advanced preprocessing techniques, including language tagging, ambiguity tagging, and customized tokenization, which enhanced the model's ability to classify informal text and mixed-language constructs effectively. The study concludes that XLM-RoBERTa Large is a powerful tool for addressing linguistic ambiguities in under-resourced and multilingual settings. Future research will involve larger and more diverse datasets, state-of-the-art optimization strategies, and further exploration of integrating contextual embeddings to make the model more robust and scalable.
This research contributes to the effectiveness of NLP tools for bilingual and code-switched text, opening room for further advances in multilingual language processing at both Philippine and global scales.
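The three reported metrics can be illustrated with a minimal sketch. It assumes a binary classification task with labels 0 and 1, which the abstract does not state explicitly; the toy labels, scores, the 0.5 decision threshold, and the function names `precision_f1` and `roc_auc` are illustrative assumptions, not data or code from the study.

```python
# Illustrative only: toy labels and scores, not results from the study.
# Assumes a binary task (labels 0/1), which the abstract does not confirm.

def precision_f1(y_true, y_pred):
    """Return (precision, F1) for the positive class (label 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, f1


def roc_auc(y_true, scores):
    """ROC-AUC as the probability that a random positive example
    receives a higher score than a random negative one (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("ROC-AUC needs at least one positive and one negative")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical model outputs for six reviews.
y_true = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.2, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

precision, f1 = precision_f1(y_true, y_pred)
auc = roc_auc(y_true, scores)
print(f"precision={precision:.4f} f1={f1:.4f} roc_auc={auc:.4f}")
```

Precision and F1 depend on the chosen decision threshold, whereas ROC-AUC is computed from the raw scores and summarizes ranking quality across all thresholds, which is why the three figures can differ for the same model.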
Institute of Electrical and Electronics Engineers (IEEE)

