Author identification for Under-Resourced language (KadazanDusun)
<span>This paper presents Author Identification for the KadazanDusun language, one of the under-resourced languages in Malaysia, using tweets as the source of short-text data. It aims to demonstrate Author Identification of short texts in KadazanDusun and examines the performance of two machine learning algorithms on the KadazanDusun data set by analyzing stylometric features. Stylometric features, which here include character n-grams and word n-grams, are used to quantify the authors' writing styles. The Author Identification workflow applies a machine learning approach to a single-label multi-class problem: predicting the author of a given message in KadazanDusun. Two classifiers, Naïve Bayes and Support Vector Machine (SVM), are compared on accuracy. The results show that combining word-level unigrams and {1-5}-grams with character 3-grams gives the most relevant stylometric features for identifying the author of a KadazanDusun message, with an accuracy of 80.17%. The results also show that the SVM classifier outperforms Naïve Bayes in this task, reaching that accuracy of 80.17%.</span>
Institute of Advanced Engineering and Science
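As a rough illustration of the workflow the abstract describes, the sketch below extracts word unigrams and character 3-grams from short messages and feeds them to a multinomial Naïve Bayes classifier, one of the two classifiers named above. It is not the paper's implementation: the whitespace tokenization, the Laplace smoothing constant `alpha`, the `w:`/`c:` feature prefixes, the class and function names, and the toy English messages are all assumptions made for this example, and the SVM side of the comparison is left out.

```python
import math
from collections import Counter, defaultdict


def char_ngrams(text, n=3):
    """Overlapping character n-grams, e.g. 'abcd' -> ['abc', 'bcd']."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def word_ngrams(text, n=1):
    """Word n-grams over lowercased, whitespace-split tokens (an assumption)."""
    tokens = text.lower().split()
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def extract_features(text):
    """Count word unigrams and character 3-grams, kept apart by a prefix."""
    feats = Counter("w:" + g for g in word_ngrams(text, 1))
    feats.update("c:" + g for g in char_ngrams(text.lower(), 3))
    return feats


class NaiveBayesAuthorClassifier:
    """Multinomial Naive Bayes over n-gram counts (hypothetical helper)."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha                          # Laplace smoothing (assumed)
        self.feature_counts = defaultdict(Counter)  # author -> feature counts
        self.doc_counts = Counter()                 # author -> number of messages
        self.vocab = set()

    def fit(self, texts, authors):
        for text, author in zip(texts, authors):
            feats = extract_features(text)
            self.feature_counts[author].update(feats)
            self.doc_counts[author] += 1
            self.vocab.update(feats)
        return self

    def predict(self, text):
        feats = extract_features(text)
        total_docs = sum(self.doc_counts.values())
        best_author, best_score = None, -math.inf
        for author, counts in self.feature_counts.items():
            # Log prior plus smoothed log likelihood of each known feature.
            score = math.log(self.doc_counts[author] / total_docs)
            denom = sum(counts.values()) + self.alpha * len(self.vocab)
            for feat, freq in feats.items():
                if feat in self.vocab:  # skip features never seen in training
                    score += freq * math.log((counts[feat] + self.alpha) / denom)
            if score > best_score:
                best_author, best_score = author, score
        return best_author


# Toy messages invented for this sketch; they are not from the paper's data set.
train_texts = [
    "good morning everyone!!",
    "good morning friends!!",
    "hello there, nice day",
    "hello again, nice weather",
]
train_authors = ["author_a", "author_a", "author_b", "author_b"]

clf = NaiveBayesAuthorClassifier().fit(train_texts, train_authors)
print(clf.predict("good morning all!!"))
```

Prefixing the two feature families keeps a word such as `day` distinct from the character 3-gram `day`, so the combined feature set behaves like a concatenation of two separate n-gram vocabularies.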