QALB: Qatar Arabic language bank

Automatic text correction has attracted research attention for English and some other Western languages. Its applications range from supporting language learners and reducing noise in the text fed to natural language processing tools to correcting grammatical and lexical choice errors in machine translation output. Despite the recent focus on some Arabic language technologies, automatic correction of Arabic remains a fairly understudied research problem. Modern Standard Arabic (MSA) is a morphologically and syntactically complex language, which poses multiple writing challenges not only to language learners but also to native Arabic speakers, whose dialects differ substantially from MSA. We are currently creating resources to address these challenges. Our project has two components: QALB (Qatar Arabic Language Bank), a large parallel corpus of Arabic sentences and their corrections, and ACLE (Automatic Correction of Language Errors), an Arabic text correction system trained and tested on the QALB corpus. The QALB corpus is unique in three respects: a) it will be the largest Arabic text correction corpus available, spanning two million words; b) it will cover errors produced by native speakers, non-native speakers, and machine translation systems; and c) it will contain a trace of all the actions performed by the human annotators to reach the final correction. This presentation describes the creation of two major elements of the project: the web-based annotation interface and the annotation guidelines. QAWI (QALB Annotation Web Interface) is our web-based, language-independent annotation framework used for manual correction of the QALB corpus. The framework provides intuitive interfaces for annotating text, managing a large number of human annotators, and performing quality control.
Our annotation interface, in particular, provides a novel token-based editing model for correcting Arabic text that allows us to reliably track all modifications. We demonstrate details of both the annotation and the administration interfaces as well as the back-end engine. Furthermore, we show how this framework speeds up the annotation process by employing automated annotators to correct basic Arabic spelling errors. We also discuss the evolution of our annotation guidelines from their early development through their actual use in group annotation. The guidelines cover a variety of linguistic phenomena, from spelling errors to dialectal variations and grammatical considerations. They also include a large number of examples to help annotators understand the general principles behind the correction rules rather than simply memorize them. The guidelines were written in parallel with the development of our web-based annotation interface and went through several iterations and revisions. We periodically held new training sessions for the annotators and measured inter-annotator agreement. Furthermore, the guidelines were updated and extended using feedback from the annotators and the results of the inter-annotator agreement evaluations. This project is supported by the National Priority Research Program (NPRP grant 4-1058-1-168) of the Qatar National Research Fund (a member of the Qatar Foundation). The statements made herein are solely the responsibility of the authors.
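As an illustration of what a token-based editing model with a full action trace might look like, here is a minimal Python sketch. It is not the QAWI implementation: the abstract does not describe the interface's data model, so the `Action` schema, the `TokenEditor` class, the five operation names, and the `replay` helper are all assumptions made for this example. The idea it shows is that every correction is an operation on whole tokens, each operation is logged, and the final correction can be rebuilt from the original sentence plus the log.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Action:
    """One logged annotator action (hypothetical schema, not QAWI's)."""
    op: str             # "edit", "insert", "delete", "merge", or "split"
    index: int          # token position where the action was applied
    before: List[str]   # tokens removed by the action
    after: List[str]    # tokens put in their place


@dataclass
class TokenEditor:
    """Holds a tokenized sentence and records every change made to it."""
    tokens: List[str]
    trace: List[Action] = field(default_factory=list)

    def _apply(self, op: str, index: int, count: int, new: List[str]) -> None:
        # Replace `count` tokens starting at `index` and log the change.
        before = self.tokens[index:index + count]
        self.tokens[index:index + count] = new
        self.trace.append(Action(op, index, before, list(new)))

    def edit(self, index: int, new_token: str) -> None:
        self._apply("edit", index, 1, [new_token])

    def insert(self, index: int, new_token: str) -> None:
        self._apply("insert", index, 0, [new_token])

    def delete(self, index: int) -> None:
        self._apply("delete", index, 1, [])

    def merge(self, index: int) -> None:
        # Join the token at `index` with the one that follows it.
        merged = self.tokens[index] + self.tokens[index + 1]
        self._apply("merge", index, 2, [merged])

    def split(self, index: int, position: int) -> None:
        # Break one token into two at character offset `position`.
        token = self.tokens[index]
        self._apply("split", index, 1, [token[:position], token[position:]])


def replay(original: List[str], trace: List[Action]) -> List[str]:
    """Rebuild the corrected sentence from the original tokens and the trace."""
    tokens = list(original)
    for action in trace:
        end = action.index + len(action.before)
        tokens[action.index:end] = action.after
    return tokens


# Illustrative sentence: a missing hamza and a wrongly split word.
original = ["ذهبت", "الى", "المد", "رسة"]
editor = TokenEditor(list(original))
editor.edit(1, "إلى")   # restore the hamza on the preposition
editor.merge(2)         # rejoin the two halves of one word

for action in editor.trace:
    print(action.op, action.index, action.before, action.after)
print(replay(original, editor.trace) == editor.tokens)
```

Because each logged action stores both the tokens it removed and the tokens it added, the log alone is enough to reproduce the annotator's result step by step, which is the property a correction trace needs. An automated annotator for basic spelling errors could, under the same assumptions, call the same methods so that its changes appear in the trace alongside human ones.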
Hamad bin Khalifa University Press (HBKU Press)
