Understanding the Research Challenges in Low-Resource Language and Linking Bilingual News Articles in Multilingual News Archive
The developed world has focused more on Web preservation than the developing world, especially on preserving news for future generations. However, news published online is volatile because of constant changes in the technologies used to disseminate information and the formats used for publication. News preservation became more complicated and challenging when the archive began to contain articles from low-resource and morphologically complex languages like Urdu and Arabic, along with English news articles. The digital news story preservation (DNSP) framework is enriched with eighteen Urdu, Arabic, and English news sources. This study presents the challenges of low-resource languages (LRLs), the associated research challenges, and details of how the framework is enhanced. In this paper, we introduce a multilingual news archive and discuss the digital news story extractor, which addresses major issues in implementing low-resource languages and facilitates normalized format migration. The extraction results are presented in detail for high-resource languages (HRLs), i.e., English, and for LRLs, i.e., Urdu and Arabic. LRLs encountered a higher error rate during preservation than HRLs: 10% and 3%, respectively. The extraction results also show that a few news sources are not regularly updated and publish few new stories online. LRLs require more detailed study for accurate news content extraction and archiving for future access. Sources in both LRLs and HRLs enrich the DNSP framework. The Digital News Stories Archive (DNSA) preserves a large number of news articles from multiple news sources in LRLs and HRLs. This paper presents the research challenges encountered during the preservation of Urdu- and Arabic-language news articles to create a multilingual news archive.
The second part of the paper compares two bilingual linking mechanisms for Urdu-to-English news articles in the DNSA, the common ratio measure for dual language (CRMDL) and the similarity measure based on transliteration words (SMTW), against a cosine similarity measure (CSM) baseline. The experimental results show that SMTW is more effective than CRMDL and CSM for linking Urdu-to-English news articles. Precision is 46%, 50%, and 60%, and recall is 64%, 67%, and 82%, for CSM, CRMDL, and SMTW, respectively, with an improved impact of common terms as well.
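As a rough illustration of the kind of baseline the CSM refers to, the sketch below computes the standard cosine similarity between two articles represented as raw term-frequency vectors. It is not the paper's implementation: the function name `cosine_similarity`, the whitespace tokenization, and the sample token lists are assumptions made for illustration, and the sketch presumes the Urdu article has already been mapped to terms comparable with the English one, a step the abstract does not describe.

```python
import math
from collections import Counter


def cosine_similarity(tokens_a, tokens_b):
    """Cosine similarity between two token lists using raw term frequencies."""
    tf_a, tf_b = Counter(tokens_a), Counter(tokens_b)
    # Dot product only needs the terms that both articles share.
    dot = sum(tf_a[t] * tf_b[t] for t in tf_a.keys() & tf_b.keys())
    norm_a = math.sqrt(sum(c * c for c in tf_a.values()))
    norm_b = math.sqrt(sum(c * c for c in tf_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty article is treated as similar to nothing
    return dot / (norm_a * norm_b)


# Hypothetical token lists; the second stands in for an Urdu article
# whose terms were already mapped to English (assumed preprocessing).
article_en = "flood relief funds reach karachi".split()
article_ur_mapped = "karachi flood relief delayed".split()
score = cosine_similarity(article_en, article_ur_mapped)
print(round(score, 3))
```

A candidate pair would then be linked when its score exceeds some threshold; the threshold value is a tuning choice that the abstract does not report.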
Related Results
Hubungan Perilaku Pola Makan dengan Kejadian Anak Obesitas
Učinak poučavanja razrednomu jeziku u izobrazbi nastavnika njemačkoga
The actual use of classroom language is principally limited to the classroom environment. As far as foreign language learning is concerned, the classroom often turns out to be the ...
EFFECT OF BILINGUAL INSTRUCTIONAL METHOD IN THE ACADEMIC ACHIEVEMENT OF JUNIOR SECONDARY SCHOOL STUDENTS IN MATHEMATICS
The importance of mathematics in the modern society is overwhelming. The importance of mathematics has long been recognized all over the world, and that is why all students are req...
Growing up bilingual: examining the language input and word segmentation abilities of bilingual infants
Infants’ early language experiences play a critical role on their language development. In this dissertation, I explored the nature of this relationship in a bilingual context. Spe...
Evaluating the Science to Inform the Physical Activity Guidelines for Americans Midcourse Report
The Physical Activity Guidelines for Americans (Guidelines) advises older adults to be as active as possible. Yet, despite the well documented benefits of physical a...
Increased life expectancy of heart failure patients in a rural center by a multidisciplinary program
Funding Acknowledgements
Type of funding sources: None.
INTRODUCTION Patients with heart failure (HF)...
Language Alternation in Multilingual Societies: Analyzing Bi/Multilingual Conversation
The research examines the relationship between language choice and alternation in bilingual/multilingual conversations within a multicultural/multilingual context. It builds on the...
Early, late or very late?
Research on child bilingualism accounts for differences in the course and the outcomes of monolingual and different types of bilingual language acquisition primarily from two persp...

