
Annotated Lexicon for Sentiment Analysis in the Bosnian Language

The paper presents the first sentiment-annotated lexicon of the Bosnian language. The annotation process and methodology are presented along with a usability study that concentrates on language coverage. The starting base was built by translating the Slovenian annotated lexicon and then manually checking the translations and annotations. Language coverage was measured on two reference corpora. Bosnian is still considered a low-resource language. A reference corpus composed of automatically crawled web pages is available for Bosnian, but the authors had difficulty finding any corpus with a clear time frame for the texts it contains. A corpus of contemporary texts was constructed by collecting news articles from several Bosnian web portals. Two language coverage methods were used in the experiment. The first used a frequency list of all words extracted from the two reference Bosnian corpora, and the second did not use frequencies as the main factor in counting. With the first method, the computed coverage was 19.24% for the first corpus and 28.05% for the second. With the second method, coverage was 2.34% for the first corpus and 6.98% for the second. These results are comparable to the state of the art in the field. The usability of the lexicon has already been shown in a Twitter-based comparison.

University of Ljubljana

Sead Jahić, Jernej Vičič

Slovenščina 2.0: empirične, aplikativne in interdisciplinarne raziskave

2024
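
The abstract does not spell out how the two coverage figures are computed, so the following is only a minimal sketch under stated assumptions: the first method is read as frequency-weighted (token) coverage, and the second as coverage over distinct word forms (types), with frequencies ignored. The function names, the toy corpus, and the two-word lexicon are hypothetical and are not taken from the paper or its data.

```python
from collections import Counter


def token_coverage(freqs, lexicon):
    """Assumed reading of method 1: share of corpus tokens whose word
    form appears in the lexicon (each word weighted by its frequency)."""
    total = sum(freqs.values())
    if total == 0:
        return 0.0
    covered = sum(count for word, count in freqs.items() if word in lexicon)
    return covered / total


def type_coverage(freqs, lexicon):
    """Assumed reading of method 2: share of distinct word forms in the
    corpus that appear in the lexicon (frequencies ignored)."""
    if not freqs:
        return 0.0
    covered = sum(1 for word in freqs if word in lexicon)
    return covered / len(freqs)


# Hypothetical toy corpus and lexicon, for illustration only.
corpus_tokens = ["dobar", "dan", "dobar", "film", "loš", "dan", "dobar"]
freqs = Counter(corpus_tokens)   # frequency list of the corpus
lexicon = {"dobar", "loš"}       # stand-in for the annotated lexicon entries

print(f"token coverage: {token_coverage(freqs, lexicon):.2%}")
print(f"type coverage:  {type_coverage(freqs, lexicon):.2%}")
```

In this toy data, four of the seven tokens and two of the four distinct words are in the lexicon, so the frequency-weighted figure is higher than the type-based one. That is the same direction as the reported numbers, although the toy values are not meant to resemble them.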

