
Plagiarism Detection in the Bengali Language: A Text Similarity-Based Approach

Plagiarism means taking another person’s work without giving them credit for it. It is one of the most serious problems in academia and among researchers. Although multiple tools are available to detect plagiarism in a document, most of them are domain-specific and designed to work on English texts, whereas plagiarism is not limited to a single language. Bengali is the most widely spoken language of Bangladesh and the second most spoken language in India, with 300 million native speakers and 37 million second-language speakers. Plagiarism detection requires a large corpus for comparison. Bengali literature has a history of 1300 years, yet most Bengali literature books have not been properly digitized. As no suitable corpus existed for our purpose, we collected Bengali literature books from the National Digital Library of India, extracted their text using a comprehensive methodology, and constructed our own corpus. Our experimental results show an average OCR text-extraction accuracy of 72.10% to 79.89%. The Levenshtein distance algorithm is used to determine plagiarism. We have built a web application for end users and successfully tested it for plagiarism detection in Bengali texts. In the future, we aim to construct a corpus with more books for more accurate detection.
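The abstract names the Levenshtein distance as the similarity measure but does not say how it is turned into a plagiarism decision. The following is a minimal Python sketch under stated assumptions, not the authors' implementation: it computes the edit distance over Unicode code points (so a Bengali conjunct counts as several characters), normalizes both texts to NFC first, converts the distance to a similarity of 1 - d / max(len), and flags corpus passages above a threshold. The names `similarity` and `flag_plagiarism` and the 0.8 threshold are hypothetical choices for illustration.

```python
from __future__ import annotations

import unicodedata


def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and
    substitutions that turn a into b (Wagner-Fischer DP with two rows)."""
    if len(a) < len(b):
        a, b = b, a  # distance is symmetric; keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # delete ca
                               current[j - 1] + 1,       # insert cb
                               previous[j - 1] + cost))  # substitute or match
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    """Hypothetical normalized score in [0, 1]: 1 - distance / longer length."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest


def flag_plagiarism(suspect: str, corpus: list[str],
                    threshold: float = 0.8) -> list[tuple[int, float]]:
    """Return (corpus index, score) pairs with score >= threshold, best first.
    NFC normalization and the default threshold are illustrative assumptions."""
    s = unicodedata.normalize("NFC", suspect)
    hits = []
    for idx, passage in enumerate(corpus):
        score = similarity(s, unicodedata.normalize("NFC", passage))
        if score >= threshold:
            hits.append((idx, score))
    return sorted(hits, key=lambda h: h[1], reverse=True)
```

For example, `levenshtein("kitten", "sitting")` is 3, and a Bengali passage that differs from a corpus entry only by a trailing danda (`।`) is one edit away, so `flag_plagiarism` reports it with a score close to 1. Comparing whole books this way is quadratic in text length, so a practical system would more likely compare sentence- or paragraph-sized chunks; that chunking strategy is likewise an assumption here.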

MDPI AG

Satyajit Ghosh, Aniruddha Ghosh, Bittaswer Ghosh, Abhishek Roy

2022

