Plagiarism Detection in the Bengali Language: A Text Similarity-Based Approach
Plagiarism means taking another person’s work without giving them credit for it. It is one of the most serious problems in academia and among researchers. Although multiple tools are available to detect plagiarism in a document, most of them are domain-specific and designed to work with English texts, whereas plagiarism is not limited to a single language. Bengali is the most widely spoken language of Bangladesh and the second most spoken language in India, with 300 million native speakers and 37 million second-language speakers. Plagiarism detection requires a large corpus for comparison. Bengali literature has a history of 1,300 years; however, most Bengali literature books have not yet been properly digitized. Because no suitable corpus existed for our purpose, we collected Bengali literature books from the National Digital Library of India, extracted their text using a comprehensive methodology, and constructed our own corpus. Our experimental results show an average OCR text-extraction accuracy between 72.10% and 79.89%. The Levenshtein distance algorithm is used to determine plagiarism. We have built a web application for end users and successfully tested it for plagiarism detection in Bengali texts. In the future, we aim to construct a corpus with more books for more accurate detection.
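The abstract names Levenshtein distance as the detection measure but does not say how distances are turned into a plagiarism decision. Below is a minimal sketch, assuming (1) sentence-level comparison, splitting on the Bengali danda "।", (2) normalization by the longer string's length, and (3) an illustrative threshold of 0.8. All three choices, and the names `similarity`, `split_sentences`, and `flag_plagiarism`, are hypothetical and not taken from the paper. Note that Python strings are sequences of Unicode code points, so a Bengali vowel sign or virama counts as its own character here; grapheme-level comparison would need extra handling.

```python
import re


def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and
    substitutions that turn `a` into `b` (classic dynamic programming)."""
    if len(a) < len(b):
        a, b = b, a  # distance is symmetric; keep the row sized by the shorter string
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def similarity(a: str, b: str) -> float:
    """Similarity in [0, 1]; 1.0 means identical.
    ASSUMPTION: normalized by the longer string's length."""
    if not a and not b:
        return 1.0
    return 1.0 - levenshtein(a, b) / max(len(a), len(b))


def split_sentences(text: str) -> list[str]:
    """HYPOTHETICAL helper: split on the Bengali danda, '?' and '!'."""
    return [s.strip() for s in re.split(r"[।?!]", text) if s.strip()]


def flag_plagiarism(suspect: str, corpus: list[str], threshold: float = 0.8):
    """HYPOTHETICAL: return (sentence, best_match, score) for every suspect
    sentence whose best corpus match scores at least `threshold`.
    The 0.8 default is illustrative only."""
    corpus_sentences = [s for doc in corpus for s in split_sentences(doc)]
    hits = []
    for sent in split_sentences(suspect):
        scored = [(similarity(sent, c), c) for c in corpus_sentences]
        if not scored:
            continue
        score, best = max(scored)
        if score >= threshold:
            hits.append((sent, best, score))
    return hits


if __name__ == "__main__":
    print(levenshtein("kitten", "sitting"))  # 3
    corpus = ["আমি তোমায় ভালোবাসি। আকাশ নীল।"]
    print(flag_plagiarism("আকাশ নীল। এটা নতুন বাক্য।", corpus))
```

Comparing every suspect sentence with every corpus sentence costs time proportional to the product of their lengths, which is fine for a demonstration but would need indexing or candidate filtering for a large corpus of books.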
Related Results
Relationship between Eating Pattern Behavior and the Incidence of Childhood Obesity
THE ROLE OF LIBRARIANS IN ENDING PLAGIARISM
Abstract: Plagiarism can occur “unintentionally” simply because one “forgets” to cite. The content of a paragraph can appear completely different in its choice of words but still ...
AI Open research Plagiarism Dupli Checker, Scribbr Plagiarism Checker, Quetext, Small SEO Tools Plagiarism Checker Web Technology: comparative study
Purpose
This paper mainly aims to explore the AI Open research Plagiarism Dupli Checker, Scribbr Plagiarism Checker, Quetext and Small SEO Tools Plagiarism Checker and provides a c...
Review of Source Code Plagiarism Detection Techniques
In the educational sector, where scientific publications and articles are concerned, plagiarism detection systems are critical. Plagiarism occurs when someone copies a piece of con...
Development of an Effective Hybrid Text Plagiarism Detection System using Machine Learning Techniques
In recent times, plagiarism has spread widely as a result of advances in internet technology, which have brought about a large volume of information to be share ...
Thu Dau Mot University students’ perceptions of plagiarism
Plagiarism is a very common problem in many universities. A lot of students plagiarize unconsciously because they don't understand the concept. The study will clarify the concept o...
Plagiarism in the World of Education: An Analysis of a Social Problem and the Urgency of Character Education
Plagiarism is taking someone else's work and using it in academic writing as if it were his/her own. Plagiarism is a form of moral offense and people who commit plagiarism will get...
PLAGIARISM DETECTION FOR PROJECT REPORT USING MACHINE LEARNING
Plagiarism is an unethical act of using someone else's work or ideas without giving them credit, which is a growing problem in various fields. However, the current systems for plag...

