
Similarity Detection in Large Volume Data using Machine Learning Techniques

Plagiarism is the unauthorized copying or theft of another person's intellectual property. Two main approaches are used to counter it: external plagiarism detection and intrinsic plagiarism detection. External algorithms compare a suspicious file against numerous source documents, whereas intrinsic algorithms inspect only the suspicious file itself to predict plagiarism. This work focuses on detecting plagiarism in programs, that is, in source code files. For source code, plagiarism means copying an entire program, or the logic used in it, without permission or in violation of copyright. Many techniques exist for detecting plagiarism in source code files, but checking a large dataset is computationally expensive and time-consuming. To make similarity detection in source code files computationally efficient, the Hadoop framework is used, since it supports parallel computation over large datasets. However, the raw data available is not in a form that existing plagiarism checking tools can work with: the files are too large and exhibit the characteristics of big data. A qualifying model of the dataset is therefore required, one that can be fed into Hadoop so that Hadoop can efficiently process the data and check the source code for plagiarism. Machine learning is used to generate this model, combining big data processing with machine learning.

Blue Eyes Intelligence Engineering and Sciences Publication (BEIESP)

Viji Gopal*, Dr. Varghese Paul, Dr. M. Sudheep Elayidom, Dr. Sasi Gopalan

International Journal of Recent Technology and Engineering (IJRTE)

2019
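The abstract does not describe the authors' pipeline or their machine-learning model in detail, so the following is only a hedged illustration of the two ideas it relies on. The first is external similarity scoring of source code: the sketch normalizes tokens, builds k-token shingles, and computes Jaccard similarity. The second is arranging the pairwise comparison as map, shuffle, and reduce steps, which is the pattern Hadoop parallelizes. All names here (`normalize`, `shingles`, `map_phase`, `reduce_phase`, `similarity_report`), the keyword set, and k = 4 are hypothetical assumptions. The code runs in a single Python process and does not use Hadoop's API.

```python
import re
from collections import defaultdict
from itertools import combinations

# Hypothetical keyword list: kept verbatim so control structure survives normalization.
KEYWORDS = {"def", "class", "return", "if", "else", "for", "while", "in", "import"}
# Identifiers, integer literals, or any single non-space character (operators, brackets).
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+|\S")


def normalize(source: str) -> list:
    """Map identifiers to ID and numbers to NUM so renamed variables still match."""
    tokens = []
    for tok in TOKEN_RE.findall(source):
        if tok in KEYWORDS:
            tokens.append(tok)
        elif tok[0].isalpha() or tok[0] == "_":
            tokens.append("ID")
        elif tok[0].isdigit():
            tokens.append("NUM")
        else:
            tokens.append(tok)
    return tokens


def shingles(tokens: list, k: int = 4) -> set:
    """All contiguous k-token windows (k-shingles) of a token stream."""
    return {tuple(tokens[i:i + k]) for i in range(len(tokens) - k + 1)}


def map_phase(doc_id: str, source: str, k: int = 4):
    """Mapper: emit (shingle, doc_id) once per distinct shingle in the document."""
    for sh in shingles(normalize(source), k):
        yield sh, doc_id


def shuffle(pairs):
    """Group mapper output by key, as Hadoop's shuffle step does between map and reduce."""
    groups = defaultdict(set)
    for key, doc_id in pairs:
        groups[key].add(doc_id)
    return groups


def reduce_phase(groups):
    """Reducer: each document pair sharing a shingle gains one overlap count.

    In Hadoop, summing the per-pair counts would typically be a second job;
    here both steps are collapsed into one function.
    """
    overlap = defaultdict(int)
    for doc_ids in groups.values():
        for a, b in combinations(sorted(doc_ids), 2):
            overlap[(a, b)] += 1
    return overlap


def similarity_report(corpus: dict, k: int = 4) -> dict:
    """Jaccard similarity (shared shingles / union of shingles) for overlapping pairs."""
    sizes = {d: len(shingles(normalize(s), k)) for d, s in corpus.items()}
    emitted = (kv for d, s in corpus.items() for kv in map_phase(d, s, k))
    overlap = reduce_phase(shuffle(emitted))
    return {
        pair: inter / (sizes[pair[0]] + sizes[pair[1]] - inter)
        for pair, inter in overlap.items()
    }


# Hypothetical toy corpus: b.py is a.py with every identifier renamed.
corpus = {
    "a.py": "def total(xs):\n    s = 0\n    for x in xs:\n        s += x\n    return s\n",
    "b.py": "def add_all(values):\n    acc = 0\n    for v in values:\n        acc += v\n    return acc\n",
    "c.py": "class Point:\n    def __init__(self, x, y):\n        self.x = x\n        self.y = y\n",
}

for pair, score in sorted(similarity_report(corpus).items()):
    print(pair, round(score, 2))
```

Because `a.py` and `b.py` differ only in identifier names, they normalize to the same token stream and score 1.0. `c.py` shares only a few incidental shingles with them and scores lower. One design caveat: a very common shingle produces a quadratic number of document pairs in the reducer. Production systems often discard such high-frequency shingles before comparing.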


