
Plagiarism is the act of taking part or all of another person's ideas, in the form of documents or texts, without citing the source. This study aims to detect the similarity of text documents using the cosine similarity algorithm with TF-IDF weighting, so that the result can be used to estimate the degree of plagiarism. The documents compared are abstracts written in Indonesian. The results show that the similarity value is on average 10% higher when stemming is applied than when it is not. Documents with a high degree of similarity produce similarity values above 50%, whereas documents with a low degree of similarity, or no plagiarism, produce values below 40%. Preprocessing consists of case folding, tokenizing, stopword removal, and stemming. After preprocessing, the TF-IDF weights are calculated and cosine similarity is applied to obtain a similarity value expressed as a percentage. Based on the experimental results, the cosine similarity algorithm with TF-IDF weighting can produce a similarity value for each pair of compared documents.
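The pipeline described in the abstract (case folding, tokenizing, stopword removal, stemming, TF-IDF weighting, cosine similarity) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The stopword list, the smoothed IDF variant, and the names `preprocess`, `tfidf_vectors`, `cosine_similarity`, and `similarity_percent` are all assumptions made for illustration. Stemming is left as a pluggable `stem` function that defaults to doing nothing; a real Indonesian stemmer would have to be supplied by the caller.

```python
import math
import re
from collections import Counter

# Tiny illustrative Indonesian stopword list (an assumption, not the study's list).
STOPWORDS = {"yang", "dan", "di", "ke", "dari", "untuk", "ini", "itu"}


def preprocess(text, stem=lambda word: word):
    """Case folding, tokenizing, stopword removal, then stemming."""
    # Lowercase the text and keep alphabetic runs only (digits are dropped).
    tokens = re.findall(r"[a-z]+", text.lower())
    return [stem(token) for token in tokens if token not in STOPWORDS]


def tfidf_vectors(docs_tokens):
    """Return one sparse {term: weight} TF-IDF vector per tokenized document."""
    n_docs = len(docs_tokens)
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for tokens in docs_tokens for term in set(tokens))
    # Smoothed IDF (assumed variant). The plain log(N / df) form is zero for
    # every term shared by both documents when only two are compared.
    idf = {term: math.log((1 + n_docs) / (1 + count)) + 1
           for term, count in df.items()}
    return [{term: tf * idf[term] for term, tf in Counter(tokens).items()}
            for tokens in docs_tokens]


def cosine_similarity(vec_a, vec_b):
    """Cosine of the angle between two sparse vectors; 0.0 if either is empty."""
    dot = sum(weight * vec_b.get(term, 0.0) for term, weight in vec_a.items())
    norm_a = math.sqrt(sum(w * w for w in vec_a.values()))
    norm_b = math.sqrt(sum(w * w for w in vec_b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def similarity_percent(doc_a, doc_b, stem=lambda word: word):
    """Similarity of two raw texts as a percentage between 0 and 100."""
    vec_a, vec_b = tfidf_vectors([preprocess(doc_a, stem),
                                  preprocess(doc_b, stem)])
    return 100.0 * cosine_similarity(vec_a, vec_b)
```

Calling `similarity_percent(text_a, text_b)` with and without a `stem` function is one way to compare the two settings the abstract contrasts. The thresholds reported in the abstract (above 50%, below 40%) come from the study's own data and are not reproduced by this sketch.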

Tanjungpura University

Muhammad Zidny Naf'an, Auliya Burhanuddin, Ade Riyani

Jurnal Linguistik Komputasional (JLK)

2019

