
The Integrity of Source Code Commenting : Benchmark Dataset and Empirical Analysis

Abstract: Code comments are a vital software feature for program cognition and software maintainability. Researchers have long been trying to find ways to ensure consistency between code and comments. Two problems raised along the way have been dataset scarcity and language dependency. To address both, we created a dataset from C# projects, a language for which no annotated dataset yet exists. We extracted 9,310 code-comment pairs from different C# projects in a data pool and annotated 4,922 of them after removing NULL, constructor, and variable entries. Both method-comment and class-comment pairs were considered in this study. We employed two evaluation metrics for the dataset: Krippendorff’s Alpha, which showed 95.67% agreement among the ratings of 3 annotators across all pairs, and Bilingual Evaluation Understudy (BLEU), used to validate our human-curated dataset. We also propose a model modified from a previous study, which obtained an AUC-ROC of 96.2% after being fitted to our 4,922 annotated code-comment pairs.
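To make the inter-annotator agreement metric concrete, here is a minimal sketch of Krippendorff's Alpha for nominal labels with no missing ratings. It is not the authors' implementation. The function name `krippendorff_alpha_nominal` and the toy labels (1 for a consistent pair, 0 for an inconsistent one) are assumptions made for illustration only.

```python
from collections import Counter
from itertools import permutations


def krippendorff_alpha_nominal(units):
    """Krippendorff's Alpha for nominal data.

    `units` is a list of rating lists, one inner list per annotated item
    (hypothetical input layout). Items with fewer than two ratings carry
    no agreement information and are skipped.
    """
    # Coincidence matrix: every ordered pair of ratings within an item,
    # weighted by 1 / (number of ratings in that item - 1).
    coincidences = Counter()
    for ratings in units:
        m = len(ratings)
        if m < 2:
            continue
        for a, b in permutations(ratings, 2):
            coincidences[(a, b)] += 1.0 / (m - 1)

    # Marginal totals per label and the overall number of pairable ratings.
    totals = Counter()
    for (a, _b), weight in coincidences.items():
        totals[a] += weight
    n = sum(totals.values())
    if n <= 1:
        raise ValueError("need at least one item with two or more ratings")

    # Observed and expected disagreement (both left unnormalised by n,
    # which cancels in the ratio).
    observed = sum(w for (a, b), w in coincidences.items() if a != b)
    expected = sum(
        totals[a] * totals[b] for a in totals for b in totals if a != b
    ) / (n - 1)
    if expected == 0:
        raise ValueError("alpha is undefined when only one label is used")
    return 1.0 - observed / expected


# Toy example: three hypothetical annotators labelling four code-comment pairs.
toy_ratings = [[1, 1, 1], [0, 0, 0], [1, 1, 1], [0, 0, 0]]
print(krippendorff_alpha_nominal(toy_ratings))
```

An Alpha of 1 indicates perfect agreement, while values near 0 indicate agreement no better than chance.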

Research Square Platform LLC

Maksuda Islam, Mohammad Safayat Hossen, Ahsanul Haque, Md. Nazmul Haque, Lutfun Nahar Lota

2022

Related Results

Joint Beamforming and Aerial IRS Positioning Design for IRS-assisted MISO System with Multiple Access Points

Intelligent reflecting surface (IRS) is a promising concept for 6G wireless communications...

Developing guidelines for research institutions

As introduced in Chapter 1, in this thesis, I developed guidelines to research institutions on how to foster research integrity. I did this by exploring how research institutions c...

Actualització consistent de bases de dades deductives

In this thesis, we propose a new method for the consistent updating of deductive databases. Given an update request, this method automatically translates...

A large-scale analysis of bioinformatics code on GitHub

In recent years, the explosion of genomic data and bioinformatic tools has been accompanied by a growing conversation around reproducibility of results and usability of sof...

An empirical study on software understandability and its dependence on code characteristics

Context: Insufficient code understandability makes software difficult to inspect and maintain and is a primary cause of software development cost. Several source code measure...

Alih Kode Dan Campur Kode Dalam Interaksi Masyarakat Terminal Motabuik Kota Atambua

This research aims to describe the use of language in community interactions at the Motabuik terminal, Atambua City. The use of language in question is the form and function of cod...

Design of Malicious Code Detection System Based on Binary Code Slicing

Malicious code threatens the safety of computer systems. Researching malicious code design techniques and mastering code behavior patterns are the basic work of network se...
