The Integrity of Source Code Commenting: Benchmark Dataset and Empirical Analysis
Abstract
Code comments are vital for program cognition and software maintainability. Researchers have long sought ways to ensure consistency between code and its comments. Two recurring obstacles in this work are dataset scarcity and language dependency. To address both, we created a dataset from C# projects, a language for which no annotated dataset yet exists. We extracted 9,310 code-comment pairs from a pool of C# projects and annotated 4,922 of them after removing NULL, constructor, and variable entries. Both method-comment and class-comment pairs were considered. We evaluated the dataset with two metrics: Krippendorff’s Alpha, which showed 95.67% agreement among the ratings of 3 annotators across all pairs, and Bilingual Evaluation Understudy (BLEU), which we used to validate the human-curated dataset. We also propose a model modified from a previous study, which achieved an AUC-ROC of 96.2% after being fit to the 4,922 annotated code-comment pairs.
Research Square Platform LLC
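The abstract reports inter-annotator agreement as Krippendorff’s Alpha but does not say which variant was used. As an illustration only, the following minimal sketch computes the nominal-scale variant, assuming each code-comment pair received a categorical label from each annotator. The function name `krippendorff_alpha_nominal` and the `units` input layout are hypothetical and are not taken from the paper.

```python
from collections import Counter
from itertools import permutations


def krippendorff_alpha_nominal(units):
    """Krippendorff's alpha for nominal labels.

    `units` is a list with one inner list per rated item, holding the labels
    given by the annotators who rated it (missing ratings are simply omitted).
    This layout is an assumption made for this sketch.
    """
    # Build the coincidence matrix: every ordered pair of ratings within an
    # item contributes 1 / (m - 1), where m is the number of ratings it has.
    coincidences = Counter()
    for ratings in units:
        m = len(ratings)
        if m < 2:
            continue  # a single rating carries no agreement information
        for a, b in permutations(ratings, 2):
            coincidences[(a, b)] += 1.0 / (m - 1)

    # Marginal totals per label and overall number of pairable ratings.
    totals = Counter()
    for (a, _b), weight in coincidences.items():
        totals[a] += weight
    n = sum(totals.values())

    # Observed and expected disagreement (nominal metric: 1 if labels differ).
    observed = sum(w for (a, b), w in coincidences.items() if a != b)
    expected = sum(
        totals[a] * totals[b] for a in totals for b in totals if a != b
    ) / (n - 1)

    if expected == 0:
        return 1.0  # only one label was ever used, so no disagreement is possible
    return 1.0 - observed / expected


if __name__ == "__main__":
    # Three hypothetical annotators labelling four items.
    example = [[1, 1, 1], [0, 0, 0], [1, 1, 0], [0, 0, 0]]
    print(krippendorff_alpha_nominal(example))
```

A value of 1 indicates perfect agreement, 0 indicates agreement no better than chance, and negative values indicate systematic disagreement.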

