An Approach to Semantic and Structural Features Learning for Software Defect Prediction
Research on software defect prediction has achieved great success in modeling predictors. To build more accurate predictors, a number of hand-crafted features have been proposed, such as static code features, process features, and social network features. Few models, however, consider the semantic and structural features of programs. Understanding the context information of source code files could explain a lot about the causes of defects in software. In this paper, we leverage representation learning to generate semantic and structural features. Specifically, we first extract token vectors of code files based on their Abstract Syntax Trees (ASTs) and then feed the token vectors into a Convolutional Neural Network (CNN) to automatically learn semantic features. Meanwhile, we construct a complex network model based on the dependencies between code files, namely, a software network (SN). To learn the structural features, we then apply a network embedding method to the resulting SN. Finally, we build a novel software defect prediction model based on the learned semantic and structural features (SDP-S2S). We evaluated our method on 6 projects collected from public PROMISE repositories. The results suggest that the contribution of structural features extracted from the software network is prominent, and that combining them with semantic features appears to improve the results further. In addition, compared with traditional hand-crafted features, the F-measure values of SDP-S2S are generally higher, with a maximum growth rate of 99.5%. We also explore parameter sensitivity in the learning of semantic and structural features and provide guidance for the optimization of predictors.
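The token-vector step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes Python source and the standard `ast` module as a stand-in for whatever parser the paper uses, and the choice of node types, the function names `ast_tokens`, `build_vocab`, and `encode`, and the fixed vector length are all hypothetical. It only shows the general idea of turning an AST into a fixed-length integer vector that a CNN embedding layer could consume.

```python
import ast


def ast_tokens(source: str) -> list[str]:
    """Collect a token sequence from selected AST nodes.

    Assumption: declarations, calls, and control-flow nodes are kept;
    this selection is illustrative, not taken from the paper.
    ast.walk visits nodes in breadth-first order.
    """
    control = (ast.If, ast.For, ast.While, ast.Try, ast.Return, ast.Raise)
    tokens = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.ClassDef)):
            tokens.append(f"{type(node).__name__}:{node.name}")
        elif isinstance(node, ast.Call):
            func = node.func
            if isinstance(func, ast.Attribute):
                name = func.attr
            else:
                name = getattr(func, "id", "<call>")
            tokens.append(f"Call:{name}")
        elif isinstance(node, control):
            tokens.append(type(node).__name__)
    return tokens


def build_vocab(token_lists: list[list[str]]) -> dict[str, int]:
    """Map each distinct token to a positive integer; 0 is kept for padding."""
    vocab: dict[str, int] = {}
    for tokens in token_lists:
        for tok in tokens:
            vocab.setdefault(tok, len(vocab) + 1)
    return vocab


def encode(tokens: list[str], vocab: dict[str, int], length: int) -> list[int]:
    """Convert tokens to integer ids, truncated or zero-padded to `length`."""
    ids = [vocab.get(tok, 0) for tok in tokens][:length]
    return ids + [0] * (length - len(ids))


# Hypothetical example file content.
example = "def f(x):\n    if x:\n        return g(x)\n    return 0\n"
example_tokens = ast_tokens(example)
example_vocab = build_vocab([example_tokens])
example_vector = encode(example_tokens, example_vocab, length=8)
```

Vectors of this kind would typically pass through an embedding layer before the convolutional layers. The structural features are learned separately from the software network, which this sketch does not cover.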
Related Results
Ensemble Machine Learning Model for Software Defect Prediction
Software defect prediction is a significant activity in every software firm. It helps in producing quality software by reliable defect prediction, defect elimination, and predictio...
A Semantic Orthogonal Mapping Method Through Deep-Learning for Semantic Computing
In order to realize an artificial intelligent system, a basic mechanism should be provided for expressing and processing the semantic. We have presented semantic computing models i...
Visual software defect prediction method based on improved recurrent criss-cross residual network
Purpose: This study aims to solve the problems of large training sample size, low data sample quality, low efficiency of the currently used classical model, high computational compl...
Semantic Description and Complete Computer Characterization of Structural Geological Models
Abstract. A structural geological model is an important basis for the understanding of subsurface structures and exploration of mineral resources, especially petroleum reservoirs. ...
Feature selection using a multi-strategy improved parrot optimization algorithm in software defect prediction
Software defect detection is a critical research topic in the field of software engineering, aiming to identify potential defects during the development process to improve software...
Semantic Excel: An Introduction to a User-Friendly Online Software Application for Statistical Analyses of Text Data
Semantic Excel (www.semanticexcel.com) is an online software application with a simple, yet powerful interface enabling users to perform statistical analyses on texts. The purpose ...
Enhancing Non-Formal Learning Certificate Classification with Text Augmentation: A Comparison of Character, Token, and Semantic Approaches
Aim/Purpose: The purpose of this paper is to address the gap in the recognition of prior learning (RPL) by automating the classification of non-formal learning certificates using d...
Exploiting Wikipedia Semantics for Computing Word Associations
Semantic association computation is the process of automatically quantifying the strength of a semantic connection between two textual units based on various lexi...

