Boosting Commit Classification with Contrastive Learning
Abstract
Commit Classification (CC) is an important task in software maintenance that helps developers classify commit changes into different types according to their nature and purpose. However, existing models need large amounts of manually labeled data for fine-tuning; when training samples are insufficient, maintaining the performance of commit classification becomes challenging. The scarcity of data also leads to poor generalization, so that models perform satisfactorily only on specific tasks. Moreover, existing models often ignore the sentence-level semantic information in commit messages, which is essential for distinguishing diverse commits, especially in few-shot scenarios. In this work, we propose to boost commit classification with contrastive learning, which solves the CC problem in few-shot scenarios. To augment the training datasets and improve the generalization ability of the proposed method, we generate additional training samples via a Semantic Prototype, defined as a representative embedding for a group of semantically similar instances. To produce meaningful and discriminative sentence-level vectors for each commit in a pair, we employ a pretrained Sentence-Transformer as the embedding layer. The network then learns to minimize the distance in the latent space for positive pairs and maximize it for negative pairs, yielding a fine-tuned Sentence-Transformer whose weights are then fixed for the downstream commit classification task. Extensive experiments on two publicly available datasets demonstrate that our framework, though simple, solves the CC problem effectively even in few-shot scenarios. It not only achieves state-of-the-art performance but also improves the adaptability of the model without requiring a large number of training samples for fine-tuning. The code, data, and trained models are available at https://github.com/CUMT-GMSC/CommitFit.
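The two core ideas in the abstract can be sketched in a few lines: a Semantic Prototype as a representative embedding for a group of similar commits, and a margin-based contrastive objective over sentence-level embeddings. This is a minimal illustration only, assuming mean pooling for the prototype and a standard Hadsell-style margin loss; the paper's exact formulation may differ, and the small vectors below stand in for real Sentence-Transformer outputs:

```python
import numpy as np

def semantic_prototype(embeddings: np.ndarray) -> np.ndarray:
    """Representative embedding for a group of semantically similar commits.
    Mean pooling is one plausible choice; the paper's exact construction
    may differ."""
    return embeddings.mean(axis=0)

def contrastive_loss(e1: np.ndarray, e2: np.ndarray,
                     same_class: bool, margin: float = 1.0) -> float:
    """Margin-based contrastive loss: pull commits of the same type
    together, push commits of different types at least `margin` apart."""
    d = float(np.linalg.norm(e1 - e2))
    if same_class:
        return d ** 2                     # positive pair: shrink distance
    return max(0.0, margin - d) ** 2      # negative pair: enforce margin

# Toy stand-ins for sentence-level commit embeddings.
fix_commits = np.array([[0.9, 0.1], [0.8, 0.2]])   # e.g. bug-fix commits
feat_commit = np.array([0.1, 0.9])                 # e.g. a feature commit

proto = semantic_prototype(fix_commits)            # [0.85, 0.15]
loss_pos = contrastive_loss(fix_commits[0], proto, same_class=True)
loss_neg = contrastive_loss(feat_commit, proto, same_class=False)
```

A prototype computed this way can also serve as an additional training sample for its class, which is one way to read the abstract's data-augmentation step.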
Related Results
Temporal-Aware and Intent Contrastive Learning for Sequential Recommendation
In recent years, research in sequential recommendation has primarily refined user intent by constructing sequence-level contrastive learning tasks through data augmentation or by e...
Grouped Contrastive Learning of Self-supervised Sentence Representation
This paper proposes a Grouped Contrastive Learning of self-supervised Sentence Representation (GCLSR), which can learn an effective and meaningful representation of sentences. Prev...
Contrastive Distillation Learning with Sparse Spatial Aggregation
Contrastive learning has advanced significantly and demonstrates excellent transfer learning capabilities. Knowledge distillation is one of the most effective meth...
Improving Neural Retrieval with Contrastive Learning
In recent years, neural retrieval models have shown remarkable progress in improving the efficiency and accuracy of information retrieval systems. However, challenges remain in eff...
Analyzing Data Augmentation Techniques for Contrastive Learning in Recommender Models
This paper investigates the application of contrastive learning-based user and item representation learning in recommendation systems. A recommendation model combining contrastive ...
Enhancing Non-Formal Learning Certificate Classification with Text Augmentation: A Comparison of Character, Token, and Semantic Approaches
Aim/Purpose: The purpose of this paper is to address the gap in the recognition of prior learning (RPL) by automating the classification of non-formal learning certificates using d...
Contrastive Instruction-Trajectory Learning for Vision-Language Navigation
The vision-language navigation (VLN) task requires an agent to reach a target with the guidance of natural language instruction. Previous works learn to navigate step-by-step follo...
An Asymmetric Contrastive Loss for Handling Imbalanced Datasets
Contrastive learning is a representation learning method performed by contrasting a sample to other similar samples so that they are brought closely together, forming clusters in t...

