
Comparative Analysis of Resampling Techniques for Class Imbalance in Financial Distress Prediction Using XGBoost

One of the key challenges in financial distress data is class imbalance: distressed firms are heavily outnumbered by non-distressed ones. This study compares eight resampling techniques for improving distress prediction with the XGBoost algorithm. The dataset, acquired from the CSMAR database, contains 26,383 firm-quarter samples from 639 Chinese A-share listed companies (2007–2024), of which only 12.1% are distressed. The standard Synthetic Minority Oversampling Technique (SMOTE) improved the F1-score (up to 0.73) and the Matthews Correlation Coefficient (MCC, up to 0.70), while SMOTE-Tomek and Borderline-SMOTE further increased recall at a slight cost in precision. These oversampling and hybrid methods also remained reasonably efficient computationally. Random Undersampling (RUS) was the fastest method and yielded high recall (0.85), but it suffered from low precision (0.46) and weaker generalization. Among all techniques, Bagging-SMOTE achieved balanced performance across metrics (AUC 0.96, F1 0.72, PR-AUC 0.80, MCC 0.68) using a minority-to-majority ratio of 0.15. Since the original ratio is already about 0.14 (0.121 / 0.879), this setting barely alters the class distribution, demonstrating that ensemble-based resampling can improve robustness with minimal impact on the original data, albeit at a higher computational cost. The comparison shows that no single approach fits all use cases and that technique selection should align with the application's goals: techniques favoring recall (e.g., Bagging-SMOTE, SMOTE-Tomek) suit early-warning systems, conservative techniques (e.g., Tomek Links) help reduce false positives in risk-sensitive applications, and efficient methods such as RUS are preferable when computational speed is a priority.
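The resampling and evaluation ideas summarized above can be illustrated with a small pure-Python sketch. It is not the study's implementation: the abstract does not say which library was used, and every function name here (`confusion_metrics`, `n_synthetic_for_ratio`, `smote`, `random_undersample`) is hypothetical. `smote` follows the basic SMOTE idea of interpolating between a minority sample and one of its k nearest minority neighbors (k = 5 by default, matching imbalanced-learn's default), and `n_synthetic_for_ratio` assumes the target is expressed as a minority-to-majority ratio, as in the 0.15 setting reported for Bagging-SMOTE. Borderline-SMOTE, Tomek Links, and the bagging ensemble are not shown.

```python
import math
import random
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, ...]


def confusion_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Precision, recall, F1 and MCC for binary labels (1 = distressed)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    # MCC uses all four confusion-matrix cells, so it stays informative when negatives dominate.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"precision": precision, "recall": recall, "f1": f1, "mcc": mcc}


def n_synthetic_for_ratio(n_minority: int, n_majority: int, ratio: float) -> int:
    """Synthetic samples needed so that minority / majority reaches `ratio` (0 if already there)."""
    return max(0, round(ratio * n_majority) - n_minority)


def smote(minority: List[Point], n_new: int, k: int = 5, seed: int = 0) -> List[Point]:
    """Basic SMOTE: each synthetic point lies on the segment between a minority
    sample and one of its k nearest minority neighbors (Euclidean distance)."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    synthetic: List[Point] = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        others = [p for j, p in enumerate(minority) if j != i]
        neighbors = sorted(others, key=lambda p: math.dist(base, p))[:k]
        nb = rng.choice(neighbors)
        gap = rng.random()  # uniform in [0, 1): position along the segment base -> nb
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, nb)))
    return synthetic


def random_undersample(majority: List[Point], n_minority: int, ratio: float = 1.0, seed: int = 0) -> List[Point]:
    """RUS: keep a random subset of the majority class so minority / majority is about `ratio`."""
    rng = random.Random(seed)
    n_keep = min(len(majority), round(n_minority / ratio))
    return rng.sample(majority, n_keep)


if __name__ == "__main__":
    # Toy data, not from the study: 4 minority points vs. 40 majority points.
    minority = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0), (3.0, 3.0)]
    majority = [(float(x), -1.0) for x in range(40)]
    # A 0.15 target ratio means 6 minority samples for 40 majority ones, so 2 new points.
    n_new = n_synthetic_for_ratio(len(minority), len(majority), 0.15)
    print(n_new, smote(minority, n_new, k=2))
    print(len(random_undersample(majority, len(minority))))
    print(confusion_metrics([1, 1, 1, 0, 0, 0, 0, 0], [1, 1, 0, 1, 0, 0, 0, 0]))
```

Resampling of this kind is normally applied to the training data only (for example, inside each cross-validation fold), so that metrics such as F1 and MCC are measured on the original class distribution.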

Related Results

Comparative Literature at the Turn of the Millennium (Primerjalna književnost na prelomu tisočletja)
In a comprehensive and at times critical manner, this volume seeks to shed light on the development of events in Western (i.e., European and North American) comparative literature ...
Benchmarking Bayesian methods for spectroscopy
Introduction: ...
Fuze Well Mechanical Interface
This interface standard applies to fuzes used in airborne weapons that use a 3-Inch Fuze Well. It defin...
Cybersecurity Guidebook for Cyber-Physical Vehicle Systems
This recommended practice provides guidance on vehicle Cybersecurity and was created based off of, and ...
Classification of the Developing Village Index Status of West Java Using the XGBoost Algorithm (Klasifikasi Status Indeks Desa Membangun Jawa Barat Menggunakan Algoritma XGBoost)
Data from Statistics Indonesia (2020) show that rural areas in West Java have an average poverty rate of 10.64%, higher than the 7.79% in urban areas. To est...
Psychosocial Distress Among Cancer Patients: A single Institution Experience at the State of Qatar
Introduction: The prevalence of psychosocial distress is up to 45% among cancer patients. It is crucial to identify and treat distress. The aim of the study is to r...
