
Application of Machine Learning Techniques for Customer Churn Prediction in the Banking Sector

Aim/Purpose: Previous studies have primarily compared predictive models without considering the impact of data preprocessing on model performance. This study therefore sets two main objectives. The first is to investigate how resampling methods for handling imbalanced data affect model effectiveness. The second is to compare and evaluate machine learning methods to identify the optimal model for each resampling technique, thereby determining the model that achieves the highest performance.

Background: In the highly competitive banking industry, customer attrition is a major challenge for banks trying to improve retention. While many studies have focused on building and evaluating models to predict customer churn, they often fail to address the problem of imbalanced data, which can significantly affect a model’s accuracy.

Methodology: Following exploratory data analysis (EDA), we apply various techniques to address data imbalance and use a range of machine learning models (Naïve Bayes, Logistic Regression, Support Vector Machine, Decision Tree, Random Forest, Gradient Boosting, XGBoost, and LightGBM) to predict customer churn using the dataset.

Contribution: This research provides a comprehensive evaluation and comparison of techniques for handling imbalanced data in churn prediction models. The study identifies SMOTE-ENN as the most effective method for resampling imbalanced data. Among the models tested, LightGBM (accuracy = 0.979) achieves the highest performance based on the evaluation metrics. The research also highlights that tree-based machine learning models generally perform better than the other models when trained on imbalanced datasets.

Findings: Tree-based and ensemble models perform better than regression and probability-based methods when dealing with imbalanced data. SMOTE-ENN greatly improves the performance of the machine learning models.
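
To make the resampling idea behind SMOTE-ENN concrete, the following is a minimal pure-Python sketch, not the implementation used in the study. It assumes a binary problem, a toy two-feature dataset generated here for illustration, and a majority-vote cleaning rule for the ENN step (one common variant). The function names `smote` and `enn` and all parameter values are hypothetical. In practice a maintained implementation such as `SMOTEENN` from the imbalanced-learn package would normally be used, and resampling would be applied to the training split only.

```python
import random
from collections import Counter


def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


def nearest(idx, points, k):
    """Indices of the k points closest to points[idx], excluding idx itself."""
    others = [j for j in range(len(points)) if j != idx]
    others.sort(key=lambda j: sq_dist(points[idx], points[j]))
    return others[:k]


def smote(X, y, minority, k=3, rng=None):
    """Add interpolated minority samples until both classes have equal size."""
    rng = rng or random.Random(0)
    minority_pts = [x for x, label in zip(X, y) if label == minority]
    n_needed = (len(X) - len(minority_pts)) - len(minority_pts)
    new_X, new_y = list(X), list(y)
    for _ in range(max(0, n_needed)):
        i = rng.randrange(len(minority_pts))
        j = rng.choice(nearest(i, minority_pts, k))
        gap = rng.random()  # position along the segment between the two points
        a, b = minority_pts[i], minority_pts[j]
        new_X.append([p + gap * (q - p) for p, q in zip(a, b)])
        new_y.append(minority)
    return new_X, new_y


def enn(X, y, k=3):
    """Drop every sample whose k nearest neighbours mostly carry another label."""
    keep = []
    for i in range(len(X)):
        votes = Counter(y[j] for j in nearest(i, X, k))
        if votes.most_common(1)[0][0] == y[i]:
            keep.append(i)
    return [X[i] for i in keep], [y[i] for i in keep]


# Toy imbalanced data (illustrative only): 40 "stay" customers, 8 "churn" customers.
data_rng = random.Random(42)
X = [[data_rng.gauss(0, 1), data_rng.gauss(0, 1)] for _ in range(40)]
X += [[data_rng.gauss(2, 1), data_rng.gauss(2, 1)] for _ in range(8)]
y = [0] * 40 + [1] * 8

X_bal, y_bal = smote(X, y, minority=1)   # oversampling step
X_clean, y_clean = enn(X_bal, y_bal)     # cleaning step

print("original:", Counter(y))
print("after SMOTE:", Counter(y_bal))
print("after ENN:", Counter(y_clean))
```

The oversampling step balances the classes by interpolating between neighbouring minority samples, and the cleaning step then removes points that sit in regions dominated by the other class, which is why the final class counts may no longer be exactly equal.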

Recommendations for Practitioners: Practitioners can deploy high-performance models, such as XGBoost and LightGBM, combined with effective resampling methods like SMOTE-ENN to predict customer churn in banking, marketing, and human resources.

Recommendation for Researchers: To optimize the predictive model in the study, researchers can focus on feature selection, dimensionality reduction, or hyperparameter tuning.

Impact on Society: Customer churn reduces revenue and threatens competitive advantage, so businesses need effective retention strategies to maintain sustainable growth. High-performance customer churn prediction models can be an effective solution to address this issue.

Future Research: Deploy the model on real-world datasets while further optimizing the feature selection process and hyperparameter tuning, combined with SHAP values analysis to identify key features that significantly influence the model’s predictions.
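
As an illustration of the idea behind the SHAP analysis mentioned above, the sketch below computes exact Shapley values for a tiny hypothetical churn score by enumerating every feature subset. It is not the SHAP library and not the study's model: the linear `churn_score`, its weights, the example customer, and the single baseline vector are all assumptions made for illustration. Exact enumeration is only feasible for a handful of features, and tools such as SHAP rely on approximations or model-specific algorithms instead.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values of predict(x) relative to predict(baseline)."""
    n = len(x)

    def value(subset):
        # Features in `subset` take their real value; the rest take the baseline.
        mixed = [x[j] if j in subset else baseline[j] for j in range(n)]
        return predict(mixed)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            # Weight of a coalition of this size in the Shapley formula.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi


# Hypothetical linear churn score over three made-up features.
WEIGHTS = [0.5, -0.25, 1.0]


def churn_score(features):
    return sum(w * f for w, f in zip(WEIGHTS, features))


customer = [2.0, 4.0, 1.0]   # hypothetical customer
baseline = [1.0, 2.0, 0.0]   # hypothetical reference customer

contributions = shapley_values(churn_score, customer, baseline)
print([round(c, 6) for c in contributions])
```

For a linear score like this one, each contribution reduces to the feature's weight times its distance from the baseline, and the contributions sum to the difference between the customer's score and the baseline score. That additive property is what makes this kind of analysis useful for ranking the features that drive a prediction.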

Informing Science Institute

Tam-Thanh Luong Vi-Gia Luong Anh Hoang Tuan Tran Tuan Manh Nguyen

Interdisciplinary Journal of Information, Knowledge, and Management

2025



