Dataset Construction for Landslide Susceptibility Mapping Using Multi-Buffer Zones, Clustering, and Stratified Sampling
Landslide susceptibility mapping is a vital tool for identifying areas vulnerable to slope instability and mitigating related hazards.
A critical challenge in this process is constructing a robust, diverse, and balanced training dataset that accurately distinguishes landslide-prone areas from stable regions.
This study proposes a methodology that integrates multi-buffer zoning, clustering-based sampling, and stratified sampling to enhance predictive accuracy and dataset representativeness.
The study was conducted in the Paphos district of Cyprus, an area of 552 km² that has experienced over 1,800 recorded landslides.
The region’s geomorphological complexity, shaped by diverse topographic, geological, hydrological, and land-use conditions, makes it an ideal setting for advancing landslide susceptibility mapping techniques.
A comprehensive dataset incorporating key environmental variables—such as slope, elevation, curvature, lithology, proximity to faults, and land cover—was compiled for analysis.
To develop the training dataset, documented landslide points were paired with non-landslide points generated from three spatial buffer zones: 250 m, 500 m, and 750 m around landslide sites.
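As a rough sketch of how non-landslide points might be generated for each buffer distance: the abstract does not say whether these points are drawn inside or outside the buffers, so the example below assumes they are placed at least one buffer radius away from every recorded landslide. The function name `sample_outside_buffer`, the rectangular sampling window, and the synthetic coordinates are all illustrative assumptions, not the study's implementation or data.

```python
import math
import random

def sample_outside_buffer(landslides, buffer_m, n_points, bounds, seed=0):
    """Draw n_points random locations at least buffer_m from every landslide.

    landslides: list of (x, y) in a projected CRS with metre units (assumed).
    bounds: (xmin, ymin, xmax, ymax) of the sampling window, also in metres.
    """
    rng = random.Random(seed)
    xmin, ymin, xmax, ymax = bounds
    points = []
    attempts = 0
    max_attempts = 1000 * n_points  # guard against an over-constrained window
    while len(points) < n_points:
        attempts += 1
        if attempts > max_attempts:
            raise RuntimeError("could not place enough points; window too crowded")
        x, y = rng.uniform(xmin, xmax), rng.uniform(ymin, ymax)
        # Rejection step: keep the candidate only if it clears every buffer.
        if all(math.hypot(x - lx, y - ly) >= buffer_m for lx, ly in landslides):
            points.append((x, y))
    return points

# Synthetic landslide inventory (illustrative only, not the Paphos data).
_rng = random.Random(42)
landslides = [(_rng.uniform(0, 20_000), _rng.uniform(0, 20_000)) for _ in range(50)]
bounds = (0, 0, 20_000, 20_000)

# One balanced set of non-landslide points per buffer distance (hypothetical setup).
datasets = {
    d: sample_outside_buffer(landslides, d, len(landslides), bounds, seed=d)
    for d in (250, 500, 750)
}
```

Drawing as many non-landslide points as there are landslide points keeps each dataset balanced; a larger buffer pushes the negatives further from known failures, which is one plausible reading of why the buffer distance affects model performance.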
To further improve data diversity, clustering-based sampling grouped data points based on geomorphological and environmental similarities, while stratified sampling ensured proportional representation of critical variables in the dataset.
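The proportional-representation idea behind stratified sampling can be sketched as follows. This is a minimal illustration, assuming each point already carries a stratum label (for example a cluster label from any clustering algorithm, or a lithology class); the helper `stratified_sample`, the labels `A`/`B`/`C`, and the 20% fraction are hypothetical and not taken from the study.

```python
import random
from collections import Counter, defaultdict

def stratified_sample(records, stratum_of, fraction, seed=0):
    """Sample roughly the same fraction of records from every stratum."""
    rng = random.Random(seed)
    groups = defaultdict(list)
    for rec in records:
        groups[stratum_of(rec)].append(rec)
    sample = []
    for stratum in sorted(groups):
        members = groups[stratum]
        k = max(1, round(fraction * len(members)))  # keep at least one per stratum
        sample.extend(rng.sample(members, k))  # sampling without replacement
    return sample

# Hypothetical records: (point_id, cluster_label), with unequal cluster sizes.
records = (
    [(i, "A") for i in range(600)]
    + [(i, "B") for i in range(600, 900)]
    + [(i, "C") for i in range(900, 1000)]
)

sample = stratified_sample(records, stratum_of=lambda r: r[1], fraction=0.2)
counts = Counter(label for _, label in sample)
```

Because every stratum contributes the same fraction, the 6:3:1 ratio of the three groups in the full set is preserved in the sample, so small groups are not crowded out by large ones.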
Three machine learning models—Logistic Regression (LR), Random Forest (RF), and XGBoost—were employed to evaluate the predictive performance of datasets constructed using individual buffer zones, clustering, and stratification techniques.
Model performance was assessed using metrics such as Accuracy, F1 Score, Cohen’s Kappa, and Area Under the Curve (AUC) to determine the effectiveness of each dataset.
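For readers unfamiliar with these metrics, the sketch below computes them from their standard binary-classification definitions on a tiny made-up example. The labels, scores, and the 0.5 decision threshold are illustrative assumptions and have no connection to the study's results.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, tn, fp, fn) for binary labels, with 1 = landslide."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    tn = sum(t == 0 and p == 0 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)
    return tp, tn, fp, fn

def accuracy(tp, tn, fp, fn):
    return (tp + tn) / (tp + tn + fp + fn)

def f1_score(tp, tn, fp, fn):
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def cohens_kappa(tp, tn, fp, fn):
    n = tp + tn + fp + fn
    observed = (tp + tn) / n
    # Agreement expected by chance, from the row and column totals.
    expected = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / n**2
    return (observed - expected) / (1 - expected) if expected != 1 else 0.0

def roc_auc(y_true, scores):
    """Share of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

# Made-up labels and predicted probabilities (illustrative only).
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

counts = confusion_counts(y_true, y_pred)
acc = accuracy(*counts)
f1 = f1_score(*counts)
kappa = cohens_kappa(*counts)
auc = roc_auc(y_true, scores)
```

Accuracy, F1, and Kappa depend on the chosen threshold, whereas AUC is computed from the ranking of the scores alone, which is why the two kinds of metric can diverge for the same model.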
The results revealed clear distinctions between datasets.
The 750 m buffer dataset outperformed the others, with XGBoost achieving an Accuracy of 93.92%, F1 Score of 93.86%, Cohen’s Kappa of 87.84%, and AUC of 98.36%.
This dataset effectively captured stable environmental conditions, improving model robustness and generalizability.
The 500 m buffer dataset also performed well, with XGBoost achieving an Accuracy of 92.36% and an AUC of 97.66%, while the 250 m buffer dataset exhibited slightly lower performance, with XGBoost achieving an Accuracy of 89.36% and an AUC of 95.77%.
The clustering-based sampling approach also demonstrated strong results, with RF achieving an Accuracy of 92.44% and an AUC of 97.19%, suggesting that grouping data points based on shared characteristics enhances model precision.
Finally, the combined dataset, which integrated clustering-based and stratified sampling, yielded robust results, with XGBoost achieving an Accuracy of 93.74%, Cohen’s Kappa of 85.99%, and AUC of 97.99%.
In conclusion, the proposed approach demonstrates the value of integrating multi-buffer zoning, clustering, and stratified sampling into susceptibility mapping frameworks.
This study not only advances our understanding of landslide processes in the Paphos district but also provides a scalable, reliable methodology for landslide risk assessment in other regions, contributing to more resilient landscapes and communities.
This research was funded by the European Commission, project reference: ENTERPRISES/0223/Sub-Call1/0229.

