Application of Random Forest Algorithm and Multi-Temporal Satellite Data for Forest Types Classification in Chiang Mai Province
Background and Objectives: Chiang Mai Province is strategically important as a major watershed area for the Ping River basin ecosystem and has one of the highest biodiversity levels in Thailand. However, the area currently faces a severe environmental crisis, particularly a rapid decline in forest area and recurring wildfires during the dry season. These fires have ongoing consequences, leading to transboundary air pollution and PM2.5 levels that are hazardous to public health and harmful to the regional economy. The severity and spread of wildfires are directly related to fuel type and forest type. Deciduous forests, such as dry dipterocarp forest and mixed deciduous forest, shed their leaves and accumulate large amounts of dry biomass fuel during the dry season, so they are more susceptible to fire and to severe fire spread than evergreen forests, which have higher humidity levels. Therefore, an accurate, high-resolution, and up-to-date forest classification map database is urgently needed for fuel management planning, wildfire risk zone identification, and natural resource restoration. However, classifying forest types in the complex mountainous terrain of Chiang Mai Province is a significant challenge for remote sensing because of physical factors such as mountain shadows and the similarity of spectral reflectance values among different vegetation species at certain times of year. Traditional methods using single-temporal satellite imagery cannot effectively distinguish between deciduous and evergreen forests. To overcome this limitation, integrating multi-temporal satellite imagery capable of tracking vegetation phenology changes throughout the year with machine learning on a large-scale data processing platform such as Google Earth Engine (GEE) is a powerful approach.
This research therefore aims to: 1) create a forest classification map of Chiang Mai Province for the year 2024 using a random forest (RF) algorithm, and 2) analyze the feature importance of both spectral indices and topographic factors to identify the ecological factors that have the greatest influence on classification accuracy.
Method: This study was conducted on the Google Earth Engine (GEE) platform to process large-scale geospatial data. The primary dataset comprised atmospherically corrected Sentinel-2 Level 2A (Surface Reflectance) imagery covering the province's entire 22,436 square kilometers. Data preparation was divided into two main parts. The first involved creating a median composite image for the dry season (December 1, 2023 – March 31, 2024) from images with less than 60% cloud cover, to serve as a cloud-free baseline for analyzing the relationships between variables. The second involved constructing a time-series stack of representative monthly images from four dry-season periods and one wet-season period. This multi-temporal approach was specifically designed to capture the distinct phenological signatures of leaf shedding and greening. Furthermore, topographic data, including elevation, slope, and aspect, were derived from the Shuttle Radar Topography Mission Digital Elevation Model (SRTM DEM) and resampled to a 20-meter spatial resolution. The researchers initially calculated a comprehensive set of 26 predictor variables, encompassing vegetation indices, water and soil indices, forest-specific indices, and original spectral bands. To optimize model performance, a two-step feature selection process was implemented. First, Pearson's correlation coefficient analysis was used to eliminate highly redundant variables (those with |r| > 0.90 were excluded). Subsequently, the remaining variables were ranked by Mean Decrease in Gini Impurity using the Random Forest algorithm. The classification model targeted three distinct classes: deciduous forest, evergreen forest, and non-forest. Reference data consisted of 750 standard ground-truth points, collected via stratified random sampling to ensure spatial independence.
These points were randomly partitioned into an 80% training set (600 points) and a 20% testing set (150 points). The RF classifier was parameterized with 500 decision trees (ntrees) to maximize stability.
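The first feature-selection step described above (dropping predictors with |r| > 0.90) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the abstract does not say which member of a correlated pair was discarded, so the greedy "keep the first one encountered" rule is an assumption, as are the function names and the toy sample values. The second step (ranking by Mean Decrease in Gini) needs a trained Random Forest and is not shown.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def drop_redundant(features, threshold=0.90):
    """Greedy filter (assumed rule): keep a variable only if its |r| with
    every variable already kept does not exceed the threshold."""
    kept = []
    for name, values in features.items():
        if all(abs(pearson(values, features[k])) <= threshold for k in kept):
            kept.append(name)
    return kept

# Toy samples at six hypothetical points (made-up numbers, not study data).
features = {
    "ndvi":  [0.10, 0.20, 0.30, 0.40, 0.50, 0.60],
    "gndvi": [0.12, 0.21, 0.29, 0.41, 0.52, 0.58],       # r with ndvi close to +1
    "slope": [3, 1, 4, 1, 5, 2],                         # weakly correlated
    "ndwi":  [-0.10, -0.20, -0.30, -0.40, -0.50, -0.62], # r with ndvi close to -1
}

selected = drop_redundant(features)
print(selected)  # ['ndvi', 'slope']
```

Because the filter is order-dependent, a real workflow would probably fix the variable order deliberately (for example, by prior importance) before applying it.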
Results: The feature selection process refined the dataset to an optimal subset of 12 non-redundant variables. The analysis revealed that topographic features were the most influential factors governing the model's predictive capability. Specifically, elevation dominated the ranking with the highest importance score (629.27), followed by slope (492.16). Among the spectral predictors, the shadow index (SI) (215.45) and the Green Normalized Difference Vegetation Index (GNDVI) (184.90) proved to be the most critical variables, as they effectively captured the complexities of canopy structures and mitigated topographic shadow effects during the dry season. The developed RF model performed well, achieving an Out-of-Bag (OOB) accuracy of 90.30%. When evaluated against the independent testing set, the model yielded an Overall Accuracy of 95.92% and a Kappa coefficient of 0.94. Class-specific performance analysis indicated that the non-forest class achieved the highest accuracy (Producer's Accuracy 97.92%, User's Accuracy 100%), followed by evergreen forest (Producer's Accuracy 94.12%, User's Accuracy 96.00%), and deciduous forest (Producer's Accuracy 95.83%, User's Accuracy 92.00%). The final spatial map revealed that deciduous forests cover approximately 48.70% (10,779.68 km²) of the province, predominantly distributed in foothills and mid-elevation zones, while evergreen forests account for 32.37% (7,164.34 km²), densely dominating the higher mountain ranges, which is consistent with highland forest ecology principles. However, spatial verification identified specific limitations. Minor misclassifications were observed within ecological transition zones (ecotones) at elevations between 800 and 1,150 meters due to highly mixed forest structures. 
Additionally, spectral confusion occurred in agricultural areas containing perennial, long-lived fruit orchards (e.g., longan and orange orchards), which maintain permanent green canopies that closely resemble the spectral signatures of natural evergreen forests.
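The accuracy figures above all derive from a confusion matrix. As a hedged worked example, the sketch below computes Overall Accuracy, the Kappa coefficient, and Producer's and User's Accuracy from one matrix that reproduces the reported percentages. The matrix is an illustrative reconstruction, not data published in the source; it totals 147 points, whereas the text reports a 150-point test set, so it should be read only as a demonstration of the formulas. The function name and matrix layout (rows = reference class, columns = predicted class) are assumptions.

```python
def accuracy_metrics(matrix, labels):
    """matrix[i][j] = number of reference-class-i points predicted as class j."""
    n = sum(sum(row) for row in matrix)
    row_totals = [sum(row) for row in matrix]        # reference totals per class
    col_totals = [sum(col) for col in zip(*matrix)]  # predicted totals per class
    diagonal = [matrix[i][i] for i in range(len(labels))]

    overall = sum(diagonal) / n
    # Agreement expected by chance, from the marginal totals.
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / n ** 2
    kappa = (overall - expected) / (1 - expected)
    producers = {lab: d / r for lab, d, r in zip(labels, diagonal, row_totals)}
    users = {lab: d / c for lab, d, c in zip(labels, diagonal, col_totals)}
    return overall, kappa, producers, users

labels = ["non-forest", "evergreen", "deciduous"]
# Hypothetical matrix chosen to be consistent with the reported percentages.
matrix = [
    [47, 0, 1],   # reference non-forest
    [0, 48, 3],   # reference evergreen
    [0, 2, 46],   # reference deciduous
]

overall, kappa, producers, users = accuracy_metrics(matrix, labels)
print(f"Overall Accuracy: {overall:.2%}")  # 95.92%
print(f"Kappa: {kappa:.2f}")               # 0.94
for lab in labels:
    print(f"{lab}: PA {producers[lab]:.2%}, UA {users[lab]:.2%}")
```

Producer's Accuracy divides each correct count by its reference (row) total and so reflects omission error, while User's Accuracy divides by the predicted (column) total and reflects commission error.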
Conclusion: This study demonstrates that applying an RF algorithm to multi-temporal Sentinel-2 imagery on a cloud computing platform provides a robust tool for mapping complex forest ecosystems in mountainous regions. Empirical findings confirm that elevation serves as the most important ecological boundary for forest classification in northern Thailand. The resulting high-precision classification map serves as a vital spatial database for accurately delineating wildfire risk zones and supporting targeted natural resource management. To further enhance model accuracy in future research, it is recommended to refine the non-forest class by explicitly separating perennial fruit orchards into distinct sub-categories. Furthermore, incorporating Synthetic Aperture Radar (Sentinel-1 SAR) data or employing advanced texture analysis should be considered to improve the differentiation of physical vegetation structures across complex landscapes.
Kasetsart University Research and Development Institute

