LimeSoDa: A Dataset Collection for Benchmarking of Machine Learning Regressors in Digital Soil Mapping 
Digital soil mapping (DSM) relies on a broad pool of statistical methods, yet determining the optimal method for a given context remains challenging. Large benchmarking studies are needed to reveal the strengths and limitations of commonly used methods. Existing DSM benchmarking studies usually rely on a single dataset with restricted access, leading to incomplete and potentially biased conclusions. To address these issues, we introduce an open-access dataset collection called Precision Liming Soil Datasets (LimeSoDa). LimeSoDa consists of 31 field- and farm-scale datasets. Each dataset has three target soil properties: soil organic matter (SOM) or soil organic carbon (SOC), clay and pH, alongside a set of features. Features are dataset-specific and were derived from spectroscopy, proximal soil sensors and remote sensing. All datasets were processed into a tabular format and are “ready-to-go” for modeling. We demonstrated the use of LimeSoDa for benchmarking by comparing four learning algorithms, multiple linear regression (MLR), support vector regression (SVR), categorical boosting (CatBoost) and random forest (RF), in terms of their predictive power across all LimeSoDa datasets. The results showed that no learning algorithm was generally superior. MLR and SVR performed better on high-dimensional spectral datasets, owing to their better compatibility with principal components. In contrast, CatBoost and RF performed considerably better on all other datasets. These benchmarking results illustrate that the performance of a method can be highly context-dependent. Therefore, LimeSoDa provides a crucial data resource for improving the development and evaluation of machine learning methods in DSM and pedometrics.
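The kind of benchmark described in the abstract, several learners scored across several tabular datasets, might be sketched as follows. This is a minimal illustration and not the authors' protocol: the two datasets are synthetic placeholders generated with scikit-learn (not LimeSoDa data), CatBoost is left out to avoid an extra dependency, and the 50-feature threshold, the 10 principal components, the 5-fold cross-validation and the R² metric are all assumptions made for the example.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR


def make_placeholder_datasets(seed=0):
    """Synthetic stand-ins (NOT LimeSoDa data): one wide and one narrow table."""
    X_wide, y_wide = make_regression(
        n_samples=80, n_features=300, n_informative=20, noise=5.0, random_state=seed
    )
    X_narrow, y_narrow = make_regression(
        n_samples=80, n_features=6, n_informative=4, noise=5.0, random_state=seed + 1
    )
    return {"spectral_like": (X_wide, y_wide), "sensor_like": (X_narrow, y_narrow)}


def make_learners(n_features, seed=0):
    """Build the learners; wide tables are compressed with PCA first (assumed rule)."""
    reduce = [PCA(n_components=10)] if n_features > 50 else []
    return {
        "MLR": make_pipeline(StandardScaler(), *reduce, LinearRegression()),
        "SVR": make_pipeline(StandardScaler(), *reduce, SVR()),
        "RF": make_pipeline(
            StandardScaler(),
            *reduce,
            RandomForestRegressor(n_estimators=100, random_state=seed),
        ),
    }


def benchmark(datasets, n_splits=5, seed=0):
    """Return the mean cross-validated R2 for every (dataset, learner) pair."""
    cv = KFold(n_splits=n_splits, shuffle=True, random_state=seed)
    results = {}
    for dataset_name, (X, y) in datasets.items():
        for learner_name, model in make_learners(X.shape[1], seed).items():
            scores = cross_val_score(model, X, y, cv=cv, scoring="r2")
            results[(dataset_name, learner_name)] = float(np.mean(scores))
    return results


results = benchmark(make_placeholder_datasets())
for (dataset_name, learner_name), r2 in sorted(results.items()):
    print(f"{dataset_name:>14}  {learner_name:<4} mean R2 = {r2:.2f}")
```

Fitting the scaler and PCA inside each pipeline keeps the preprocessing within the training folds, so held-out samples do not leak into the principal components. Scores on the synthetic placeholders say nothing about real soil data; the point is only the loop structure of datasets by learners.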

