
SysML: adaptive recommendation system for heterogeneous biomedical data preprocessing and modeling workflows

Abstract: The rapid growth of high-dimensional omics datasets in biomedical research has created an urgent need for computational frameworks that are both robust and adaptable to diverse data complexities. Although a wide range of specialized tools and algorithms is available, researchers often rely on trial and error to select suitable analytical workflows, compromising both efficiency and reproducibility. In this study, we systematically benchmarked hundreds of algorithm-preprocessing combinations across three common biomedical data challenges: small sample sizes, missing values, and class imbalance. Our results show that tree-based models (e.g., Gradient Boosting Decision Tree, XGBoost, and Random Forest) consistently perform well in small-sample and missing-data scenarios, while partial least squares discriminant analysis (PLS-DA) is more effective at handling imbalanced classes. Unsupervised clustering methods such as K-means and Density-Based Spatial Clustering of Applications with Noise (DBSCAN) remain robust under moderate missingness, but their performance declines when missingness exceeds 10%. To support data-driven decision-making, we developed SysML, a web-based platform that recommends data-adaptive workflows based on dataset-specific characteristics. Validated on multiple real-world biomedical datasets, SysML demonstrated improvements in both model performance and workflow efficiency. Our findings underscore that adaptive data preprocessing, rather than algorithm choice alone, is critical for achieving reliable and reproducible machine learning applications in biomedicine.
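As a rough illustration of what recommending a workflow from dataset-specific characteristics could look like, the sketch below profiles a dataset along the three challenges named in the abstract and maps the profile to the model families the abstract reports as performing well. This is not SysML's actual recommendation logic: the function names, the small-sample cutoff (50), and the imbalance cutoff (3.0) are hypothetical. Only the 10% missingness figure comes from the abstract.

```python
from collections import Counter
from math import isnan


def profile_dataset(X, y):
    """Summarize sample size, missingness, and class imbalance.

    X is a list of rows (lists of floats or None); y is a list of labels.
    """
    n_cells = sum(len(row) for row in X)
    n_missing = sum(
        1
        for row in X
        for v in row
        if v is None or (isinstance(v, float) and isnan(v))
    )
    counts = Counter(y)
    return {
        "n_samples": len(X),
        "missing_fraction": n_missing / n_cells if n_cells else 0.0,
        # Ratio of the largest class to the smallest class.
        "imbalance_ratio": (
            max(counts.values()) / min(counts.values()) if counts else 1.0
        ),
    }


def suggest_models(profile, small_n=50, imbalance=3.0, missing=0.10):
    """Map a profile to hedged suggestions.

    small_n and imbalance are hypothetical cutoffs chosen for illustration;
    missing=0.10 mirrors the 10% figure stated in the abstract.
    """
    notes = []
    if profile["imbalance_ratio"] >= imbalance:
        notes.append("Class imbalance: consider PLS-DA.")
    if profile["n_samples"] < small_n or profile["missing_fraction"] > 0:
        notes.append(
            "Small sample or missing data: consider tree-based models "
            "(GBDT, XGBoost, Random Forest)."
        )
    if profile["missing_fraction"] > missing:
        notes.append(
            "Missingness above 10%: K-means and DBSCAN may degrade."
        )
    return notes


if __name__ == "__main__":
    X = [[1.0, None], [2.0, 3.0], [float("nan"), 1.0], [0.5, 0.2]]
    y = [0, 0, 0, 1]
    for note in suggest_models(profile_dataset(X, y)):
        print(note)
```

A real recommender would presumably rank complete preprocessing-plus-model workflows from benchmark results rather than apply fixed rules. The rules here only restate the abstract's qualitative findings.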

Related Results

A SYSTEMATIC APPROACH TO FORMAL VERIFICATION AND VALIDATION OF EMBEDDED SYSTEMS: ENHANCING RELIABILITY AND SAFETY
This article addresses the problem of model-based early design verification of systems engineering applications expressed using the Systems Modeling Language (SysML). This thesis descr...
Models and tools for supporting sustainability assessment in Systems Engineering
For several years, sustainability has been a very important challenge for our societies. Our lifestyles are in the process of making our planet uninhabitable because of the var...
Combining SysML and SystemC to Simulate and Verify Complex Systems
Joint use of SysML and SystemC to simulate and verify complex systems. Many heterogeneous systems are complex and critical. These systems...
Understanding Systems through Graph Theory and Dynamic Visualization
Abstract: As today’s Cyber Physical Systems (CPS) become more and more complex they provide both incredible...
SysML en action avec Cameo Systems Modeler
Model-based systems engineering (MBSE) is currently in vogue among systems engineering (SE) practitioners, whether they are analysts, architects, developers, or testers....
EPD Electronic Pathogen Detection v1
Electronic pathogen detection (EPD) is a non-invasive, rapid, affordable, point-of-care test for COVID-19 resulting from infection with the SARS-CoV-2 virus. EPD scanning techno...
