MSclassifR: an R Package for Supervised Classification of Mass Spectra with Machine Learning Methods
Abstract
MSclassifR is an R package designed to improve the classification of mass spectra obtained by MALDI-TOF mass spectrometry. It offers a comprehensive range of functions for processing mass spectra, identifying discriminant m/z values, and making accurate predictions. The package introduces innovative algorithms for selecting discriminant m/z values and for making predictions. To assess the effectiveness of these methods, extensive tests were conducted on challenging real datasets, including bacterial subspecies of the Mycobacterium abscessus complex, virulent and avirulent phenotypes of Escherichia coli, different species of Streptococci, and nasal swabs from individuals infected and uninfected with SARS-CoV-2. Additionally, multiple datasets of varying sizes were created from these real datasets to evaluate the robustness of the algorithms. The results demonstrated that the machine learning-based pipelines in MSclassifR achieved high accuracy and Kappa values. On an in-house dataset, some pipelines achieved more than 95% mean accuracy, whereas a commercial system achieved only 62% mean accuracy. Certain methods were more resilient than others to changes in dataset size when constructing machine learning-based pipelines. These simulations also helped determine the minimum training set sizes required to obtain reliable results. The package is freely available online, and its open-source nature encourages collaborative development and customization and fosters innovation within the community focused on improving diagnosis based on MALDI-TOF spectra.
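The abstract reports pipeline performance as accuracy and Kappa values. As a reminder of what these two metrics measure, the following is a minimal sketch in plain Python, assuming "Kappa" refers to Cohen's kappa between true and predicted class labels. It is not MSclassifR code; the function names and the toy labels are hypothetical and chosen only for illustration.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("label sequences must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def cohen_kappa(y_true, y_pred):
    """Cohen's kappa: agreement corrected for agreement expected by chance."""
    n = len(y_true)
    p_observed = accuracy(y_true, y_pred)
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    # Chance agreement: for each class, the product of its marginal frequencies.
    p_expected = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    if p_expected == 1.0:
        # Only one class appears in both sequences; kappa is undefined.
        return float("nan")
    return (p_observed - p_expected) / (1.0 - p_expected)


# Toy labels (hypothetical, not taken from the paper's datasets).
y_true = ["virulent", "virulent", "avirulent", "avirulent"]
y_pred = ["virulent", "virulent", "avirulent", "virulent"]

print(accuracy(y_true, y_pred))     # 0.75
print(cohen_kappa(y_true, y_pred))  # 0.5
```

Kappa is lower than accuracy in this toy case because half of the observed agreement would be expected by chance given the class frequencies, which is why the two metrics are commonly reported together.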
Key points
MSclassifR is a comprehensive R package enabling the construction of data analysis pipelines for the precise classification of mass spectra.
Our R package contains an innovative method for variable selection from random forests, which delivered excellent results on real data.
In-depth analysis of various machine learning-based pipelines built with our package allowed us to draw conclusions about the optimal m/z selection and prediction methods depending on the size of the training dataset.
Using a publicly available dataset of mass spectra obtained from various MALDI-TOF instruments across different countries, MSclassifR is able to build robust pipelines that adapt automatically to different instruments.
When tested on an in-house dataset, MSclassifR pipelines consistently outperformed commercial software in terms of prediction accuracy.

