MAD HATTER Correctly Annotates 98% of Small Molecule Tandem Mass Spectra Searching in PubChem
Metabolites provide a direct functional signature of cellular state. Untargeted metabolomics usually relies on mass spectrometry, a technology capable of detecting thousands of compounds in a biological sample. Metabolite annotation is executed using tandem mass spectrometry. Spectral library search is far from comprehensive, and numerous compounds remain unannotated. So-called in silico methods allow us to overcome the restrictions of spectral libraries, by searching in much larger molecular structure databases. Yet, after more than a decade of method development, in silico methods still do not reach the correct annotation rates that users would wish for. Here, we present a novel computational method called Mad Hatter for this task. Mad Hatter combines CSI:FingerID results with information from the searched structure database via a metascore. Compound information includes the melting point, and the number of words in the compound description starting with the letter ‘u’. We then show that Mad Hatter reaches a stunning 97.6% correct annotations when searching PubChem, one of the largest and most comprehensive molecular structure databases. Unfortunately, Mad Hatter is not a real method. Rather, we developed Mad Hatter solely for the purpose of demonstrating common issues in computational method development and evaluation. We explain what evaluation glitches were necessary for Mad Hatter to reach this annotation level, what is wrong with similar metascores in general, and why metascores may screw up not only method evaluations but also the analysis of biological experiments. This paper may serve as an example of problems in the development and evaluation of machine learning models for metabolite annotation.
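The abstract describes a metascore as a combination of a spectrum-based result with information taken from the structure database. The toy sketch below illustrates one way such a combination can override the spectral evidence. It is not the Mad Hatter implementation and does not use CSI:FingerID; the class `Candidate`, the fields `spectrum_score` and `db_feature`, the additive form of `metascore`, the weights, and all numbers are assumptions made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    name: str
    spectrum_score: float  # hypothetical spectrum-based score (higher is better)
    db_feature: float      # hypothetical database-derived feature, e.g. how much is recorded about the compound


def metascore(candidate: Candidate, weight: float) -> float:
    # Assumed form: spectrum-based score plus a weighted database feature.
    return candidate.spectrum_score + weight * candidate.db_feature


def rank(candidates: list[Candidate], weight: float) -> list[Candidate]:
    # Sort candidates from best to worst metascore.
    return sorted(candidates, key=lambda c: metascore(c, weight), reverse=True)


# Two invented candidates for the same query spectrum.
candidates = [
    Candidate("well_documented", spectrum_score=0.40, db_feature=5.0),
    Candidate("rarely_documented", spectrum_score=0.90, db_feature=0.0),
]

# With weight 0 only the spectrum counts; with weight 0.2 the database feature dominates.
top_spectrum_only = rank(candidates, weight=0.0)[0].name
top_metascore = rank(candidates, weight=0.2)[0].name

print(top_spectrum_only)
print(top_metascore)
```

In this toy setting the top-ranked candidate changes once the database feature is weighted in, even though the spectrum-based score favours the other structure. If an evaluation set consists mostly of well-documented compounds, a score of this shape can look very accurate without reflecting what the spectrum actually supports.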

