Novel binning-based methods for model fitting and data splitting improved machine learning imbalanced data
Abstract
Machine Learning (ML) models may perform inconsistently across individual classes of nominal outputs or ranges of continuous outputs, collectively referred to here as bins. Models should therefore be assessed with metrics that consider each bin individually, called bin metrics. Inconsistent model performance is often caused by fitting models to imbalanced data. To improve the modelling of imbalanced data, novel model fitting methods are proposed, including the use of bin metrics as loss functions and the use of Epoch sampling. Imbalanced data also poses a challenge for appropriate data splitting. Akin split is a novel method proposed to objectively yield the most appropriate data split(s).
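As an illustration of what a bin metric can look like, the sketch below computes the mean absolute error separately within each bin and then averages the per-bin values without weighting. This is only one possible bin metric, not necessarily the one used in the study; the function names, the bin edge, and the toy data are all hypothetical.

```python
from bisect import bisect_right
from collections import defaultdict


def assign_bin(value, edges):
    """Return the index of the bin that `value` falls in, given sorted interior edges."""
    return bisect_right(edges, value)


def per_bin_mae(y_true, y_pred, edges):
    """Mean absolute error computed separately for each non-empty bin of y_true."""
    errors = defaultdict(list)
    for t, p in zip(y_true, y_pred):
        errors[assign_bin(t, edges)].append(abs(t - p))
    return {b: sum(e) / len(e) for b, e in sorted(errors.items())}


def mean_bin_mae(y_true, y_pred, edges):
    """Unweighted mean of the per-bin errors, so rare bins count as much as common ones."""
    scores = per_bin_mae(y_true, y_pred, edges)
    return sum(scores.values()) / len(scores)


# Hypothetical right-skewed targets: four small values and one large value.
y_true = [0.1, 0.2, 0.3, 0.4, 5.0]
y_pred = [0.1, 0.2, 0.3, 0.4, 3.0]
edges = [1.0]  # assumed edge: bin 0 holds values below 1.0, bin 1 holds the rest

overall_mae = sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)
print("overall MAE:", overall_mae)
print("per-bin MAE:", per_bin_mae(y_true, y_pred, edges))
print("mean bin MAE:", mean_bin_mae(y_true, y_pred, edges))
```

In this toy example the overall error is 0.4 because the four well-predicted small values dominate, while the mean of the per-bin errors is 1.0 because the poorly predicted rare bin carries equal weight.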
Existing and novel model fitting methods were used to fit models, and the models were assessed by a bin metric in two case studies. The first case study used synthetically generated datasets with different levels of noise and imbalance. On datasets with noise and greater levels of imbalance, Epoch sampling significantly improved model performance by up to 23.6% while using significantly fewer resources (computation and time), by up to 57.7%, compared to a standard model fitting method. The second case study used protein-genome interaction data, which are often severely right-skewed. Akin split was used to split the data more appropriately than traditional methods. Model fitting methods were tried on two model configurations. The effects of the model fitting methods varied by model configuration, but all models were significantly improved, by up to 57.7%, compared to the standard model fitting.
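The abstract does not define Epoch sampling. One plausible reading, assumed here purely for illustration, is that each training epoch uses a freshly drawn subsample containing the same number of samples from every bin, which would also reduce the work done per epoch. The sketch below implements that assumed scheme by undersampling every bin to the size of the smallest one; the function names and the toy labels are hypothetical, and the authors' actual procedure may differ.

```python
import random
from collections import defaultdict


def group_by_bin(bin_labels):
    """Map each bin label to the list of sample indices that fall in it."""
    groups = defaultdict(list)
    for idx, b in enumerate(bin_labels):
        groups[b].append(idx)
    return dict(groups)


def epoch_sample(groups, rng):
    """Draw one epoch's training indices with an equal count from every bin.

    Assumption: each bin is undersampled without replacement to the size
    of the smallest bin, and a new draw is made at every epoch.
    """
    per_bin = min(len(idxs) for idxs in groups.values())
    chosen = []
    for idxs in groups.values():
        chosen.extend(rng.sample(idxs, per_bin))
    rng.shuffle(chosen)
    return chosen


# Hypothetical imbalanced labels: 90 samples in bin 0 and 10 samples in bin 1.
bin_labels = [0] * 90 + [1] * 10
groups = group_by_bin(bin_labels)
rng = random.Random(0)

for epoch in range(3):
    indices = epoch_sample(groups, rng)
    # A real training loop would fit the model on the samples at `indices` here.
    print("epoch", epoch, "uses", len(indices), "of", len(bin_labels), "samples")
```

Under this assumption every epoch sees a balanced subset of 20 samples rather than all 100, and because the draw is repeated each epoch, different members of the large bin are seen over the course of training.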

