Features of ChIP-seq data peak calling algorithms with good operating characteristics

Author description

Reuben Thomas is a Staff Research Scientist in the Bioinformatics Core at Gladstone Institutes.

Sean Thomas is a Staff Research Scientist in the Bioinformatics Core at Gladstone Institutes.

Alisha K Holloway is the Director of Bioinformatics at Phylos Biosciences, a visiting scientist at Gladstone Institutes and an Adjunct Assistant Professor in Biostatistics at the University of California, San Francisco.

Katherine S Pollard is a Senior Investigator at Gladstone Institutes and Professor of Biostatistics at the University of California, San Francisco.

Key Points

Peak calling using ChIP-seq data consists of two sub-problems: identifying candidate peaks and testing candidate peaks for statistical significance.

Twelve features of the two sub-problems of peak-calling methods are identified.

Methods that explicitly combine the signals from ChIP and input samples are less powerful than methods that do not.

Methods that use windows of different sizes to scan the genome for potential peaks are more powerful than ones that do not.

Methods that use a Poisson test to rank their candidate peaks are more powerful than those that use a Binomial test.

Abstract

Chromatin immunoprecipitation followed by sequencing (ChIP-seq) is an important tool for studying gene regulatory proteins, such as transcription factors and histones. Peak calling is one of the first steps in the analysis of these data. Peak calling consists of two sub-problems: identifying candidate peaks and testing candidate peaks for statistical significance. We surveyed 30 methods and identified 12 features of the two sub-problems that distinguish methods from each other. We picked six methods (GEM, MACS2, MUSIC, BCP, TM and ZINBA) that span this feature space and used a combination of 300 simulated ChIP-seq data sets, 3 real data sets and mathematical analyses to identify features of methods that allow some to perform better than others.

We prove that methods that explicitly combine the signals from ChIP and input samples are less powerful than methods that do not. Methods that use windows of different sizes are more powerful than ones that do not. For statistical testing of candidate peaks, methods that use a Poisson test to rank their candidate peaks are more powerful than those that use a Binomial test. BCP and MACS2 have the best operating characteristics on simulated transcription factor binding data. GEM has the highest fraction of the top 500 peaks containing the binding motif of the immunoprecipitated factor, with 50% of its peaks within 10 base pairs (bp) of a motif. BCP and MUSIC perform best on histone data. These findings provide guidance and rationale for selecting the best peak caller for a given application.
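To make the second sub-problem concrete, the following is a minimal Python sketch of how candidate peaks might be ranked with a Poisson test versus a Binomial test. It is an illustration under stated assumptions, not the implementation used by any of the surveyed peak callers: the function names, the toy `candidates` list, the use of the input read count (floored at 1) as the Poisson background rate, and the null probability of 0.5 (equal sequencing depth in ChIP and input) are all hypothetical choices made for this example.

```python
import math


def poisson_upper_tail(k: int, lam: float) -> float:
    """Return P(X >= k) for X ~ Poisson(lam)."""
    if k <= 0:
        return 1.0
    term = math.exp(-lam)  # P(X = 0)
    cdf = term
    for i in range(1, k):
        term *= lam / i  # P(X = i) computed from P(X = i - 1)
        cdf += term
    # Subtracting from 1 loses precision for very small tails; acceptable in a sketch.
    return max(0.0, 1.0 - cdf)


def binomial_upper_tail(k: int, n: int, p: float) -> float:
    """Return P(X >= k) for X ~ Binomial(n, p)."""
    return sum(
        math.comb(n, i) * p**i * (1 - p) ** (n - i)
        for i in range(max(k, 0), n + 1)
    )


def poisson_score(chip: int, control: int) -> float:
    # Assumption: the input count (floored at 1) stands in for the background rate.
    return poisson_upper_tail(chip, max(control, 1))


def binomial_score(chip: int, control: int) -> float:
    # Assumption: equal sequencing depth, so under the null a read in this
    # region comes from the ChIP sample with probability 0.5.
    return binomial_upper_tail(chip, chip + control, 0.5)


def rank_candidates(candidates, score):
    """Sort (name, chip_count, input_count) tuples from most to least significant."""
    return sorted(candidates, key=lambda c: score(c[1], c[2]))


# Hypothetical candidate peaks: (name, ChIP reads, input reads).
candidates = [("peak_a", 30, 10), ("peak_b", 12, 10), ("peak_c", 60, 40)]
print([name for name, _, _ in rank_candidates(candidates, poisson_score)])
print([name for name, _, _ in rank_candidates(candidates, binomial_score)])
```

The two scores condition on different quantities: the Poisson test treats the background rate as fixed, while the Binomial test conditions on the total number of reads in the region. The two tests can therefore order the same candidates differently, which is the kind of difference the comparison in the abstract concerns.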

openRxiv

Reuben Thomas Sean Thomas Alisha K Holloway Katherine S Pollard

2016
