Umap and Bismap: quantifying genome and methylome mappability
Abstract
Motivation
Short-read sequencing enables assessment of genetic and biochemical traits of individual genomic regions, such as the location of genetic variation, protein binding, and chemical modifications. Every region in a genome assembly has a property called mappability, which measures the extent to which it can be uniquely mapped by sequence reads. In regions of lower mappability, estimates of genomic and epigenomic characteristics from sequencing assays are less reliable. At best, sequencing assays will produce misleadingly low numbers of reads in these regions. At worst, these regions have increased susceptibility to spurious mapping of reads from other regions of the genome with sequencing errors or unexpected genetic variation. Bisulfite sequencing approaches used to identify DNA methylation exacerbate these problems by introducing large numbers of reads that map to multiple regions. While many tools consider mappability during the read mapping process, subsequent analysis often loses this information. Both to correct assumptions of uniformity in downstream analysis, and to identify regions where the analysis is less reliable, it is necessary to know the mappability of both ordinary and bisulfite-converted genomes.
Results
We introduce the Umap software for identifying uniquely mappable regions of any genome. Its Bismap extension identifies mappability of the bisulfite-converted genome. With a read length of 24 bp, 18.7% of the unmodified genome and 33.5% of the bisulfite-converted genome is not uniquely mappable. This complicates interpretation of functional genomics experiments using short-read sequencing, especially in regulatory regions. For example, 81% of human CpG islands overlap with regions that are not uniquely mappable. Similarly, in some ENCODE ChIP-seq datasets, up to 50% of peaks overlap with regions that are not uniquely mappable. We also explored differentially methylated regions from a case-control study and identified regions that were not uniquely mappable. In the widely used 450K methylation array, 4,230 probes are not uniquely mappable. Genome mappability is higher with longer sequencing reads, but most publicly available ChIP-seq and reduced representation bisulfite sequencing datasets have shorter reads. Therefore, uneven and low mappability remains a concern in a majority of existing data.
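The idea of unique mappability, and the way bisulfite conversion reduces it, can be illustrated with a toy sketch. This is not the Umap or Bismap implementation: the function names (`unique_kmer_starts`, `bisulfite_convert`, `fraction_not_unique`) and the example sequence are hypothetical, and the sketch assumes exact matching on the forward strand only, with bisulfite conversion modeled as replacing every C with T.

```python
from collections import Counter


def unique_kmer_starts(genome, k):
    """For each start position, report whether the k-mer beginning there
    occurs exactly once in the sequence (forward strand only, exact match)."""
    kmers = [genome[i:i + k] for i in range(len(genome) - k + 1)]
    counts = Counter(kmers)
    return [counts[kmer] == 1 for kmer in kmers]


def bisulfite_convert(genome):
    """Toy model of full bisulfite conversion on the forward strand:
    every C is read as T (assumes no methylated cytosines)."""
    return genome.replace("C", "T")


def fraction_not_unique(genome, k):
    """Fraction of k-mer start positions whose k-mer is not unique."""
    flags = unique_kmer_starts(genome, k)
    return 1 - sum(flags) / len(flags)


if __name__ == "__main__":
    toy = "ACGTATGTAA"  # hypothetical 10 bp sequence, for illustration only
    k = 4
    print(fraction_not_unique(toy, k))
    print(fraction_not_unique(bisulfite_convert(toy), k))
```

In this toy sequence every 4-mer is unique before conversion, but after C-to-T conversion some 4-mers collide, which mirrors the lower mappability reported for the bisulfite-converted genome. A real analysis would also need to consider the reverse strand and genome-scale indexing.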
Availability
A Umap and Bismap track hub for human genome assemblies GRCh37/hg19 and GRCh38/hg38, and mouse assemblies GRCm37/mm9 and GRCm38/mm10 is available at http://bismap.hoffmanlab.org for use with the UCSC and Ensembl genome browsers. We have deposited in Zenodo the current version of our software (https://doi.org/10.5281/zenodo.800648) and the mappability data used in this project (https://doi.org/10.5281/zenodo.800645). In addition, the software (https://bitbucket.org/hoffmanlab/umap) is freely available under the GNU General Public License, version 3 (GPLv3).
Contact
michael.hoffman@utoronto.ca
Related Results
GenMap: Fast and Exact Computation of Genome Mappability
Abstract
We present a fast and exact algorithm to compute the (k, e)-mappability. Its inverse, the (...
Assessment of the methylome and the cognition in urban dwellers
Introduction
The epigenome involving chemical modifications of DNA and chromatin that modulates gene expression in response to external and environmental conditions is characterized...
Correcting Methylation Calls in Clinically Relevant Low-Mappability Regions
Abstract
DNA methylation is an important component in vital biological functions such as embryonic development, carcinogenesis, and heritable regulation. Accurate methods to assess ...
Molding the Rice Methylome for Disease Resistance
Abstract
In Arabidopsis thaliana, epigenetic changes in the DNA methylome can prime transcriptional response...
Analysis of DNA methylation by high-throughput sequencing in the chicken
Anticipating the impact of climatic or dietary environmental fluctuations is a crucial challenge in animal production systems, and more particularly...
Methylome response following cold exposure in a species with a complex genome: maize (Zea mays ssp. mays)
Molecular characterization of plant responses to environmental constraints provides a better understanding of the basis of plant adaptation to their environment, and may...
Whole Genome Resequencing and 1000 Genomes Project
Abstract
The recent advances in sequencing technologies have enabled the whole human genome to be sequenced within weeks. To date, several human...
SUN-115 Distinct DNA Methylation Signature in Neuroendocrine Tumors of Different Primary Sites and Hereditary Predisposition
Abstract
Objective
There is scant data of the genome-wide methylome alterations in neuroendocrine tumors (NET). Thus, the goal of this study was to co...

