Human-scATAC-Corpus: a comprehensive database of scATAC-seq data
ABSTRACT
Single-cell Assay for Transposase-Accessible Chromatin using sequencing (scATAC-seq) profiles chromatin accessibility at cellular resolution, making it possible to reveal epigenomic landscapes that govern gene regulation in a variety of cells. Nevertheless, heterogeneous feature spaces and complex processing pipelines have impeded the construction of an ensemble resource capable of supporting diverse downstream analytical scenarios. To address this gap, we present Human-scATAC-Corpus (https://health.tsinghua.edu.cn/human-scatac-corpus/), a comprehensive database of human scATAC-seq comprising 5,407,621 cells from 35 datasets across 37 tissues or cell lines. To support complementary use cases, each dataset is distributed in three aligned formats: cell-by-candidate cis-regulatory element matrices for cross-dataset integration, raw fragment files for flexible processing, and cell-by-peak matrices for within-dataset analyses. This resource spans diverse biological contexts and includes rich metadata, enabling method benchmarking and development, as well as pretraining of foundation models. The website offers searchable browsing, detailed dataset pages, on-demand downloads, and tutorials. EpiAgent, a foundation model pretrained on Human-scATAC-Corpus, is further integrated to provide online analyses, including reference mapping, embedding extraction, and cell type annotation. Human-scATAC-Corpus establishes a unified and scalable substrate for single-cell epigenomics and is intended to accelerate discovery while standardizing evaluation across tasks.
Cold Spring Harbor Laboratory
Related Results
Benchmarking Algorithms for Gene Set Scoring of Single-cell ATAC-seq Data
Abstract: Gene set scoring (GSS) has been routinely conducted for gene expression analysis of bulk or single-cell RNA-seq data, which helps to decipher single-cell heterogeneity and ...
Semi-automated IT-scATAC-seq profiles cell-specific chromatin accessibility in differentiation and peripheral blood populations
Abstract
Single-cell ATAC-seq (scATAC-seq) enables high-resolution mapping of chromatin accessibility but is often limited by throughput, cost, and equipment requirements...
MARS-seq2.0: an experimental and analytical pipeline for indexed sorting combined with single-cell RNA sequencing v1
Human tissues comprise trillions of cells that populate a complex space of molecular phenotypes and functions and that vary in abundance by 4–9 orders of magnitude. Relying solely ...
Žanrovska analiza pomorskopravnih tekstova i ostvarenje prijevodnih univerzalija u njihovim prijevodima s engleskoga jezika [Genre analysis of maritime law texts and the realization of translation universals in their translations from English]
Genre implies formal and stylistic conventions of a particular text type, which inevitably affects the translation process. This „force of genre bias“ (Prieto Ramos, 2014) has been...
Generating Synthetic Single Cell Data from Bulk RNA-seq Using a Pretrained Variational Autoencoder
Abstract: Single cell RNA sequencing (scRNA-seq) is a powerful approach which generates genome-wide gene expression profiles at single cell resolution. Among its many applications, i...
Abstract P1-05-23: Utilities and challenges of RNA-Seq based expression and variant calling in a clinical setting
Abstract
Introduction
Variant calling based on DNA samples has been the gold standard of clinical testing since the advent of Sanger sequencing. The u...
IT-scATAC-seq v1
Single-cell ATAC-seq (scATAC-seq) allows for detailed mapping of chromatin accessibility but often faces challenges related to throughput, cost, and equipment demands. In this stud...
MuSiC2: cell type deconvolution for multi-condition bulk RNA-seq data
Abstract: Cell type composition of intact bulk tissues can vary across samples. Deciphering cell type composition and its changes during disease progression is an important step towa...

