Automated identification of cell-type–specific genes and alternative promoters
Abstract
Background
Identifying key transcriptional features, such as genes or transcripts, involved in cellular differentiation remains a challenging problem. Current methods for identifying key transcriptional features predominantly rely on pairwise comparisons among different cell types. These methods also identify long lists of differentially expressed transcriptional features. Combining the results from many such pairwise comparisons to find the transcriptional features specific only to one cell type is not straightforward. Thus, one must have a principled method for amalgamating pairwise cell type comparisons that makes full use of prior knowledge about the developmental relationships between cell types.
Method
We developed Cell Lineage Analysis (CLA), a computational method that identifies transcriptional features with expression patterns that discriminate cell types, incorporating Cell Ontology knowledge of the relationships between cell types. CLA uses random forest classification with a stratified bootstrap to increase the accuracy of binary classifiers when each cell type has a different number of samples. Regularized random forest yields a classifier that selects few but important transcriptional features. For each cell type pair, CLA runs multiple instances of regularized random forest and reports the transcriptional features consistently selected. CLA not only discriminates individual cell types but can also discriminate lineages of cell types related in the developmental hierarchy.
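The two ideas in this paragraph, a stratified bootstrap and reporting only consistently selected features, can be illustrated with a minimal Python sketch. It is not CLA's implementation. The function names, the choice to draw as many samples per class as the smallest class holds, and the `min_fraction` threshold are all assumptions made for illustration, and the gene names are placeholders.

```python
import random
from collections import Counter


def stratified_bootstrap(labels, rng):
    """Draw sample indices with replacement, the same number from each class.

    Assumption: each class contributes as many draws as the smallest class
    has samples, so neither cell type dominates the bootstrap sample.
    """
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)
    n_per_class = min(len(idxs) for idxs in by_class.values())
    sample = []
    for label in sorted(by_class):
        # random.choices samples with replacement
        sample.extend(rng.choices(by_class[label], k=n_per_class))
    return sample


def consistently_selected(runs, min_fraction=1.0):
    """Keep features chosen in at least min_fraction of the runs.

    `runs` is a list of feature collections, one per classifier run.
    The threshold is a hypothetical parameter, not taken from the source.
    """
    counts = Counter(f for selected in runs for f in set(selected))
    return sorted(f for f, c in counts.items() if c >= min_fraction * len(runs))


# Toy example: an imbalanced pair of cell types (20 versus 5 samples).
labels = ["monocyte"] * 20 + ["t_cell"] * 5
rng = random.Random(0)
sample = stratified_bootstrap(labels, rng)
class_counts = Counter(labels[i] for i in sample)

# Placeholder feature sets standing in for three classifier runs.
runs = [{"GENE_A", "GENE_B"}, {"GENE_A", "GENE_C"}, {"GENE_A", "GENE_B"}]
stable = consistently_selected(runs)
```

In this toy example the bootstrap sample holds five draws from each cell type even though the input is imbalanced four to one, and only the feature present in every run is kept.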
Results
We applied CLA to Functional Annotation of the Mammalian Genome 5 (FANTOM5) data and identified discriminative transcription factor and long non-coding RNA (lncRNA) genes for 71 human cell types. With capped analysis of gene expression (CAGE) data, CLA identified individual cell-type–specific alternative promoters for cell surface markers. Compared to random forest with a standard bootstrap approach, CLA’s stratified bootstrap approach improved the accuracy of gene expression classification models for more than 95% of 2060 cell type pairs examined. Applied to 10X Genomics single-cell RNA-seq data for CD14+ monocytes and FCGR3A+ monocytes, CLA selected only 13 discriminative genes. These genes included the top 9 out of 370 significantly differentially expressed genes obtained from conventional differential expression analysis methods.
Discussion
Our CLA method combines tools to simplify the interpretation of transcriptome datasets from many cell types. It automates the identification of the most differentially expressed genes for each cell type pair. CLA’s lineage score allows easy identification of the best transcriptional markers for each cell type and lineage in both bulk and single-cell transcriptomic data.
Availability
CLA is available at https://cla.hoffmanlab.org. We deposited the version of the CLA source with which we ran our experiments at https://doi.org/10.5281/zenodo.3630670. We deposited other analysis code and results at https://doi.org/10.5281/zenodo.5735636.

