
Automated identification of cell-type–specific genes and alternative promoters

Abstract

Background: Identifying key transcriptional features, such as genes or transcripts, involved in cellular differentiation remains a challenging problem. Current methods for identifying key transcriptional features predominantly rely on pairwise comparisons among different cell types. These methods also produce long lists of differentially expressed transcriptional features. Combining the results from many such pairwise comparisons to find the transcriptional features specific to only one cell type is not straightforward. Thus, one needs a principled method for amalgamating pairwise cell type comparisons that makes full use of prior knowledge about the developmental relationships between cell types.

Method: We developed Cell Lineage Analysis (CLA), a computational method that identifies transcriptional features with expression patterns that discriminate cell types, incorporating Cell Ontology knowledge on the relationships between different cell types. CLA uses random forest classification with a stratified bootstrap to increase the accuracy of binary classifiers when cell types have different numbers of samples. Regularized random forest results in a classifier that selects few but important transcriptional features. For each cell type pair, CLA runs multiple instances of regularized random forest and reports the transcriptional features consistently selected. CLA not only discriminates individual cell types but can also discriminate lineages of cell types related in the developmental hierarchy.

Results: We applied CLA to Functional Annotation of the Mammalian Genome 5 (FANTOM5) data and identified discriminative transcription factor and long non-coding RNA (lncRNA) genes for 71 human cell types. With capped analysis of gene expression (CAGE) data, CLA identified individual cell-type–specific alternative promoters for cell surface markers. Compared to random forest with a standard bootstrap approach, CLA’s stratified bootstrap approach improved the accuracy of gene expression classification models for more than 95% of 2060 cell type pairs examined. Applied to 10X Genomics single-cell RNA-seq data for CD14+ monocytes and FCGR3A+ monocytes, CLA selected only 13 discriminative genes. These genes included the top 9 out of 370 significantly differentially expressed genes obtained from conventional differential expression analysis methods.

Discussion: Our CLA method combines tools to simplify the interpretation of transcriptome datasets from many cell types. It automates the identification of the most differentially expressed genes for each cell type pair. CLA’s lineage score allows easy identification of the best transcriptional markers for each cell type and lineage in both bulk and single-cell transcriptomic data.

Availability: CLA is available at https://cla.hoffmanlab.org. We deposited the version of the CLA source with which we ran our experiments at https://doi.org/10.5281/zenodo.3630670. We deposited other analysis code and results at https://doi.org/10.5281/zenodo.5735636.
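
The two ideas in the Method paragraph, a stratified bootstrap for unbalanced cell type pairs and reporting only features selected consistently across repeated runs, can be illustrated with a small standard-library Python sketch. This is not the CLA implementation. The function names, the choice to draw as many samples per class as the smallest class contains, and the `min_fraction` threshold are all assumptions made for illustration; the abstract does not specify these details.

```python
import random
from collections import Counter, defaultdict


def stratified_bootstrap(labels, per_class=None, rng=None):
    """Draw sample indices with replacement, the same number from each class.

    Assumption (not from the abstract): by default each class contributes
    as many draws as the smallest class has samples.
    """
    rng = rng or random.Random()
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    n = per_class or min(len(members) for members in by_class.values())
    sample = []
    for label in sorted(by_class):
        # random.choices samples with replacement, which is what makes
        # this a bootstrap rather than a plain subsample.
        sample.extend(rng.choices(by_class[label], k=n))
    return sample


def consistently_selected(runs, min_fraction=1.0):
    """Return features chosen in at least `min_fraction` of the runs.

    `runs` is a list of feature lists, one per classifier instance.
    The threshold is a hypothetical parameter for this sketch.
    """
    counts = Counter(feature for selected in runs for feature in set(selected))
    threshold = min_fraction * len(runs)
    return sorted(f for f, c in counts.items() if c >= threshold)


# Unbalanced toy pair: 12 samples of one cell type, 3 of the other.
labels = ["monocyte"] * 12 + ["t_cell"] * 3
indices = stratified_bootstrap(labels, rng=random.Random(0))
print(Counter(labels[i] for i in indices))  # 3 draws from each class

# Hypothetical feature sets from three classifier runs.
runs = [["GENE_A", "GENE_B"], ["GENE_A", "GENE_C"], ["GENE_A", "GENE_B"]]
print(consistently_selected(runs))  # ['GENE_A']
```

With a standard bootstrap, the 12-sample class would dominate most resamples; balancing the draws gives each tree a comparable view of both cell types, which is one plausible reason the stratified variant helps when sample counts differ.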

Related Results

Complex Collision Tumors: A Systematic Review
Abstract Introduction: A collision tumor consists of two distinct neoplastic components located within the same organ, separated by stromal tissue, without histological intermixing...
Frequency of Common Chromosomal Abnormalities in Patients with Idiopathic Acquired Aplastic Anemia
Objective: To determine the frequency of common chromosomal aberrations in local population...
Cell Type-Specific Promoters of Volvox carteri for Molecular Cell Biology Studies
The multicellular green alga Volvox carteri has emerged as a valuable model organism for investigating various aspects of multicellularity and cellular differentiation, photorecept...
Plasma Cell Enumeration By Manual and Automated Methods to Establish a Standard Pictorial Reference
Background The diagnosis of plasma cell dyscrasias requires accurate, reliable enumeration of bone marrow plasma cell burden. This is typically assessed by manual...
iPromoter-Seqvec: identifying promoters using bidirectional long short-term memory and sequence-embedded features
Abstract Background Promoters, non-coding DNA sequences located at upstream regions of the transcription start site of genes/gene clusters, are esse...
Modulating Fis and IHF binding specificity, crosstalk and regulatory logic through the engineering of complex promoters
Abstract Bacterial promoters are usually formed by multiple cis-regulatory elements recognized by a plethora of transcriptional factors (TFs). From those, global regulators are key e...
Cytokine‐inducible promoters to drive dynamic transgene expression: The “Smart Graft” strategy
Abstract Background Ubiquitous expression of T‐cell regulatory transgenes such as the cytotoxic T lymphocyte‐associated antigen 4 (CTLA4) or the high‐affinity variant LEA29Y improves...
Promoter architecture links gene duplication with transcriptional divergence
Summary Gene duplication is thought to be a central mechanism in evolution to gain new functions, but gene families vary greatly in their rates of gene duplication ...
