Computational approaches for the analysis of epigenome and transcriptome characterisation in Paramecium tetraurelia
In the last two decades, our understanding of human gene regulation has improved tremendously. There are plentiful computational methods for integrative data analysis in humans and in model organisms such as mouse and Drosophila. However, these tools are not directly usable by researchers working on non-model organisms to answer fundamental biological and evolutionary questions. We aimed to develop new tools and adapt existing software for the analysis of transcriptomic and epigenomic data of one such non-model organism, Paramecium tetraurelia, a unicellular eukaryote. Paramecium contains two diploid (2n) germline micronuclei (MIC) and a polyploid (800n) somatic macronucleus (MAC). The transcriptomic and epigenomic regulatory landscape of the MAC genome, which has 80% protein-coding genes and short intergenic regions, is poorly understood. We developed a generic automated eukaryotic short interfering RNA (siRNA) analysis tool, called RAPID. Our tool captures diverse siRNA characteristics from small RNA sequencing data and provides easily navigable visualisations. We also introduced a normalisation technique to facilitate comparison of multiple siRNA-based gene knockdown studies. Further, we developed a pipeline to characterise novel genome-wide endogenous short interfering RNAs (endo-siRNAs). In contrast to many organisms, we found that the endo-siRNAs do not act in cis to silence their parent mRNA. We also predicted phasing of siRNAs, which are regulated by the RNA interference (RNAi) pathway. Further, using RAPID, we investigated the aberrations of endo-siRNAs and the associated transcriptomic alterations caused by an RNAi pathway triggered by feeding small RNAs against a target gene. We found that the small RNA transcriptome is altered even if a gene unrelated to the RNAi pathway is targeted. This is important in the context of investigations of genetically modified organisms (GMOs). We suggest that future studies need to distinguish between transcriptomic changes caused by RNAi-inducing techniques and actual regulatory changes.
Subsequently, we adapted existing epigenomics analysis tools to conduct the first comprehensive epigenomic characterisation of nucleosome positioning and histone modifications of the Paramecium MAC. We identified well-positioned nucleosomes shifted downstream of the transcription start site. GC content seems to dictate, in cis, the positioning of nucleosomes, histone marks (H3K4me3, H3K9ac, and H3K27me3), and Pol II in the AT-rich Paramecium genome. We employed a chromatin state segmentation approach on nucleosomes and histone marks, which revealed genes with active, repressive, and bivalent chromatin states. Further, we constructed a regulatory association network of all the aforementioned data using the sparse partial correlation network technique. Our analysis revealed subsets of genes whose expression is positively associated with H3K27me3, in contrast to the negative association with gene expression reported in many other organisms. Further, we developed a Random Forest classifier to predict gene expression using genic (gene length, intron frequency, etc.) and epigenetic features. Our model has a test performance (PR-AUC) of 0.83. Upon evaluating different feature sets, we found that genic features are as predictive of gene expression as the epigenetic features. We used Shapley local feature explanation values to suggest that high H3K4me3, high intron frequency, low gene length, high sRNA, and high GC content are the most important elements for determining gene expression status. In this thesis, we developed novel tools and employed several bioinformatics and machine learning methods to characterise the regulatory landscape of the Paramecium (epi)genome.
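The abstract reports classifier performance as PR-AUC, the area under the precision-recall curve. The sketch below shows one common way to compute that quantity (the average-precision sum) in plain Python. It is only an illustration: the abstract does not state how the thesis computed PR-AUC, and the function name, the toy labels, and the toy scores are all hypothetical.

```python
def average_precision(labels, scores):
    """Estimate PR-AUC as average precision.

    labels: 0/1 ground truth (here, hypothetically, 1 = gene is expressed).
    scores: classifier scores, higher meaning more likely to be class 1.
    Assumes scores are not tied; ties would need grouping by threshold.
    """
    # Rank examples from the highest score to the lowest.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    total_positives = sum(label for _, label in ranked)
    if total_positives == 0:
        raise ValueError("need at least one positive label")

    true_positives = 0
    precision_sum = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            true_positives += 1
            # Precision at the rank where recall increases.
            precision_sum += true_positives / rank
    return precision_sum / total_positives


if __name__ == "__main__":
    # Toy data, not taken from the thesis.
    toy_labels = [1, 0, 1, 1, 0, 0]
    toy_scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
    print(round(average_precision(toy_labels, toy_scores), 3))
```

Unlike ROC-AUC, the baseline of this metric equals the fraction of positive examples, so a value such as 0.83 is best read against the class balance of the test set.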