iPromoter-Seqvec: identifying promoters using bidirectional long short-term memory and sequence-embedded features
Abstract
Background
Promoters, non-coding DNA sequences located upstream of the transcription start site of genes or gene clusters, are essential regulatory elements for the initiation and regulation of transcription. Identifying promoters in DNA sequences and genomes also contributes significantly to determining the complete structure of genes of interest. Exploring promoter regions is therefore one of the most important topics in molecular genetics and biology. Besides experimental techniques, computational methods have been developed to predict promoters. In this study, we propose iPromoter-Seqvec, an efficient computational model that predicts TATA and non-TATA promoters in human and mouse genomes using bidirectional long short-term memory (BiLSTM) neural networks combined with sequence-embedded features extracted from the input sequences. Promoter and non-promoter sequences were retrieved from the Eukaryotic Promoter Database and then refined to create four benchmark datasets.
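The abstract does not say how the sequence-embedded features are computed. One common way to embed DNA, assumed here only for illustration, is to split each sequence into overlapping k-mers ("words"), index each k-mer in a fixed vocabulary, and look up a dense vector per index. This gives a matrix with one row per sequence position that a BiLSTM can read forward and backward. The function names, k = 3, and the 4-dimensional random embedding table are hypothetical choices, not the authors' settings.

```python
import itertools
import random
from typing import Dict, List


def kmer_tokens(seq: str, k: int = 3) -> List[str]:
    """Split a DNA sequence into overlapping k-mers (stride 1)."""
    seq = seq.upper()
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def build_vocab(k: int = 3) -> Dict[str, int]:
    """Index every ACGT k-mer: 4**k entries, in lexicographic A<C<G<T order."""
    return {"".join(p): i for i, p in enumerate(itertools.product("ACGT", repeat=k))}


def embed(seq: str, k: int = 3, dim: int = 4, seed: int = 0) -> List[List[float]]:
    """Turn a sequence into a (len(seq) - k + 1) x dim matrix, one row per k-mer.

    Hypothetical sketch: the embedding table is random here; in a trained model
    it would be learned. K-mers containing ambiguous bases (e.g. N) share one
    extra 'unknown' row.
    """
    vocab = build_vocab(k)
    rng = random.Random(seed)
    unknown = len(vocab)  # index of the shared row for non-ACGT k-mers
    table = [[rng.uniform(-1.0, 1.0) for _ in range(dim)] for _ in range(unknown + 1)]
    return [table[vocab.get(t, unknown)] for t in kmer_tokens(seq, k)]
```

For example, `kmer_tokens("TATAAA")` returns `['TAT', 'ATA', 'TAA', 'AAA']`, and `embed("TATAAA")` returns four rows of four floats, i.e. the per-position input a bidirectional recurrent layer would consume.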
Results
The area under the receiver operating characteristic curve (AUCROC) and the area under the precision-recall curve (AUCPR) were used as the two key metrics for evaluating model performance. On independent test sets, iPromoter-Seqvec outperformed other state-of-the-art methods, with AUCROC values ranging from 0.85 to 0.99 and AUCPR values ranging from 0.86 to 0.99. In both species, models predicting TATA promoters had slightly higher predictive power than those predicting non-TATA promoters. Using a novel approach of constructing artificial non-promoter sequences from promoter sequences, our models learned highly specific characteristics that discriminate promoters from non-promoters, which improved predictive efficiency.
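Both metrics can be computed directly from predicted scores and binary labels. The sketch below assumes label 1 marks a promoter. AUCROC uses the rank-sum identity (the probability that a randomly chosen promoter outscores a randomly chosen non-promoter). AUCPR is estimated as average precision, which is one common estimator; the abstract does not say which interpolation the authors used. The function names are hypothetical, and the pairwise loop is quadratic, so this is for illustration rather than large test sets. In practice, scikit-learn's `roc_auc_score` and `average_precision_score` compute the same quantities.

```python
from typing import List


def auc_roc(labels: List[int], scores: List[float]) -> float:
    """AUCROC via the Mann-Whitney identity; tied pairs count as one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(pos) * len(neg))


def average_precision(labels: List[int], scores: List[float]) -> float:
    """AUCPR estimate: sum over thresholds of (recall increase) * precision."""
    total_pos = sum(labels)
    if total_pos == 0:
        raise ValueError("need at least one positive sample")
    order = sorted(zip(scores, labels), key=lambda t: -t[0])
    tp = fp = 0
    prev_recall, ap, i = 0.0, 0.0, 0
    while i < len(order):
        threshold = order[i][0]
        # Samples tied at the same score are consumed as a single threshold.
        while i < len(order) and order[i][0] == threshold:
            if order[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        recall = tp / total_pos
        precision = tp / (tp + fp)
        ap += (recall - prev_recall) * precision
        prev_recall = recall
    return ap
```

For example, with labels `[1, 0, 1, 0]` and scores `[0.9, 0.8, 0.7, 0.1]`, `auc_roc` returns 0.75 and `average_precision` returns about 0.833 (0.5 + 0.5 × 2/3).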
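The abstract states only that artificial non-promoter sequences were built from promoter sequences, not how. A minimal sketch, assuming a segment-substitution scheme: cut a promoter into equal segments, keep a random subset unchanged, and replace the rest with random nucleotides. The result retains some local promoter-like content while destroying the overall promoter structure, which forces a classifier to learn more than a few isolated motifs. The function name `make_negative` and the defaults of 20 segments with 8 kept are hypothetical, not parameters taken from the paper.

```python
import random
from typing import Optional


def make_negative(promoter: str, n_segments: int = 20, keep: int = 8,
                  seed: Optional[int] = None) -> str:
    """Build an artificial non-promoter from a promoter (hypothetical scheme).

    The sequence is cut into n_segments contiguous pieces of near-equal length;
    `keep` randomly chosen pieces are copied unchanged and every other piece is
    replaced by uniformly random nucleotides of the same length.
    """
    if not (0 <= keep <= n_segments and 1 <= n_segments <= len(promoter)):
        raise ValueError("need 0 <= keep <= n_segments and 1 <= n_segments <= len(promoter)")
    rng = random.Random(seed)
    # Near-equal split: the remainder is spread over the first segments.
    size, extra = divmod(len(promoter), n_segments)
    bounds, start = [], 0
    for i in range(n_segments):
        end = start + size + (1 if i < extra else 0)
        bounds.append((start, end))
        start = end
    kept = set(rng.sample(range(n_segments), keep))
    pieces = []
    for i, (a, b) in enumerate(bounds):
        if i in kept:
            pieces.append(promoter[a:b])  # conserved segment
        else:
            pieces.append("".join(rng.choice("ACGT") for _ in range(b - a)))
    return "".join(pieces)
```

Passing the same seed makes the output reproducible, and setting `keep=n_segments` returns the promoter unchanged, which is a convenient sanity check.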
Conclusions
iPromoter-Seqvec is a stable and robust model for predicting both TATA and non-TATA promoters in human and mouse genomes. The method has also been deployed as an online web server with a user-friendly interface to support the research community. Links to our source code and web server are available at https://github.com/mldlproject/2022-iPromoter-Seqvec.
Springer Science and Business Media LLC