
Sequential Labelling and DNABERT For Splice Site Prediction in Homo Sapiens DNA

Abstract

Background: Gene prediction on DNA has been carried out with various deep learning architectures that detect splice sites in order to locate intron and exon regions. However, recent models are trained on sequences in which the splice site sits in the middle. This setup rules out the possibility of multiple splice sites in a single sequence.

Results: This research proposes a sequential labelling model that predicts splice sites regardless of their position in a sequence. The model, named DNABERT-SL, is built on pre-trained DNABERT-3. DNABERT-SL is benchmarked against the latest sequential labelling models for mutation type and location prediction, which are based on BiLSTM and BiGRU. Although all three models achieve F1 scores above 0.8 on validation data, BiLSTM, BiGRU, and DNABERT-SL perform poorly on test data, with F1 scores of 0.498 ± 0.184, 0.6 ± 0.123, and 0.532 ± 0.245, respectively.

Conclusions: The DNABERT-SL model cannot distinguish nucleotides acting as splice sites from normal ones. Principal component analysis of the contextual token representations produced by DNABERT-SL shows that the representations are not optimal for separating splice site tokens from non-splice site tokens. Inspection of splice site motifs in the test and training sequences shows that a given sequence containing the GT-AG motif can act as a splice site in some sequences and as normal nucleotides in others.
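The motif ambiguity described in the conclusions can be illustrated with a small Python sketch. It is not the authors' code: the toy sequence, the label alphabet ("D", "A", "-"), and the helper names are all hypothetical, and the only detail taken from DNABERT is the overlapping k-mer style of tokenization (k = 3 for DNABERT-3). The sketch labels every nucleotide, as a sequential labelling model would, and then counts how many GT and AG dinucleotides occur that are not splice sites.

```python
def kmer_tokens(seq, k=3):
    """Overlapping k-mers with stride 1 (DNABERT-style tokenization)."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def motif_positions(seq, motif):
    """Start index of every occurrence of a short motif such as GT or AG."""
    n = len(motif)
    return [i for i in range(len(seq) - n + 1) if seq[i:i + n] == motif]


def label_nucleotides(marked):
    """Per-nucleotide labels for a sequence with ONE lowercase intron.

    Hypothetical label scheme: 'D' = donor dinucleotide, 'A' = acceptor
    dinucleotide, '-' = any other nucleotide.
    """
    intron = [i for i, c in enumerate(marked) if c.islower()]
    labels = ["-"] * len(marked)
    if intron:
        start, end = intron[0], intron[-1]
        labels[start] = labels[start + 1] = "D"
        labels[end - 1] = labels[end] = "A"
    return "".join(labels)


# Made-up toy sequence: uppercase = exon, lowercase = intron (gt...ag).
marked = "ATGGCCgtaagtcctagTTGAGTCC"
seq = marked.upper()

labels = label_nucleotides(marked)
tokens = kmer_tokens(seq)          # what a 3-mer model would see
gt_hits = motif_positions(seq, "GT")
ag_hits = motif_positions(seq, "AG")

print(seq)
print(labels)
print("GT at", gt_hits, "true donor at", labels.index("D"))
print("AG at", ag_hits, "true acceptor at", labels.index("A"))
```

In this toy case only one of the four GT occurrences and one of the three AG occurrences is a real splice site, so the dinucleotide alone cannot decide the label; the model has to rely on surrounding context, which is where the abstract reports the learned representations fall short.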
Research Square Platform LLC
