
CDS-BART: A BART-Based Foundation Model for mRNA Sequence Analysis

Abstract

Summary: Recent advancements in artificial intelligence (AI) have led to the development of foundation models that interpret mRNA as a language. Notable examples include CodonBERT, hydraRNA, EVO2, and Helix-mRNA. These models demonstrate significant potential as powerful tools for mRNA research. However, to the best of our knowledge, there is currently no publicly available AI model that is both easy to use and capable of analyzing mRNA sequences up to about 4 kb, a length scale typical of many therapeutic mRNAs, including those encapsulated within lipid nanoparticles (LNPs). We therefore propose CDS-BART, a user-friendly, open-source tool that integrates SentencePiece sub-word tokenization with the denoising sequence-to-sequence training of Bidirectional and Auto-Regressive Transformers (BART). CDS-BART was pre-trained on mRNA data from nine taxonomic groups provided by the NCBI RefSeq database. This comprehensive pre-training, coupled with BART’s denoising capability, ensures effective learning of codon usage, mRNA structure, evolution, and regulation. As a result, CDS-BART can deliver robust performance across a wide range of mRNA prediction tasks.

Availability and Implementation: CDS-BART is released under the MIT License. The latest code is available on GitHub at https://github.com/mogam-ai/CDS-BART.
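As a rough illustration of what denoising pre-training means for a coding sequence, the sketch below corrupts a token list by replacing random spans with a single mask token, in the spirit of BART's text infilling. It is not taken from the CDS-BART code base. The helper names `split_codons` and `infill_noise`, the fixed codon split (a stand-in for the SentencePiece sub-word tokenizer), the uniform span lengths (BART samples span lengths from a Poisson distribution), and all parameter values are assumptions made for illustration only.

```python
import random


def split_codons(cds):
    """Split a coding sequence into 3-nucleotide codons.

    Hypothetical stand-in for a learned sub-word tokenizer.
    """
    if len(cds) % 3 != 0:
        raise ValueError("CDS length must be a multiple of 3")
    return [cds[i:i + 3] for i in range(0, len(cds), 3)]


def infill_noise(tokens, mask_ratio=0.3, max_span=3, mask_token="<mask>", seed=0):
    """Replace random token spans with one mask token each.

    Simplified text infilling: span lengths are uniform in [1, max_span],
    and at most int(len(tokens) * mask_ratio) tokens are hidden in total.
    """
    rng = random.Random(seed)
    budget = int(len(tokens) * mask_ratio)  # how many tokens may still be hidden
    noisy = []
    i = 0
    while i < len(tokens):
        if budget > 0 and rng.random() < mask_ratio:
            span = min(rng.randint(1, max_span), budget, len(tokens) - i)
            noisy.append(mask_token)  # the whole span collapses to one mask
            i += span
            budget -= span
        else:
            noisy.append(tokens[i])
            i += 1
    return noisy


# Toy sequence, for illustration only.
tokens = split_codons("AUGGCCAAGUUUGGAUAA")
noisy = infill_noise(tokens, mask_ratio=0.3, seed=1)
# A denoising model would be trained to reconstruct `tokens` from `noisy`.
print(tokens)
print(noisy)
```

Because each masked span collapses to a single mask token, the model has to infer both the content and the length of the missing region, which is the property that distinguishes infilling from per-token masking.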
