CDS-BART: A BART-Based Foundation Model for mRNA Sequence Analysis
Abstract
Summary: Recent advances in artificial intelligence (AI) have led to foundation models that interpret mRNA as a language. Notable examples include CodonBERT, hydraRNA, EVO2, and Helix-mRNA. These models show considerable promise as tools for mRNA research. However, to the best of our knowledge, no publicly available AI model is both easy to use and capable of analyzing mRNA sequences up to about 4 kb long, a length typical of many therapeutic mRNAs, including those encapsulated in lipid nanoparticles (LNPs). We therefore propose CDS-BART, a user-friendly, open-source tool that combines SentencePiece sub-word tokenization with the denoising sequence-to-sequence training of Bidirectional and Auto-Regressive Transformers (BART). CDS-BART was pre-trained on mRNA data from nine taxonomic groups in the NCBI RefSeq database. This broad pre-training, combined with BART's denoising objective, enables the model to learn codon usage, mRNA structure, evolutionary patterns, and regulatory features. As a result, CDS-BART can deliver robust performance across a wide range of mRNA prediction tasks.
Availability and Implementation
CDS-BART is released under the MIT License. The latest code is available on GitHub at https://github.com/mogam-ai/CDS-BART.

