
Transformer Model Generated Bacteriophage Genomes are Compositionally Distinct from Natural Sequences

Novel applications of language models in genomics promise to have a large impact on the field. The megaDNA model is the first publicly available generative model for creating synthetic viral genomes. To evaluate megaDNA’s ability to recapitulate the nonrandom genome composition of viruses and to assess whether synthetic genomes can be algorithmically detected, compositional metrics were compared for 4,969 natural bacteriophage genomes and 1,002 de novo synthetic bacteriophage genomes. Transformer-generated sequences had varied but realistic genome lengths, and 58% were classified as viral by geNomad. However, the sequences showed consistent differences from natural bacteriophage genomes in various compositional metrics, as assessed by rank-sum tests and principal component analysis. A simple neural network trained to detect transformer-generated sequences on global compositional metrics alone displayed a median sensitivity of 93.0% and specificity of 97.9% (n = 12 independent models). Overall, these results demonstrate that megaDNA does not yet generate bacteriophage genomes with realistic compositional biases and that genome composition is a reliable basis for detecting sequences generated by this model. While the results are specific to the megaDNA model, the evaluation framework described here could be applied to any generative model for genomic sequences.
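The abstract does not list which compositional metrics were used, so the following is only a minimal sketch of the general idea, assuming three common global metrics (GC content, GC skew, and a dinucleotide odds ratio) and the standard definitions of sensitivity and specificity. All function names are hypothetical and are not taken from the paper or from megaDNA.

```python
from collections import Counter


def gc_content(seq: str) -> float:
    """Fraction of unambiguous bases (A, C, G, T) that are G or C."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    return (seq.count("G") + seq.count("C")) / acgt if acgt else 0.0


def gc_skew(seq: str) -> float:
    """(G - C) / (G + C), a strand-asymmetry measure."""
    seq = seq.upper()
    g, c = seq.count("G"), seq.count("C")
    return (g - c) / (g + c) if (g + c) else 0.0


def dinucleotide_odds(seq: str, dinuc: str) -> float:
    """Observed/expected ratio f_xy / (f_x * f_y) for one dinucleotide."""
    seq, dinuc = seq.upper(), dinuc.upper()
    n = len(seq)
    if n < 2:
        return 0.0
    # Count overlapping dinucleotides along the sequence.
    pairs = Counter(seq[i:i + 2] for i in range(n - 1))
    f_xy = pairs[dinuc] / (n - 1)
    f_x = seq.count(dinuc[0]) / n
    f_y = seq.count(dinuc[1]) / n
    return f_xy / (f_x * f_y) if (f_x and f_y) else 0.0


def sensitivity_specificity(y_true, y_pred):
    """Label 1 = transformer-generated, 0 = natural (an assumed encoding)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return sensitivity, specificity


if __name__ == "__main__":
    genome = "ATGCGCGTTAACCGGA"  # toy sequence, not real data
    features = [gc_content(genome), gc_skew(genome),
                dinucleotide_odds(genome, "CG")]
    print(features)
    print(sensitivity_specificity([1, 1, 0, 0], [1, 0, 0, 0]))
```

Feature vectors of this kind, computed per genome, are the sort of input that could be passed to rank-sum tests, principal component analysis, or a small classifier; the actual feature set and model in the study may differ.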
