Can the nucleotide content of a DNA sequence predict the sequence accessibility?
Sequence accessibility is an important factor affecting gene expression. The accessibility, or openness, of a sequence affects the likelihood that a gene is transcribed and translated into a protein that performs functions and manifests traits. DNA, which carries the genes, is packaged as chromatin. There are two types of chromatin: heterochromatin and euchromatin. Heterochromatin tends to be inaccessible, so genes within it are often not expressed. In contrast, euchromatin is more accessible, and genes within it are more readily expressed. The accessibility of a gene therefore depends on the type of chromatin it sits in, and greater accessibility brings a greater likelihood of transcription and expression. Many factors could affect the accessibility of a gene. In this study, we hypothesized that the nucleotide content of a genetic sequence predicts its accessibility. We used a machine learning linear regression model to study the relationship between nucleotide content and accessibility. DNA sequences are made up of four nucleotides: adenine, thymine, guanine, and cytosine. Using data from the K562 cell line, we compared sequence accessibility with the quantity of each nucleotide, taken either singly or in specific combinations of two nucleotides. Of all the combinations tried, cytosine-guanine content had the highest positive correlation with accessibility and, by extension, with gene expression. This correlation allows us to better predict which genetic sequences will be expressed more frequently based solely on nucleotide content and sequence. Predicting gene expression with machine learning algorithms promises to accelerate our understanding of the structure and function of specific gene sequences.
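The comparison described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' code. It assumes that "content" means the fraction of positions occupied by a given nucleotide or pair of nucleotides (so cytosine-guanine content is the combined C and G fraction), and that each feature is fitted against accessibility with a separate one-variable least-squares line. The function names `content`, `fit_line`, and `rank_features` are hypothetical, and the sequences and accessibility scores are made-up placeholders, not K562 measurements.

```python
from itertools import combinations
from math import sqrt

NUCLEOTIDES = "ACGT"


def content(seq, letters):
    """Fraction of positions in seq occupied by any nucleotide in letters."""
    seq = seq.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    return sum(seq.count(n) for n in letters) / len(seq)


def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept.

    Returns (slope, intercept, pearson_r), or None when either
    variable has no variance and the fit is undefined.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0 or syy == 0:
        return None
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept, sxy / sqrt(sxx * syy)


def rank_features(seqs, accessibility):
    """Fit accessibility against each single nucleotide and each unordered pair."""
    features = list(NUCLEOTIDES) + ["".join(p) for p in combinations(NUCLEOTIDES, 2)]
    results = {}
    for feature in features:
        xs = [content(s, feature) for s in seqs]
        fit = fit_line(xs, accessibility)
        if fit is not None:  # skip features that are constant across sequences
            results[feature] = fit
    return results


# Toy placeholder data (assumption): four short sequences and invented scores.
seqs = ["ATATATTA", "ACGTACGA", "GCGCGCAT", "GGCCGCGG"]
accessibility = [0.1, 0.4, 0.7, 0.9]

results = rank_features(seqs, accessibility)
for feature, (slope, intercept, r) in sorted(
    results.items(), key=lambda item: item[1][2], reverse=True
):
    print(f"{feature}: slope={slope:.3f} intercept={intercept:.3f} r={r:.3f}")
```

Because every position holds exactly one of the four nucleotides, the A+T fraction is one minus the C+G fraction, so those two features always give correlations of equal size and opposite sign. A real analysis would need many more sequences and measured accessibility values before any ranking could be trusted.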
The Journal of Emerging Investigators, Inc.

