Integrating Genomic Correlation Structure Improves Copy Number Variations Detection
Abstract
Copy number variation plays an important role in complex human diseases. Detecting copy number variants (CNVs) involves identifying mean shifts in genetic signal intensities to locate chromosomal breakpoints, a step referred to as chromosomal segmentation. Many segmentation algorithms rely on a strong assumption that observations at genetic loci are independent, and they assume that each locus has an equal chance of being a breakpoint (i.e., a CNV boundary). From a genetics perspective, however, this assumption is violated because genomic positions are correlated, for example through linkage disequilibrium (LD). Our study showed that LD structure is related to the distribution of CNV locations, which follows a non-random pattern across the genome. To detect CNVs more accurately, we therefore proposed a novel algorithm, LDcnv, that models CNV data using their biological characteristics related to genetic correlation (i.e., LD). To evaluate the performance of LDcnv, we conducted extensive simulations and analyzed large-scale HapMap datasets. We showed that LDcnv achieves high accuracy, stability, and robustness in CNV detection, and higher precision than existing methods in detecting short CNVs. We also demonstrated theoretically the correlation structure of CNV data, which further supports the need to integrate biological structure into statistical methods for CNV detection. This new segmentation algorithm is broadly applicable to the analysis of next-generation sequencing and single-cell sequencing data.
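The segmentation step described above (finding a mean shift in signal intensities) can be illustrated with a minimal single-breakpoint scan. This is a generic sketch, not the LDcnv algorithm: it assumes independent Gaussian noise with a known standard deviation sigma and treats every position as an equally likely breakpoint, which is exactly the assumption the abstract questions. The function name best_single_breakpoint is hypothetical.

```python
import math
from typing import List, Tuple


def best_single_breakpoint(x: List[float], sigma: float = 1.0) -> Tuple[int, float]:
    """Return (k, z): the split k (segments x[:k] and x[k:]) with the largest
    standardized difference in segment means.

    Assumes i.i.d. Gaussian noise with known sigma, and scans every position
    with equal weight (a uniform chance of being a breakpoint).
    """
    n = len(x)
    if n < 2:
        raise ValueError("need at least two observations")
    # Prefix sums give each segment mean in O(1) per candidate split.
    prefix = [0.0]
    for v in x:
        prefix.append(prefix[-1] + v)
    best_k, best_z = 1, -math.inf
    for k in range(1, n):
        left_mean = prefix[k] / k
        right_mean = (prefix[n] - prefix[k]) / (n - k)
        # Standard error of the difference of two means of independent samples.
        se = sigma * math.sqrt(1.0 / k + 1.0 / (n - k))
        z = abs(left_mean - right_mean) / se
        if z > best_z:
            best_k, best_z = k, z
    return best_k, best_z
```

For example, best_single_breakpoint([0.0] * 60 + [1.0] * 40) returns k = 60 with z of about 4.90. Practical segmentation methods apply such a scan recursively (binary segmentation) or search over many breakpoints at once; this sketch only shows the single mean-shift test.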
Author Summary
Copy number variants (CNVs) are gains or losses of DNA segments relative to a reference genome. CNVs have attracted extensive interest in recent years because they play an important role in susceptibility to disorders and diseases such as autism, schizophrenia, and cancer [1-7]. Although technological innovation is driving discoveries related to CNVs, methods for CNV detection still lag behind, which limits new discoveries about the role of CNVs in complex diseases. In this study, we propose a novel segmentation algorithm, LDcnv, to accurately locate the breakpoints (boundaries) of CNVs in the human genome. Instead of assuming that signal intensities are independent, as traditional segmentation algorithms do, LDcnv incorporates the genomic correlation structure into a change-point model for CNV detection, allowing accurate and fast computation in a whole-genome scan. Our study provides strong theoretical evidence for correlation structure in real CNV data, and we believe that accounting for it will improve the power of CNV detection. Extensive simulation studies demonstrated the advantages of LDcnv over existing methods in stability, robustness, and accuracy. Analyses of high-quality CNV profiles further support the superior performance of LDcnv relative to existing methods. The development of LDcnv points to new directions for developing CNV detection tools.
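To see why the independence assumption matters for a change-point statistic, the sketch below computes the standard error of a difference in segment means under an arbitrary covariance matrix, as the square root of c^T Σ c for a contrast vector c. The AR(1) covariance produced by ar1_cov is a hypothetical stand-in for correlation between neighbouring loci; it is not the LD model used by LDcnv, and both function names are illustrative.

```python
import math
from typing import List


def ar1_cov(n: int, sigma: float, rho: float) -> List[List[float]]:
    """Hypothetical correlation model: Cov(x_i, x_j) = sigma^2 * rho^|i - j|."""
    if not -1.0 < rho < 1.0:
        raise ValueError("rho must lie strictly between -1 and 1")
    return [[sigma ** 2 * rho ** abs(i - j) for j in range(n)] for i in range(n)]


def se_mean_difference(cov: List[List[float]], k: int) -> float:
    """Standard error of mean(x[:k]) - mean(x[k:]) when Cov(x) = cov."""
    n = len(cov)
    if not 0 < k < n:
        raise ValueError("k must split the sequence into two non-empty segments")
    # Contrast vector c such that c . x equals left mean minus right mean.
    c = [1.0 / k if i < k else -1.0 / (n - k) for i in range(n)]
    # Var(c . x) = c^T cov c
    var = sum(c[i] * cov[i][j] * c[j] for i in range(n) for j in range(n))
    return math.sqrt(var)
```

With n = 100 and k = 60, rho = 0 reproduces the independent-noise value sqrt(1/60 + 1/40), while rho = 0.5 gives a noticeably larger standard error. A statistic standardized as if loci were independent would therefore overstate the evidence for a breakpoint when neighbouring observations are positively correlated.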