Integrating Genomic Correlation Structure Improves Copy Number Variations Detection

Abstract

Copy number variation plays an important role in complex human diseases. Detecting copy number variants (CNVs) means identifying mean shifts in genetic intensities to locate chromosomal breakpoints, a step referred to as chromosomal segmentation. Many segmentation algorithms have been developed under a strong assumption that observations at genetic loci are independent, and they assume that each locus has an equal chance of being a breakpoint (i.e., a CNV boundary). From a genetics perspective, however, this assumption is violated because genomic positions are correlated, for example through linkage disequilibrium (LD). Our study showed that the LD structure is related to the location distribution of CNVs, which indeed follows a non-random pattern on the genome. To call CNVs more accurately, we therefore proposed a novel algorithm, LDcnv, that models CNV data together with its biological characteristics relating to genetic correlation (i.e., LD). To evaluate the performance of LDcnv, we conducted extensive simulations and analyzed large-scale HapMap datasets. We showed that LDcnv achieves high accuracy, stability and robustness in CNV detection, and higher precision in detecting short CNVs than existing methods. We also demonstrated theoretically the correlation structure of CNV data, which further supports the need to integrate biological structure into statistical methods for CNV detection. This new segmentation algorithm has a wide scope of application in next-generation sequencing data analysis and single-cell sequencing analysis.

Author Summary

Copy number variants (CNVs) refer to gains or losses of DNA segments relative to a reference genome. CNVs have garnered extensive interest in recent years because they play an important role in susceptibility to disorders and diseases such as autism, schizophrenia and cancer [1-7]. Although innovation in modern technology is promoting discoveries related to CNVs, the methodology for CNV detection still lags behind, which limits novel discoveries about the role of CNVs in complex diseases. In this study, we propose a novel segmentation algorithm, LDcnv, to accurately locate the breakpoints, or boundaries, of CNVs in the human genome. Instead of assuming that signal intensities are independent, as traditional segmentation algorithms do, LDcnv models the correlation structure of the genome within a change-point CNV detection model, which allows accurate and fast computation in a whole-genome scan. Our study showed strong theoretical evidence that a correlation structure exists in real CNV data, and we believe that taking this evidence into consideration will improve the power of CNV detection. Extensive simulation studies demonstrated the advantage of the LDcnv algorithm over existing methods in stability, robustness and accuracy. We also used high-quality CNV profiles to further support the superior performance of the LDcnv algorithm over existing methods. The development of the LDcnv algorithm provides insights into new directions for developing CNV detection tools.
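
To make the idea of chromosomal segmentation as mean-shift detection concrete, here is a minimal Python sketch. It is not the LDcnv algorithm, whose details are not given in this record. The function names `best_breakpoint` and `prewhiten_ar1`, the toy intensities, and the AR(1) correlation model with coefficient `rho` are all assumptions introduced for illustration only. The first function scans for a single breakpoint under the independence assumption that the abstract criticizes. The second shows one generic way to reduce serial correlation before such a scan.

```python
from math import sqrt


def best_breakpoint(x):
    """Scan every split k (x[:k] vs x[k:]) and return (k, stat) for the split
    with the largest scaled absolute difference between segment means.
    Assumes independent observations with a common variance."""
    n = len(x)
    if n < 2:
        raise ValueError("need at least two observations")
    total = sum(x)
    left_sum = 0.0
    best_k, best_stat = None, -1.0
    for k in range(1, n):
        left_sum += x[k - 1]
        diff = left_sum / k - (total - left_sum) / (n - k)
        # sqrt(k * (n - k) / n) puts every candidate split on a common scale.
        stat = abs(diff) * sqrt(k * (n - k) / n)
        if stat > best_stat:
            best_k, best_stat = k, stat
    return best_k, best_stat


def prewhiten_ar1(x, rho):
    """Hypothetical helper: remove AR(1) serial correlation with an assumed
    coefficient rho by computing y_t = x_t - rho * x_{t-1}.
    The output is one element shorter than the input."""
    return [x[t] - rho * x[t - 1] for t in range(1, len(x))]


# Toy intensity profile (made up): six baseline loci, then four shifted loci.
intensities = [0.1, -0.2, 0.0, 0.15, -0.05, 0.1, 1.0, 1.2, 0.9, 1.1]
k, stat = best_breakpoint(intensities)
print(k, round(stat, 3))  # the estimated breakpoint is k = 6 for this profile
```

Because prewhitening drops the first observation, a breakpoint found at index `k` in the prewhitened series corresponds to index `k + 1` in the original series. A full method would also have to estimate the correlation from data and handle multiple breakpoints, neither of which this sketch attempts.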

Related Results

Abstract 1698: Copy number diversity within and across tumor types
Abstract Introduction Cancers commonly accrue copy number gains and losses during their development. An improved understanding of their contribution to tumorigenesis...
Ahmed B. Muhammed B. Eyyûb El-Verrâk (ö. 228/843) ve El-Megâzî Nüshası
Ahmad b. Muhammad b. Ayyûb al-Warrâq (d. 228/843) and His al-Maghâzî Copy The renowned Sîra scholar Ibn Ishâq has played a major role in shaping the current form of the biography o...
Genomic analysis of spinal meningiomas: correlation with histopathological grade
OBJECTIVE Spinal meningiomas are one of the most common primary intradural tumors of the adult spine. Spinal meningiomas typically have a benign course with low rates of recurrence...
Genomic selection and its importance in animal breeding and genetic improvement revolution: A comprehensive review
Genomic selection has emerged as a transformative approach in animal breeding and genetic improvement, revolutionizing the field by enhancing the accuracy and efficiency of selecti...
Abstract 2595: Screening for genomic rearrangements in BRCA1 and BRCA2 genes in Algerian breast/ovarian cancer families
Abstract Background: Breast cancer is the leading cause of cancer death in women in Algeria. To date, few molecular genetics studies of BRCA1 and BRCA2 germline muta...
Accuracy and computational efficiency of genomic selection with high-density SNP and whole-genome sequence data.
Abstract The prediction of complex or quantitative traits from single nucleotide polymorphism (SNP) genotypes has transformed livestock and plant breeding, and is...
