Javascript must be enabled to continue!
Haplotype-based Parallel PBWT for Biobank Scale Data
View through CrossRef
AbstractDurbin’s positional Burrows-Wheeler transform (PBWT) enables algorithms with the optimal time complexity ofO(MN) for reporting all vs all haplotype matches in a population panel withMhaplotypes andNvariant sites. However, even this efficiency may still be too slow when the number of haplotypes reaches millions. To further reduce the run time, in this paper, a parallel version of the PBWT algorithms is introduced for all versus all haplotype matching, which is called HP-PBWT (haplotype-based parallel PBWT). HP-PBWT parallelly executes the PBWT by splitting a haplotype panel into blocks of haplotypes. HP-PBWT algorithms achieve parallelization for PBWT construction, reporting all versus all L-long matches, and reporting all versus all set-maximal matches while maintaining memory efficiency. HP-PBWT has antime complexity in PBWT construction, and antime complexity for reporting all versus all L-long matches and reporting all versus all set-maximal matches, whereTis the number of threads andc*is the maximum number of matches (of length L or maximum divergence value for L-long matches and set-maximal matches, re-spectively) per haplotype per site. HP-PBWT achieves 4-fold speed-up in UK Biobank genotyping array data with 30 threads in the IO-included benchmarks. When applying HP-PBWT to a dataset of 8 million randomized haplotypes (random binary strings of equal length) in the IO-excluded benchmarks, it can achieve a 22-fold speed-up with 60 cores on the Amazon EC2 server. With further hardware optimization, HP-PBWT is expected to handle billions of haplotypes efficiently.
Title: Haplotype-based Parallel PBWT for Biobank Scale Data
Description:
AbstractDurbin’s positional Burrows-Wheeler transform (PBWT) enables algorithms with the optimal time complexity ofO(MN) for reporting all vs all haplotype matches in a population panel withMhaplotypes andNvariant sites.
However, even this efficiency may still be too slow when the number of haplotypes reaches millions.
To further reduce the run time, in this paper, a parallel version of the PBWT algorithms is introduced for all versus all haplotype matching, which is called HP-PBWT (haplotype-based parallel PBWT).
HP-PBWT parallelly executes the PBWT by splitting a haplotype panel into blocks of haplotypes.
HP-PBWT algorithms achieve parallelization for PBWT construction, reporting all versus all L-long matches, and reporting all versus all set-maximal matches while maintaining memory efficiency.
HP-PBWT has antime complexity in PBWT construction, and antime complexity for reporting all versus all L-long matches and reporting all versus all set-maximal matches, whereTis the number of threads andc*is the maximum number of matches (of length L or maximum divergence value for L-long matches and set-maximal matches, re-spectively) per haplotype per site.
HP-PBWT achieves 4-fold speed-up in UK Biobank genotyping array data with 30 threads in the IO-included benchmarks.
When applying HP-PBWT to a dataset of 8 million randomized haplotypes (random binary strings of equal length) in the IO-excluded benchmarks, it can achieve a 22-fold speed-up with 60 cores on the Amazon EC2 server.
With further hardware optimization, HP-PBWT is expected to handle billions of haplotypes efficiently.
Related Results
Syllable-PBWT for space-efficient haplotype long-match query
Syllable-PBWT for space-efficient haplotype long-match query
AbstractThe positional Burrows-Wheeler transform (PBWT) has led to tremendous strides in haplotype matching on biobank-scale data. For genetic genealogical search, PBWT-based metho...
d-PBWT: dynamic positional Burrows-Wheeler transform
d-PBWT: dynamic positional Burrows-Wheeler transform
AbstractDurbin’s PBWT, a scalable data structure for haplotype matching, has been successfully applied to identical by descent (IBD) segment identification and genotype imputation....
Haplotype Matching with GBWT for Pangenome Graphs
Haplotype Matching with GBWT for Pangenome Graphs
Traditionally, variations from a linear reference genome were used to represent large sets of haplotypes compactly. In the linear reference genome based paradigm, the positional Bu...
Minimal Positional Substring Cover: A Haplotype Threading Alternative to Li & Stephens Model
Minimal Positional Substring Cover: A Haplotype Threading Alternative to Li & Stephens Model
AbstractThe Li & Stephens (LS) hidden Markov model (HMM) models the process of reconstructing a haplotype as a mosaic copy of haplotypes in a reference panel (haplotype threadi...
Cohort profile: A Québec-based plasma donor biobank to study COVID-19 immunity (PlasCoV)
Cohort profile: A Québec-based plasma donor biobank to study COVID-19 immunity (PlasCoV)
AbstractPurposeLong-term humoral immunity to COVID-19 is not well understood owing to the continuous emergence of new variants of concern, the evolving vaccine- and infection-induc...
mtDNA haplotype network analysis: Exploring genetic relationships and diversity in dog haplogroups
mtDNA haplotype network analysis: Exploring genetic relationships and diversity in dog haplogroups
The genetic diversity and relationships of dog haplogroups were studied by analyzing the HV1 region of mitochondrial DNA. Previous studies have found six distinct haplogroups (A, B...
A novel association of DQ alpha and DQ beta genes in the DRw10 haplotype. Determination of a DQw1 specificity by the DQ beta-chain.
A novel association of DQ alpha and DQ beta genes in the DRw10 haplotype. Determination of a DQw1 specificity by the DQ beta-chain.
Abstract
The association of the class II genes of the DRw10 haplotype from a cell line, NASC, initiated from a member of a well characterized family, was analyzed by...
The IT-Infrastructure of a Biobank for an Academic Medical Center
The IT-Infrastructure of a Biobank for an Academic Medical Center
For high quality research in biomedicine an operable biobank is essential. In order to make optimal use of the material and the huge amount of data a sustainable IT-infrastructure ...

