RaPID-Query for Fast Identity by Descent Search and Genealogical Analysis
Abstract
Genetic databases have grown large enough that genetic genealogical search, the process of inferring familial relatedness by identifying DNA matches, has become a viable way to help individuals find missing family members or law enforcement agencies locate suspects. However, a fast and accurate method is needed to search an out-of-database individual against the millions of individuals in such databases. Most existing approaches offer only all-vs-all matching within a panel. Some prototype algorithms offer 1-vs-all queries for an out-of-panel individual, but they do not tolerate errors. A new method, random projection-based identical-by-descent (IBD) detection (RaPID) query, referred to as RaPID-Query, is introduced to make fast genealogical search possible. RaPID-Query identifies IBD segments between a query haplotype and a panel of haplotypes. By integrating matches over multiple positional Burrows-Wheeler transform (PBWT) indexes, RaPID-Query quickly locates IBD segments of at least a given cutoff length while allowing mismatched sites within those segments. A single query against all UK Biobank autosomal chromosomes completes within 2.76 seconds of CPU time on average, with a minimum IBD segment length of 7 cM and a minimum of 700 markers. Using the same criteria, RaPID-Query simultaneously achieves a false negative rate of 0.099 and a false positive rate of 0.017 on a chromosome 20 sequencing panel of 92,296 sites, which is comparable to the state-of-the-art IBD detection method Hap-IBD. In the relatedness degree separation experiments, RaPID-Query distinguishes up to the fourth degree of familial relatedness for a given pair of individuals, with area under the receiver operating characteristic curve values of at least 97.28%. It is anticipated that RaPID-Query will make genealogical search convenient and effective, potentially with the integration of complex inference models.
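To make the idea of error-tolerant 1-vs-all matching concrete, the following is a minimal sketch and not the authors' implementation. It assumes a common reading of random projection: in each run, one random site is sampled from every fixed-size window, long exact matches are found on the projected sequences, and a window is kept only if enough runs support it. A brute-force scan stands in for the PBWT indexes named in the abstract, and every name and parameter here (`query_panel`, `window`, `min_windows`, `runs`, `min_hits`) is hypothetical.

```python
import random
from collections import Counter


def projected_sites(n_sites, window, rng):
    """Pick one random site index from each consecutive window of sites."""
    return [rng.randrange(start, min(start + window, n_sites))
            for start in range(0, n_sites, window)]


def long_runs(a, b, min_len):
    """Yield half-open (start, end) intervals where a and b agree
    at every position for at least min_len consecutive positions."""
    start = 0
    for i in range(len(a) + 1):
        if i == len(a) or a[i] != b[i]:
            if i - start >= min_len:
                yield (start, i)
            start = i + 1


def query_panel(query, panel, window=4, min_windows=5,
                runs=10, min_hits=3, seed=0):
    """Return {panel index: [(site_start, site_end), ...]} of candidate
    shared segments. Brute-force stand-in for an index-based search."""
    rng = random.Random(seed)
    n_sites = len(query)
    n_windows = len(range(0, n_sites, window))
    hits = Counter()  # (haplotype index, window index) -> supporting runs

    for _ in range(runs):
        sites = projected_sites(n_sites, window, rng)
        q = [query[s] for s in sites]
        for h, hap in enumerate(panel):
            p = [hap[s] for s in sites]
            for start, end in long_runs(q, p, min_windows):
                for w in range(start, end):
                    hits[(h, w)] += 1

    # Merge consecutive windows supported by at least min_hits runs.
    result = {}
    for h in range(len(panel)):
        segments, start = [], None
        for w in range(n_windows + 1):
            ok = w < n_windows and hits[(h, w)] >= min_hits
            if ok and start is None:
                start = w
            elif not ok and start is not None:
                if w - start >= min_windows:
                    segments.append((start * window,
                                     min(w * window, n_sites)))
                start = None
        if segments:
            result[h] = segments
    return result
```

Because a mismatched site is sampled in only some runs, an isolated genotyping error tends not to break a long segment, whereas a single-projection exact match would be split at that site. The brute-force scan costs time proportional to panel size times sites per run, which is exactly the cost an index such as PBWT is meant to avoid.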

