Benchmarking Statistical Multiple Sequence Alignment

Abstract: The estimation of multiple sequence alignments of protein sequences is a basic step in many bioinformatics pipelines, including protein structure prediction, protein family identification, and phylogeny estimation. Statistical co-estimation of alignments and trees under stochastic models of sequence evolution has long been considered the most rigorous technique for estimating alignments and trees, but little is known about the accuracy of such methods on biological benchmarks. We report the results of an extensive study evaluating the most popular protein alignment methods as well as the statistical co-estimation method BAli-Phy on 1192 protein data sets from established benchmarks as well as on 120 simulated data sets. Our study (which used more than 230 CPU years for the BAli-Phy analyses alone) shows that BAli-Phy is dramatically more accurate than the other alignment methods on the simulated data sets, but is among the least accurate on the biological benchmarks. There are several potential causes for this discordance, including model misspecification, errors in the reference alignments, and conflicts between structural alignment and evolutionary alignments; future research is needed to understand the most likely explanation for our observations.

Keywords: multiple sequence alignment, BAli-Phy, protein sequences, structural alignment, homology
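Benchmarking of this kind rests on scoring an estimated alignment against a reference alignment. The abstract does not say which accuracy measures were used, so the following is only a minimal sketch under an assumption: it computes a sum-of-pairs style recall, the fraction of residue pairs aligned in the reference that are also aligned in the estimate. The function names `homology_pairs` and `sp_score` and the toy sequences are hypothetical and chosen for illustration.

```python
from itertools import combinations


def homology_pairs(alignment):
    """Return the set of residue pairs that share a column.

    `alignment` maps sequence name -> gapped row ('-' marks a gap).
    Each pair has the form ((name_a, index_a), (name_b, index_b)),
    where the indices count residues in the ungapped sequences.
    """
    names = sorted(alignment)
    lengths = {len(row) for row in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("all alignment rows must have the same length")
    n_cols = lengths.pop()

    next_index = {name: 0 for name in names}  # residues seen so far per row
    pairs = set()
    for col in range(n_cols):
        present = []
        for name in names:
            if alignment[name][col] != "-":
                present.append((name, next_index[name]))
                next_index[name] += 1
        # Every two residues in the same column form one homology pair.
        pairs.update(combinations(present, 2))
    return pairs


def sp_score(estimated, reference):
    """Fraction of reference homology pairs recovered by the estimate."""
    ref_pairs = homology_pairs(reference)
    if not ref_pairs:
        return 0.0
    return len(ref_pairs & homology_pairs(estimated)) / len(ref_pairs)


# Toy example (made-up sequences, not data from the study).
reference = {"s1": "AC-GT", "s2": "ACGGT"}
estimated = {"s1": "ACG-T", "s2": "ACGGT"}
print(sp_score(estimated, reference))  # 0.75: 3 of 4 reference pairs recovered
```

A recall-style score like this rewards recovering reference homologies but does not penalize extra aligned pairs, which is why benchmark studies usually report it alongside a complementary precision-style measure.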

openRxiv

Michael Nute, Ehsan Saleh, Tandy Warnow

2018
