The accuracy of several multiple sequence alignment programs for proteins
Abstract
Background
Many algorithms and software programs have been implemented for inferring multiple sequence alignments of protein and DNA sequences. Because the evolutionary history of the sequences is incompletely known, the "true" alignment is usually unknown, which makes it difficult to gauge the relative accuracy of the programs.
Results
We tested nine of the most commonly used protein alignment programs and compared their results using sequences generated with the simulation software Simprot, which creates known alignments under realistic and controlled evolutionary scenarios. We simulated more than 30,000 alignment sets under various evolutionary histories in order to identify the strengths and weaknesses of each program tested. We found that alignment accuracy depends strongly on the number of insertions and deletions in the sequences, and that indel size has a weaker effect. We also considered benchmark alignments from the latest version of BAliBASE; the results for the BAliBASE and Simprot-generated data sets were consistent in most cases.
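The abstract does not say which accuracy measure was used, so the following is only an illustrative sketch of how a program's output might be scored against a known (simulated) alignment. It assumes the widely used sum-of-pairs score: the fraction of residue pairs aligned in the reference that are also aligned in the test alignment. The function names, the list-of-strings input format, and the "-" gap character are assumptions made for this example and are not taken from Simprot or from any of the programs tested.

```python
from itertools import combinations


def aligned_pairs(alignment, gap="-"):
    """Return the set of residue pairs that share a column.

    `alignment` is a list of equal-length gapped strings, one per sequence.
    Each pair is ((seq_a, residue_index_a), (seq_b, residue_index_b)),
    where residue indices count ungapped positions within each sequence.
    """
    length = len(alignment[0])
    if any(len(row) != length for row in alignment):
        raise ValueError("all rows must have equal length")

    counters = [0] * len(alignment)  # next residue index per sequence
    pairs = set()
    for col in range(length):
        present = []
        for seq, row in enumerate(alignment):
            if row[col] != gap:
                present.append((seq, counters[seq]))
                counters[seq] += 1
        # Every two residues in the same column form an aligned pair.
        pairs.update(combinations(present, 2))
    return pairs


def sum_of_pairs_score(reference, test):
    """Fraction of reference residue pairs recovered by the test alignment."""
    ref_pairs = aligned_pairs(reference)
    if not ref_pairs:
        return 0.0
    return len(ref_pairs & aligned_pairs(test)) / len(ref_pairs)
```

For instance, with the hypothetical reference `["AC-GT", "ACAGT"]` and test alignment `["ACG-T", "ACAGT"]`, three of the four reference pairs are recovered, so the score is 0.75. A single misplaced gap costs one pair here, which gives some intuition for why accuracy would be sensitive to the number of indels.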
Conclusion
Our results indicate that employing Simprot's simulated sequences allows the creation of a more flexible and broader range of alignment classes than the usual methods for alignment accuracy assessment. Simprot also allows for a quick and efficient analysis of a wider range of possible evolutionary histories that might not be present in currently available alignment sets. Among the nine programs tested, the iterative approach available in Mafft (L-INS-i) and ProbCons were consistently the most accurate, with Mafft being the faster of the two.
Springer Science and Business Media LLC
Related Results
Multiple sequence alignment accuracy and evolutionary distance estimation
Abstract
Background
Sequence alignment is a common tool in bioinformatics and comparative genomics. It is generally assumed that multiple sequence a...
Influence of alignment uncertainty on homology and phylogenetic modeling
Most evolutionary analyses or structure modeling are based upon pre-estimated multiple sequence alignment (MSA) models. From a computational point of view, it is too complex to est...
Figs S1-S9
Fig. S1. Consensus phylogram (50 % majority rule) resulting from a Bayesian analysis of the ITS sequence alignment of sequences generated in this study and reference sequences from...
The Women Who Don’t Get Counted
ABSTRACT
The current incarceration facilities for the growing number of women are depriving expecting mothers of adequate care cruci...
Identification of heparin‐binding proteins in bovine seminal plasma
Abstract
A group of four similar proteins, BSP‐A1, BSP‐A2, BSP‐A3, and BSP‐30‐kDa, represent the major acidic proteins found in bovine seminal plasma (BSP). These proteins are secre...
Ontology Alignment Techniques
Sometimes the use of a single ontology is not sufficient to cover different vocabularies for the same domain, and it becomes necessary to use several ontologies in order to encompa...
Remote homology search with hidden Potts models
Abstract
Most methods for biological sequence homology search and alignment work with primary sequence alone, neglecting higher-order correlations. Recently, statistical physics mod...
Poster 155: The Prevalence of “Pipelining” at the Top Orthopaedic Sports Medicine Fellowship Programs
Objectives: The term “pipelining” refers to the phenomenon that applicants from certain residency programs frequently match at the same fellowship programs. However, it is unclear ...

