The accuracy of several multiple sequence alignment programs for proteins
Abstract
Background
Many algorithms and software programs have been implemented for inferring multiple sequence alignments of protein and DNA sequences. The "true" alignment is usually unknown because the evolutionary history of the sequences is incompletely known, which makes it difficult to gauge the relative accuracy of the programs.
Results
We tested nine of the most widely used protein alignment programs and compared their results on sequences generated with the simulation software Simprot, which creates known alignments under realistic and controlled evolutionary scenarios. We simulated more than 30,000 alignment sets under various evolutionary histories in order to define the strengths and weaknesses of each program tested. We found that alignment accuracy is extremely dependent on the number of insertions and deletions in the sequences, and that indel size has a weaker effect. We also considered benchmark alignments from the latest version of BAliBASE; the results on the BAliBASE and Simprot-generated data sets were consistent in most cases.
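To illustrate how accuracy can be measured when the true alignment is known, as it is for simulated data, here is a minimal sketch of a sum-of-pairs score. The abstract does not state which accuracy measure the study used, so the choice of metric is an assumption, and the function names `aligned_pairs` and `sum_of_pairs_score` are hypothetical. Alignments are assumed to be lists of equal-length strings with `-` as the gap character.

```python
from itertools import combinations


def aligned_pairs(alignment):
    """Return the set of residue pairs that share a column.

    Each pair has the form ((seq_i, residue_i), (seq_j, residue_j)), where
    residue_i counts non-gap characters in sequence seq_i. Gaps ('-') are skipped.
    """
    if len({len(row) for row in alignment}) > 1:
        raise ValueError("all rows of an alignment must have equal length")
    counters = [0] * len(alignment)  # next residue index for each sequence
    pairs = set()
    n_columns = len(alignment[0]) if alignment else 0
    for col in range(n_columns):
        present = []
        for seq, row in enumerate(alignment):
            if row[col] != "-":
                present.append((seq, counters[seq]))
                counters[seq] += 1
        # Every two residues in the same column form one aligned pair.
        pairs.update(combinations(present, 2))
    return pairs


def sum_of_pairs_score(reference, test):
    """Fraction of residue pairs in the reference that the test alignment recovers."""
    reference_pairs = aligned_pairs(reference)
    if not reference_pairs:
        return 0.0
    return len(reference_pairs & aligned_pairs(test)) / len(reference_pairs)
```

For example, with the known alignment `["AC-GT", "ACAGT"]` and a program output of `["ACG-T", "ACAGT"]`, three of the four reference residue pairs are recovered, so the score is 0.75. Repeating such a comparison over many simulated sets with different indel rates is one way to examine how accuracy varies with the number of insertions and deletions.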
Conclusion
Our results indicate that Simprot's simulated sequences allow a broader and more flexible range of alignment classes to be created than the usual methods for assessing alignment accuracy. Simprot also allows quick and efficient analysis of a wider range of possible evolutionary histories that might not be present in currently available alignment sets. Among the nine programs tested, ProbCons and the iterative approach available in Mafft (L-INS-i) were consistently the most accurate, and Mafft was the faster of the two.
Springer Science and Business Media LLC

