
The metric space of proteins—comparative study of clustering algorithms

Abstract

Motivation: A large fraction of biological research concentrates on individual proteins and on small families of proteins. One of the current major challenges in bioinformatics is to extend our knowledge to very large sets of proteins. Several major projects have tackled this problem. Such undertakings usually start with a process that clusters all known proteins or large subsets of this space. Some work in this area is carried out automatically, while other attempts incorporate expert advice and annotation.

Results: We propose a novel technique that automatically clusters protein sequences. We consider all proteins in SWISSPROT, and carry out an all-against-all BLAST similarity test among them. With this similarity measure in hand we proceed to perform a continuous bottom-up clustering process by applying alternative rules for merging clusters. The outcome of this clustering process is a classification of the input proteins into a hierarchy of clusters of varying degrees of granularity. Here we compare the clusters that result from alternative merging rules, and validate the results against InterPro. Our preliminary results show that clusters that are consistent with several rather than a single merging rule tend to comply with InterPro annotation. This is an affirmation of the view that the protein space consists of families that differ markedly in their evolutionary conservation.

Availability: The outcome of these investigations can be viewed in an interactive Web site at http://www.protonet.cs.huji.ac.il

Supplementary information: Biological examples for comparing the performance of the different algorithms used for classification are presented in http://www.protonet.cs.huji.ac.il/examples.html

Contact: ori@cs.huji.ac.il

Keywords: protein families; protein classification; sequence alignment; clustering.
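The bottom-up procedure described in the Results can be illustrated with a small sketch. The abstract does not name the merging rules or say how BLAST scores are turned into a dissimilarity, so everything specific here is an assumption: the three rules are the textbook single, complete, and average linkage used as stand-ins, the five "sequences" are hypothetical, and the toy dissimilarities are derived from made-up 1-D positions rather than from BLAST output. The sketch builds one hierarchy per rule and then records which rules produced each cluster, which is the kind of cross-rule consistency the abstract refers to.

```python
from itertools import combinations

# Stand-in merging rules (assumption: the paper's actual rules may differ).
# Each rule maps the list of pairwise dissimilarities between two clusters
# to a single score; the pair with the lowest score is merged next.
RULES = {
    "single": min,
    "complete": max,
    "average": lambda d: sum(d) / len(d),
}


def agglomerate(dist, linkage):
    """Bottom-up clustering; returns every merged cluster in merge order."""
    clusters = [frozenset([i]) for i in range(len(dist))]
    merges = []
    while len(clusters) > 1:
        # Pick the pair of current clusters with the smallest linkage score.
        a, b = min(
            combinations(clusters, 2),
            key=lambda pair: linkage(
                [dist[i][j] for i in pair[0] for j in pair[1]]
            ),
        )
        merged = a | b
        clusters = [c for c in clusters if c not in (a, b)] + [merged]
        merges.append(merged)
    return merges


def rule_support(dist):
    """Map each cluster to the set of rule names whose hierarchy contains it."""
    support = {}
    for name, linkage in RULES.items():
        for cluster in agglomerate(dist, linkage):
            support.setdefault(cluster, set()).add(name)
    return support


# Hypothetical toy data: dissimilarity = distance between made-up positions.
# Real input would be a matrix derived from all-against-all BLAST scores.
POSITIONS = [0.0, 1.0, 3.0, 4.2, 6.5]
DIST = [[abs(p - q) for q in POSITIONS] for p in POSITIONS]

if __name__ == "__main__":
    for cluster, rules in rule_support(DIST).items():
        print(sorted(cluster), sorted(rules))
```

On this toy input the three rules agree on some clusters and disagree on others; in the spirit of the abstract, clusters supported by several rules would be the ones to check first against an external annotation such as InterPro. The quadratic pair search is only suitable for tiny inputs and is not meant to scale to a database the size of SWISSPROT.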

Related Results

Primerjalna književnost na prelomu tisočletja
In a comprehensive and at times critical manner, this volume seeks to shed light on the development of events in Western (i.e., European and North American) comparative literature ...
A comparative study of mappings in metric space and controlled metric space
The objective of this paper is to present a comparative study of mapping in Metric Space and Controlled Metric Space. The study provides the structure, gap analysis and application...
The Kernel Rough K-Means Algorithm
Background: Clustering is one of the most important data mining methods. The k-means (c-means) algorithm and its derivative methods are a hotspot in the field of clustering research in re...
Clustering Analysis of Data with High Dimensionality
Clustering analysis has been widely applied in diverse fields such as data mining, access structures, knowledge discovery, software engineering, organization of information systems...
Seditious Spaces
The title ‘Seditious Spaces’ is derived from one aspect of Britain’s colonial legacy in Malaysia (formerly Malaya): the Sedition Act 1948. While colonial rule may seem like it was ...
A COMPARATIVE ANALYSIS OF K-MEANS AND HIERARCHICAL CLUSTERING
Clustering is the process of arranging comparable data elements into groups. One of the most frequent data mining analytical techniques is clustering analysis; the clustering algor...
A Proposed Clustering Algorithm for Efficient Clustering of High-Dimensional Data
To partition transaction data values, clustering algorithms are used. To analyse the relationships between transactions, similarity measures are utilized. Similarity models based o...
Image clustering using exponential discriminant analysis
Local learning based image clustering models are usually employed to deal with images sampled from the non‐linear manifold. Recently, linear discriminant analysis (LDA) based vario...
