Applying negative rule mining to improve genome annotation

Abstract

Background: Unsupervised annotation of proteins by software pipelines suffers from very high error rates. Spurious functional assignments are usually caused by unwarranted homology-based transfer of information from existing database entries to the new target sequences. We have previously demonstrated that data mining in large sequence annotation databanks can help identify annotation items that are strongly associated with each other, and that exceptions from strong positive association rules often point to potential annotation errors. Here we investigate the applicability of negative association rule mining to revealing erroneously assigned annotation items.

Results: Almost all exceptions from strong negative association rules are connected to at least one wrong attribute in the feature combination making up the rule. The fraction of annotation features flagged by this approach as suspicious is strongly enriched in errors and constitutes about 0.6% of the whole body of the similarity-transferred annotation in the PEDANT genome database. Positive rule mining does not identify two thirds of these errors. The approach based on exceptions from negative rules is much more specific than positive rule mining, but its coverage is significantly lower.

Conclusion: Mining of both negative and positive association rules is a potent tool for finding significant trends in protein annotation and flagging doubtful features for further inspection.
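The idea of flagging exceptions from strong negative rules can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the abstract does not specify the mining algorithm, so the rule form (a single feature x implying the absence of a single feature y), the confidence definition 1 - count(x and y) / count(x), the thresholds `min_support` and `min_confidence`, and the feature names `kw:nucleus` and `kw:secreted` are all assumptions chosen for illustration.

```python
from collections import Counter
from itertools import combinations


def negative_rule_exceptions(entries, min_support=3, min_confidence=0.9):
    """Find exceptions from strong negative rules of the form x -> not y.

    entries: list of sets, each holding the annotation features of one protein.
    Returns {(x, y): [indices of entries that contain both x and y]}.
    Thresholds are illustrative assumptions, not values from the paper.
    """
    item_count = Counter()
    pair_count = Counter()
    for features in entries:
        item_count.update(features)
        # Count each unordered co-occurring pair once per entry.
        pair_count.update(combinations(sorted(features), 2))

    flagged = {}
    # Only pairs that co-occur at least once can have exceptions.
    for (a, b), both in pair_count.items():
        for x, y in ((a, b), (b, a)):
            # Require both features to be frequent enough to be meaningful.
            if item_count[x] < min_support or item_count[y] < min_support:
                continue
            # Confidence of "x implies NOT y".
            confidence = 1 - both / item_count[x]
            if confidence >= min_confidence:
                flagged[(x, y)] = [
                    i for i, f in enumerate(entries) if x in f and y in f
                ]
    return flagged


# Toy data (hypothetical): two features that almost never co-occur,
# plus one entry carrying both, which becomes the flagged exception.
entries = (
    [{"kw:nucleus"} for _ in range(12)]
    + [{"kw:secreted"} for _ in range(12)]
    + [{"kw:nucleus", "kw:secreted"}]
)
flagged = negative_rule_exceptions(entries)
for rule, exceptions in sorted(flagged.items()):
    print(rule, exceptions)
```

In this toy data the single entry holding both features is reported as an exception to the rule in each direction. In a real setting such an entry would be a candidate for manual inspection, since at least one of the two features is likely to have been assigned in error.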

Springer Science and Business Media LLC

Irena I Artamonova, Goar Frishman, Dmitrij Frishman

BMC Bioinformatics

2007

Abstract Introduction Fine-needle aspiration (FNA) is commonly used to investigate lymphadenopathy of suspected metastatic origin. The current study aims to find the association be...

Light at the End of the Tunnel: Mining Justice and Health

The mining industry provides valuable mined commodities and financial support for communities worldwide. Mining has become safer for workers. Significant injustices, however, are c...

An extensible genome annotation workbench based on the Galaxy Platform

Introduction Falling costs of genetic sequencing have allowed sequencing and annotation of the genomes of non-model organism. In annotating non-mod...

Impact of Mining on Socioeconomic Status in Puno, Peru

This study examines the direct and indirect effects of mining activities on key socioeconomic indicators such as per capita income, the Human Development Index (HDI), and education...

Benchmarking Hayai-Annotation Plants: A Re-evaluation Using Standard Evaluation Metrics

Abstract The rapid growth of next-generation sequencing (NGS) technology has led to a surge in the determination of whole genome sequences in pla...

Mining sequence annotation databanks for association patterns

Abstract Motivation: Millions of protein sequences currently being deposited to sequence databanks will never be annotated manually. Similarity-based annotation gene...

Galaxy Genome Annotation: Galaxy as a platform for the annotation of genomes

Galaxy Genome Annotation (GGA) is a project focusing on developments and resources to turn Galaxy into a complete and efficient platform for the structural and functional annotatio...

Applying negative rule mining to improve genome annotation

Related Results