
Applying negative rule mining to improve genome annotation

Abstract

Background: Unsupervised annotation of proteins by software pipelines suffers from very high error rates. Spurious functional assignments are usually caused by unwarranted homology-based transfer of information from existing database entries to the new target sequences. We have previously demonstrated that data mining in large sequence annotation databanks can help identify annotation items that are strongly associated with each other, and that exceptions from strong positive association rules often point to potential annotation errors. Here we investigate the applicability of negative association rule mining to revealing erroneously assigned annotation items.

Results: Almost all exceptions from strong negative association rules are connected to at least one wrong attribute in the feature combination making up the rule. The fraction of annotation features flagged by this approach as suspicious is strongly enriched in errors and constitutes about 0.6% of the whole body of the similarity-transferred annotation in the PEDANT genome database. Positive rule mining does not identify two thirds of these errors. The approach based on exceptions from negative rules is much more specific than positive rule mining, but its coverage is significantly lower.

Conclusion: Mining of both negative and positive association rules is a potent tool for finding significant trends in protein annotation and flagging doubtful features for further inspection.
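The idea of flagging exceptions from strong negative rules can be illustrated with a minimal sketch. It is not the authors' pipeline and does not use PEDANT data: the function name `negative_rule_exceptions`, the thresholds `min_support` and `min_confidence`, and the toy feature names are all assumptions made for illustration. A rule "x implies NOT y" is treated as strong when both items are frequent and records carrying x almost never carry y; the few records that carry both are reported as suspicious.

```python
from itertools import permutations


def negative_rule_exceptions(records, min_support=5, min_confidence=0.9):
    """Find strong rules 'x implies NOT y' and the records that violate them.

    records maps a record id to a set of annotation items (hypothetical data).
    Returns {(x, y): [ids of records carrying both x and y]}.
    """
    # Collect every annotation item and count how many records carry it.
    items = sorted({item for feats in records.values() for item in feats})
    support = {i: sum(i in feats for feats in records.values()) for i in items}

    flagged = {}
    for x, y in permutations(items, 2):
        # Skip rare items: a rule about them would be trivially "strong".
        if support[x] < min_support or support[y] < min_support:
            continue
        with_x = [rid for rid, feats in records.items() if x in feats]
        exceptions = [rid for rid in with_x if y in records[rid]]
        # Confidence of the negative rule: share of x-records without y.
        confidence = 1 - len(exceptions) / len(with_x)
        if exceptions and confidence >= min_confidence:
            flagged[(x, y)] = exceptions
    return flagged


# Toy data: two mutually exclusive features and one record carrying both.
records = {f"n{i}": {"nuclear"} for i in range(20)}
records.update({f"e{i}": {"extracellular"} for i in range(20)})
records["suspect"] = {"nuclear", "extracellular"}

flagged = negative_rule_exceptions(records)
for (x, y), ids in flagged.items():
    print(f"{x} => NOT {y}: exceptions {ids}")
```

In this toy set the only record reported is the one that combines the two otherwise exclusive features. The sketch considers only pairs of single items; rules over larger feature combinations, as the abstract mentions, would need a proper itemset search.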
