
Better Rulesets by Removing Redundant Specialisations and Generalisations in Association Rule Mining

Association rule mining is a fundamental task in many data mining and analysis applications, both for knowledge extraction and as part of other processes (for example, building associative classifiers). It is well known that the number of associations identified by many association rule mining algorithms can be so large as to present a barrier to their interpretability and practical use. A typical solution to this problem involves removing redundant rules. This paper proposes a novel definition of redundancy, which is used to identify only the most interesting associations. Compared to existing redundancy-based approaches, our method is both more robust to noise and produces fewer overall rules for a given dataset (improving clarity). A rule can be considered redundant if the knowledge it describes is already contained in other rules. Given an association rule, most existing approaches consider its specialisations to be redundant if they add variables without increasing quality according to some measure of interestingness. We claim that complex interactions between variables can confound many interestingness measures. This can lead to existing approaches being overly aggressive in discarding associations as redundant. Most existing approaches also fail to take into account situations where more general rules (those with fewer attributes) can be considered redundant with respect to their specialisations. We examine this problem and provide concrete examples of such errors using artificial data. An alternative definition of redundancy that addresses these issues is proposed. Our approach is shown to identify interesting associations missed by comparable methods on multiple real and synthetic datasets. When combined with the removal of redundant generalisations, our approach is often able to generate smaller overall rule sets, while leaving average rule quality unaffected or slightly improved.
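
The conventional notion of redundancy described in the abstract (a specialisation is redundant if its extra variables do not increase quality) can be illustrated with a minimal sketch. This is not the paper's proposed definition, which the abstract does not spell out. It assumes confidence as the interestingness measure, and the toy transactions and the function names `confidence` and `is_redundant_specialisation` are hypothetical, chosen only for illustration.

```python
from itertools import combinations

# Toy transactions (hypothetical data, not taken from the paper).
TRANSACTIONS = [
    {"a", "b", "y"},
    {"a", "b", "y"},
    {"a"},
    {"b"},
    {"a", "y"},
    {"b", "c", "y"},
]


def confidence(antecedent, consequent, transactions):
    """Fraction of transactions containing the antecedent that also contain the consequent."""
    covered = [t for t in transactions if antecedent <= t]
    if not covered:
        return 0.0
    return sum(consequent in t for t in covered) / len(covered)


def is_redundant_specialisation(antecedent, consequent, transactions):
    """Conventional check (assumed here): the rule is redundant if some proper,
    non-empty subset of its antecedent predicts the consequent at least as well."""
    own = confidence(antecedent, consequent, transactions)
    for size in range(1, len(antecedent)):
        for subset in combinations(sorted(antecedent), size):
            if confidence(frozenset(subset), consequent, transactions) >= own:
                return True
    return False


# {a, b} -> y has confidence 1.0, while {a} -> y and {b} -> y each have 0.75,
# so the specialisation adds information and is kept.
print(is_redundant_specialisation(frozenset({"a", "b"}), "y", TRANSACTIONS))

# {b, c} -> y has confidence 1.0, but so does the more general {c} -> y,
# so the specialisation is flagged as redundant.
print(is_redundant_specialisation(frozenset({"b", "c"}), "y", TRANSACTIONS))
```

A filter of this kind only ever removes specialisations. It never asks whether a general rule is itself made redundant by its specialisations, which is the gap the abstract points to.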

Related Results

Light at the End of the Tunnel: Mining Justice and Health
The mining industry provides valuable mined commodities and financial support for communities worldwide. Mining has become safer for workers. Significant injustices, however, are c...
Impact of Mining on Socioeconomic Status in Puno, Peru
This study examines the direct and indirect effects of mining activities on key socioeconomic indicators such as per capita income, the Human Development Index (HDI), and education...
Data Warehousing for Association Mining
With the phenomenal growth of electronic data and information, there are many demands for developments of efficient and effective systems (tools) to address the issue of performing...
An International Rule of Law
The “international rule of law” is an elusive concept. Under this heading, mainly two variations are being discussed: The international rule of law “proper” and an “internationaliz...
Optimisation of potash mining technology for cell and pillar mining method
The diverse demand for inorganic fertilizers has predetermined the intensification of potash mining, which is a raw material for their production. In this regard, it has become nec...
The Significance of Text Mining in Research: A Comprehensive Review
Text mining has emerged as a pivotal tool in various domains of research, revolutionizing the way scholars and scientists extract valuable insights from vast volumes of textual dat...
Algorithms for Association Rule Mining
Association Rule Mining (ARM) is one of the important data mining tasks that has been extensively researched by data-mining community and has found wide applications in industry. A...
Redundant Representations in Evolutionary Computation
This paper discusses how the use of redundant representations influences the performance of genetic and evolutionary algorithms. Representations are redundant if the number of geno...
