Distributed frequent hierarchical pattern mining for robust and efficient large-scale association discovery
Description:
Frequent pattern mining is a classic data mining technique, generally applicable to a wide range of application domains, and a mature area of research.
The fundamental challenge arises from the combinatorial nature of frequent itemsets: the number of possible itemsets grows exponentially with the number of unique items.
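This growth can be illustrated with simple arithmetic. The calculation is our own and is not taken from the dissertation. A set of n unique items has 2^n - 1 non-empty subsets, and each subset is a potential itemset. The short Python sketch below tabulates that count for a few values of n. For example, 20 items already yield 1,048,575 possible itemsets.

```python
def possible_itemsets(num_items: int) -> int:
    """Number of non-empty itemsets that can be formed from num_items distinct items."""
    return 2 ** num_items - 1


# Adding one unique item roughly doubles the number of possible itemsets.
for n in (10, 20, 50, 100):
    print(f"{n:>3} unique items -> {possible_itemsets(n):,} possible itemsets")
```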
Apriori-based and FPTree-based algorithms have dominated the field thus far.
Initial phases of this research relied on the Apriori algorithm and utilized a distributed computing environment; we proposed the Cartesian Scheduler to manage Apriori's candidate generation process.
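The step that the Cartesian Scheduler manages is Apriori candidate generation. The sketch below shows that step in its textbook form. Frequent (k-1)-itemsets that share all but their last item are joined into k-itemsets. Any candidate with an infrequent (k-1)-subset is then pruned. This is the standard Apriori step only. It is not the Cartesian Scheduler, whose design is not described here. The function name `apriori_gen` and the sorted-tuple representation are illustrative choices of ours.

```python
from itertools import combinations

# Itemsets are tuples of items kept in sorted order, e.g. ("a", "b").
Itemset = tuple[str, ...]


def apriori_gen(frequent_prev: set[Itemset]) -> set[Itemset]:
    """Build candidate k-itemsets from the frequent (k-1)-itemsets (join, then prune)."""
    candidates: set[Itemset] = set()
    for a, b in combinations(sorted(frequent_prev), 2):
        # Join: merge two itemsets that agree on all but their last item.
        if a[:-1] != b[:-1]:
            continue
        candidate = a + (b[-1],)  # a sorts before b, so the result stays sorted
        # Prune: every (k-1)-subset of a frequent k-itemset must itself be frequent.
        if all(sub in frequent_prev for sub in combinations(candidate, len(candidate) - 1)):
            candidates.add(candidate)
    return candidates


frequent_2 = {("a", "b"), ("a", "c"), ("b", "c"), ("b", "d")}
print(sorted(apriori_gen(frequent_2)))  # [('a', 'b', 'c')]
```

In this example the join also produces ("b", "c", "d"). That candidate is pruned because its subset ("c", "d") is not frequent. On large item sets, the number of candidates produced by this join is what makes scheduling it across a cluster worthwhile.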
To address the main limitation of bottom-up frequent pattern mining algorithms such as Apriori and FPGrowth, namely that they must discover the subsets of a pattern before the pattern itself, we propose the Frequent Hierarchical Pattern Tree (FHPTree): a tree structure and a new frequent pattern mining paradigm.
The classic problem is redefined as frequent hierarchical pattern mining, where the goal is to detect frequent maximal pattern covers.
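Here is the standard notion of maximality behind this goal. A frequent itemset is maximal when no proper superset of it is also frequent. Every frequent itemset is therefore a subset of some maximal one. The hypothetical helper below filters a collection of frequent itemsets down to its maximal members. It illustrates maximality only. It does not reproduce the dissertation's "pattern cover" representation, which this abstract does not define.

```python
def maximal_itemsets(frequent: set[frozenset[str]]) -> set[frozenset[str]]:
    """Return the frequent itemsets that have no frequent proper superset."""
    return {s for s in frequent if not any(s < other for other in frequent)}


# A downward-closed collection of frequent itemsets (every subset of a member is a member).
frequent = {frozenset(s) for s in ["a", "b", "c", "d", "ab", "ac", "bc", "bd", "abc"]}
print(sorted("".join(sorted(s)) for s in maximal_itemsets(frequent)))  # ['abc', 'bd']
```

The two maximal itemsets stand in for all nine frequent ones. This is why mining maximal patterns directly can avoid much redundant output.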
Under the proposed paradigm, compressed representations of maximal patterns are mined using a top-down FHPTree traversal, FHPGrowth, which detects large patterns before their subsets, thus yielding significant reductions in computation time.
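A deliberately simple top-down search shows why detecting large patterns before their subsets helps. The search tests candidates from largest to smallest. Once an itemset is found frequent, none of its subsets can be maximal, so they are skipped without any support counting. This is a generic brute-force illustration under our own assumptions: transactions are Python sets and `min_support` is an absolute count. It is not FHPGrowth and does not use the FHPTree. It still enumerates every combination, so it demonstrates only the ordering idea, not an efficient algorithm.

```python
from itertools import combinations


def support(itemset: frozenset[str], transactions: list[set[str]]) -> int:
    """Count the transactions that contain every item in itemset."""
    return sum(1 for t in transactions if itemset <= t)


def top_down_maximal(transactions: list[set[str]], min_support: int) -> set[frozenset[str]]:
    """Find maximal frequent itemsets by testing larger itemsets before smaller ones."""
    items = sorted(set().union(*transactions))
    maximal: set[frozenset[str]] = set()
    for size in range(len(items), 0, -1):
        for combo in combinations(items, size):
            candidate = frozenset(combo)
            # A subset of a known frequent itemset is frequent but cannot be maximal,
            # so its support never needs to be counted.
            if any(candidate < m for m in maximal):
                continue
            if support(candidate, transactions) >= min_support:
                maximal.add(candidate)
    return maximal


transactions = [{"a", "b", "c"}, {"a", "b", "c"}, {"a", "b"}, {"b", "d"}, {"b", "d"}]
result = top_down_maximal(transactions, min_support=2)
print(sorted("".join(sorted(s)) for s in result))  # ['abc', 'bd']
```

Because larger itemsets are examined first, any frequent candidate that survives the subset check must be maximal. A frequent proper superset would already have been recorded, directly or through one of its own supersets.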
The FHPTree memory footprint is small; the number of nodes in the structure scales linearly with respect to the number of unique items.
Additionally, the FHPTree serves as a persistent, dynamic data structure to index frequent patterns and enable efficient searches.
When the search space is exponential, efficient targeted mining capabilities are paramount; this is one of the key contributions of the FHPTree.
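Targeted mining can be pictured as restricting the search to patterns that contain a query item. Only transactions containing that item can support such a pattern, so the rest of the database can be skipped. The sketch below is our own illustration and uses plain enumeration over those conditional transactions. The names `targeted_frequent_itemsets`, `query_item`, and `min_support` are hypothetical. The sketch is not the FHPTree's indexing or search mechanism, which the abstract does not detail.

```python
from itertools import combinations


def targeted_frequent_itemsets(transactions: list[set[str]], query_item: str,
                               min_support: int) -> dict[frozenset[str], int]:
    """Return every frequent itemset that contains query_item, mapped to its support."""
    # Only transactions containing the query item can support a targeted pattern.
    conditional = [t - {query_item} for t in transactions if query_item in t]
    if len(conditional) < min_support:
        return {}
    other_items = sorted(set().union(*conditional))
    results: dict[frozenset[str], int] = {}
    for size in range(len(other_items) + 1):
        for combo in combinations(other_items, size):
            extra = frozenset(combo)
            count = sum(1 for t in conditional if extra <= t)
            if count >= min_support:
                results[extra | {query_item}] = count
    return results


transactions = [{"a", "b", "c"}, {"a", "b", "c"}, {"a", "b"}, {"b", "d"}, {"b", "d"}]
for itemset, count in targeted_frequent_itemsets(transactions, "d", min_support=2).items():
    print(sorted(itemset), count)
```

Querying item "d" here scans only the two transactions that contain it. This smaller scan is the source of the savings that targeted mining aims for. The enumeration itself remains exponential in the number of co-occurring items.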
This dissertation demonstrates the performance of FHPGrowth, which achieves a 300x speedup over state-of-the-art maximal pattern mining algorithms and an approximately 2400x speedup when run in a distributed computing environment.
In addition, we identify future research opportunities and suggest various modifications to further optimize the FHPTree and FHPGrowth.
Moreover, the methods we propose will have an impact on other data mining research areas, including contrast set mining as well as spatial and temporal mining.


