Adaptive Learning and Mining for Data Streams and Frequent Patterns
This thesis is devoted to the design of data mining algorithms for evolving data streams and for the extraction of closed frequent trees. First, we address each of these tasks separately; then we address them together, developing classification methods for data streams whose items are trees.
In the data stream model, data arrive at high speed, and the algorithms that must process them have very strict constraints of space and time. In the first part of this thesis we propose and illustrate a framework for developing algorithms that can adaptively learn from data streams that change over time. Our methods are based on using change detectors and estimator modules at the right places. We propose ADWIN, an adaptive sliding window algorithm for detecting change and keeping updated statistics from a data stream, and use it as a black box in place of counters or accumulators in algorithms not initially designed for drifting data. Since ADWIN has rigorous performance guarantees, this opens the possibility of extending such guarantees to the learning and mining algorithms that use it. We test our methodology with several learning methods, such as Naïve Bayes, clustering, decision trees, and ensemble methods. We build an experimental framework for data stream mining with concept drift, based on the MOA framework, which is similar to WEKA, so that researchers can easily run experimental data stream benchmarks.
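The idea of an adaptive sliding window used in place of a counter can be sketched as follows. This is a minimal, simplified illustration in Python and not the ADWIN implementation from the thesis: it stores every element explicitly and tests every split of the window, with no attention to the space and time constraints mentioned above. The class name NaiveAdaptiveWindow, the parameter delta, and the Hoeffding-style cut threshold are assumptions made for illustration, and input values are assumed to lie in [0, 1].

```python
import math
from collections import deque


class NaiveAdaptiveWindow:
    """Illustrative adaptive window for values in [0, 1] (hypothetical, not ADWIN itself)."""

    def __init__(self, delta=0.01):
        self.delta = delta      # confidence parameter (assumed name)
        self.window = deque()   # oldest element on the left

    def _cut_threshold(self, n0, n1):
        # Hoeffding-style bound on the difference of two sub-window means
        # (an assumption for this sketch; the abstract does not state it).
        n = n0 + n1
        m = 1.0 / (1.0 / n0 + 1.0 / n1)   # harmonic-mean-like size
        delta_prime = self.delta / n
        return math.sqrt(math.log(4.0 / delta_prime) / (2.0 * m))

    def _change_found(self):
        # Try every split into an older part W0 and a recent part W1.
        n = len(self.window)
        total = sum(self.window)
        head_sum = 0.0
        for n0, x in enumerate(list(self.window)[:-1], start=1):
            head_sum += x
            n1 = n - n0
            mean0 = head_sum / n0
            mean1 = (total - head_sum) / n1
            if abs(mean0 - mean1) > self._cut_threshold(n0, n1):
                return True
        return False

    def update(self, x):
        """Add x; return True if old elements were dropped (change detected)."""
        self.window.append(x)
        changed = False
        while len(self.window) > 1 and self._change_found():
            self.window.popleft()   # discard the oldest element
            changed = True
        return changed

    @property
    def mean(self):
        # Current estimate: average of the elements still in the window.
        return sum(self.window) / len(self.window) if self.window else 0.0
```

Under these assumptions, a learner could keep one such object for each statistic it would otherwise hold in a plain counter, calling update on every arriving value and reading mean wherever the counter's running average was used.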
Trees are connected acyclic graphs, and in many cases they are studied as link-based structures. In the second part of this thesis, we present a rather formal study of trees from the point of view of closure-based mining. We also present efficient algorithms for subtree testing and for mining ordered and unordered frequent closed trees. We include an analysis of the extraction of association rules of full confidence from the closed sets of trees, where we found an interesting phenomenon: rules whose propositional counterpart is nontrivial are nevertheless always implicitly true in trees, due to the peculiar combinatorics of the structures.
Finally, using these results on evolving data stream mining and closed frequent tree mining, we present high-performance algorithms for mining closed unlabeled rooted trees adaptively from data streams that change over time. We introduce a general methodology to identify closed patterns in a data stream, using Galois Lattice Theory. Using this methodology, we develop an incremental algorithm, a sliding-window-based algorithm, and finally one that mines closed trees adaptively from data streams. We use these methods to develop classification methods for tree data streams.
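The notion of a closed pattern can be illustrated on the simpler propositional case of itemsets rather than trees. The sketch below is a brute-force Python example and not one of the thesis algorithms: a frequent pattern is kept as closed when no proper superpattern has the same support. The function name closed_frequent_itemsets and its parameters are hypothetical, and transactions are assumed to be Python sets.

```python
from itertools import combinations


def closed_frequent_itemsets(transactions, min_support):
    """Return {itemset: support} for closed itemsets with support >= min_support.

    Brute force over all item subsets; only suitable for tiny examples.
    """
    items = sorted(set().union(*transactions))

    # Count the support of every candidate itemset and keep the frequent ones.
    support = {}
    for size in range(1, len(items) + 1):
        for candidate in combinations(items, size):
            cset = frozenset(candidate)
            count = sum(1 for t in transactions if cset <= t)
            if count >= min_support:
                support[cset] = count

    # A pattern is closed if no proper superset has the same support.
    # Any such superset is itself frequent, so checking `support` suffices.
    closed = {}
    for pattern, count in support.items():
        has_equal_superset = any(
            pattern < other and support[other] == count for other in support
        )
        if not has_equal_superset:
            closed[pattern] = count
    return closed


example = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"a"}]
result = closed_frequent_itemsets(example, min_support=2)
```

In this example the itemset {b} is frequent but not closed, because {a, b} occurs in exactly the same transactions; the closed frequent itemsets are {a}, {a, b}, and {a, c}.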

