Order Preserving Data Mining

Data mining has emerged over the last decade as probably the most important application in databases. To reproduce one of the most popular yet accurate definitions of data mining: “it is the process of nontrivial extraction of implicit, previously unknown and potentially useful information (such as rules, constraints and regularities) from massive databases” (Piatetsky-Shapiro & Frawley 1991). In practice, data mining can be thought of as the “crystal ball” of businessmen, scientists, politicians and, more generally, anyone wishing to gain more insight into their field of interest and their data. This “crystal ball” nevertheless rests on a sound and broad scientific basis, using techniques borrowed from fields such as statistics, artificial intelligence, machine learning, mathematics and database research, among others. Applications of data mining range from analyzing simple point-of-sale transactions and text documents to astronomical data and homeland security (Data Mining and Homeland Security: An Overview). Different applications may require different data mining techniques. The main techniques used to discover knowledge from a database fall into three categories: association rule mining, classification and clustering, with association rules being the most extensively and actively studied area. The problem of finding association rules can be formulated as follows: given a large database of item transactions, find all frequent itemsets, where a frequent itemset is one that occurs in at least a user-specified percentage of the transactions in the database. From these itemsets, rules of the form X ⇒ Y are then generated, where X and Y are sets of items. Such a rule expresses that whenever a transaction contains all items in X, it is likely to also contain all items in Y. X is called the body of the rule and Y the head.
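As a rough illustration of the problem statement above, the following sketch enumerates frequent itemsets by brute force over a small, made-up list of transactions. The function name `frequent_itemsets`, the parameter `min_support` and the sample data are assumptions introduced for illustration and do not come from the original work; practical miners such as Apriori prune the candidate space rather than enumerating it exhaustively.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Return {itemset: support} for every itemset with support >= min_support.

    Brute-force enumeration of all candidate itemsets: only suitable for
    tiny examples, since the number of candidates grows exponentially.
    """
    n = len(transactions)
    # Collect the distinct items that appear anywhere in the data.
    items = sorted(set().union(*transactions))
    result = {}
    for size in range(1, len(items) + 1):
        for candidate in combinations(items, size):
            candidate = frozenset(candidate)
            # A transaction supports the candidate if it contains all its items.
            count = sum(1 for t in transactions if candidate <= t)
            support = count / n
            if support >= min_support:
                result[candidate] = support
    return result


# Hypothetical point-of-sale transactions (illustrative data only).
transactions = [
    frozenset({"bread", "milk"}),
    frozenset({"bread", "butter", "milk"}),
    frozenset({"bread", "butter"}),
    frozenset({"milk"}),
]

# The user-specified threshold: itemsets must occur in at least 50% of transactions.
frequent = frequent_itemsets(transactions, min_support=0.5)
for itemset, support in sorted(frequent.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), support)
```

With this data and threshold, five itemsets qualify: the three single items plus {bread, butter} and {bread, milk}. Note that each transaction is modelled as a set, so the order in which items were recorded plays no role.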
The validity and reliability of an association rule are usually expressed by means of support and confidence. An example of such a rule is {smoking, no_workout} ⇒ {heart_disease} (sup=50%, conf=90%), which means that 90% of the people who smoke and do not work out present heart problems, whereas 50% of all people in the data present all three characteristics together. Nevertheless, the prevailing model for representing data has in almost all cases been a rather simplistic and crude one that makes several concessions. More specifically, objects inside the data, such as items within transactions, have been given a Boolean status (i.e. they either appear or do not), and their ordering has been considered of no interest because they are treated collectively as sets. Similar concessions are made in many other fields in order to reach a feasible solution (e.g. in mining data streams). There is certainly a trade-off between the depth and precision of the knowledge we wish to uncover from a database and the amount and complexity of data we are capable of processing to reach that target. In this work we concentrate on the possibility of taking into account and exploiting the order of items within the data. Many real-world applications and systems involve data with temporal, spatial, spatiotemporal or, more generally, ordered properties, whose inherently sequential nature calls for appropriate storage and processing. Such data include those collected from telecommunication systems, computer networks, wireless sensor networks, retail and logistics. There is a variety of interpretations that can be used to preserve data ordering adequately, according to the intended system functionality.
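To make support and confidence concrete, and to show what can be lost when order is ignored, the sketch below computes both measures for a rule body ⇒ head under two notions of containment: the usual set-based one, and an order-preserving one that treats each record as a sequence and requires the pattern to occur as a subsequence. The helper names (`contains_set`, `contains_ordered`, `rule_metrics`) and the sample records are assumptions made for illustration; subsequence matching is only one of several possible interpretations of order and is not claimed to be the one adopted in this work.

```python
def contains_set(record, pattern):
    """Classical view: the record is a set, so item order is ignored."""
    return set(pattern) <= set(record)


def contains_ordered(record, pattern):
    """Order-preserving view: pattern must occur in record as a
    (not necessarily contiguous) subsequence."""
    it = iter(record)
    # `item in it` advances the iterator, so each match must come
    # after the previous one.
    return all(item in it for item in pattern)


def rule_metrics(records, body, head, contains):
    """Return (support, confidence) of body => head.

    support    = fraction of records containing body followed by head
    confidence = that count divided by the number of records containing body
    """
    n_body = sum(1 for r in records if contains(r, body))
    n_both = sum(1 for r in records if contains(r, body + head))
    support = n_both / len(records)
    confidence = n_both / n_body if n_body else 0.0
    return support, confidence


# Hypothetical event sequences (illustrative data only).
records = [
    ["login", "search", "buy"],
    ["search", "login", "buy"],
    ["login", "buy", "search"],
    ["login", "search"],
]
body, head = ["login", "search"], ["buy"]

unordered = rule_metrics(records, body, head, contains_set)
ordered = rule_metrics(records, body, head, contains_ordered)
print("set-based:       ", unordered)
print("order-preserving:", ordered)
```

On this data the set-based view gives support 0.75 and confidence 0.75, while the order-preserving view gives support 0.25 and confidence 1/3: the same records yield quite different rules once the sequence of items is taken into account.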
