Order Preserving Data Mining

Data mining has emerged over the last decade as probably the most important application in databases. To reproduce one of the most popular yet accurate definitions of data mining: “it is the process of nontrivial extraction of implicit, previously unknown and potentially useful information (such as rules, constraints and regularities) from massive databases” (Piatetsky-Shapiro & Frawley 1991). In practice, data mining can be thought of as the “crystal ball” of businessmen, scientists, politicians and, more generally, anyone wishing to gain more insight into their field of interest and their data. This “crystal ball” rests on a sound and broad scientific basis, using techniques borrowed from fields such as statistics, artificial intelligence, machine learning, mathematics and database research, among others. Applications of data mining range from analyzing simple point-of-sale transactions and text documents to astronomical data and homeland security (Data Mining and Homeland Security: An Overview). Different applications usually require different data mining techniques. The main techniques used to discover knowledge from a database are categorized into association rule mining, classification and clustering, with association rules being the most extensively and actively studied area. The problem of finding association rules can be formulated as follows: given a large database of item transactions, find all frequent itemsets, where a frequent itemset is one that occurs in at least a user-specified percentage of the database. The frequent itemsets are then used to generate rules of the form X → Y, where X and Y are sets of items. A rule expresses that a transaction containing all items in X is likely to also contain all items in Y. Accordingly, X is called the body of the rule and Y the head.
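The frequent itemset formulation above can be illustrated with a minimal sketch. It is a brute-force enumeration written only for clarity, not the method of any particular mining algorithm; the function name `frequent_itemsets`, the parameter `min_support` and the toy transactions are assumptions introduced here for illustration.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Return {itemset: support} for every itemset with support >= min_support.

    Support is the fraction of transactions containing the itemset.
    Brute-force enumeration for illustration only; practical miners
    prune the candidate space instead of testing every combination.
    """
    n = len(transactions)
    items = sorted(set().union(*transactions))
    result = {}
    for size in range(1, len(items) + 1):
        for candidate in combinations(items, size):
            itemset = frozenset(candidate)
            # Count the transactions that contain every item of the candidate.
            count = sum(1 for t in transactions if itemset <= t)
            support = count / n
            if support >= min_support:
                result[itemset] = support
    return result


# Hypothetical toy database of four transactions (not from the source).
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
frequent = frequent_itemsets(transactions, min_support=0.5)
for itemset, support in sorted(frequent.items(), key=lambda kv: sorted(kv[0])):
    print(sorted(itemset), support)
```

With a threshold of 50%, {milk, butter} is discarded because it occurs in only one of the four transactions, while {bread, milk} and {bread, butter} are kept.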
The validity and reliability of association rules are usually expressed by means of support and confidence. An example of such a rule is {smoking, no_workout} → {heart_disease} (sup=50%, conf=90%), which means that 90% of the people who smoke and do not work out present heart problems, whereas 50% of all people in the database exhibit all three characteristics together. Nevertheless, the prevailing model for representing data has in almost all circumstances been a rather simplistic and crude one, making several concessions. More specifically, objects inside the data, such as items within transactions, have been treated as Boolean (i.e., they either appear or do not), and their ordering has been considered of no interest because they are handled collectively as sets. Similar concessions are made in many other fields in order to reach a feasible solution (e.g., in mining data streams). There is a trade-off between the depth and precision of the knowledge we wish to uncover from a database and the amount and complexity of data we are capable of processing to reach that target. In this work we concentrate on the possibility of taking into account and utilizing the order of items within data. Many real-world applications and systems involve data with temporal, spatial, spatiotemporal or, more generally, ordered properties, whose inherent sequential nature calls for appropriate storage and processing. Such data include those collected from telecommunication systems, computer networks, wireless sensor networks, retail and logistics. A variety of interpretations can be used to preserve data ordering adequately, according to the intended system functionality.
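The support and confidence measures, and the effect of ignoring or preserving item order, can be sketched as follows. The ordered interpretation used here (the items must appear in the transaction in the given relative order, not necessarily adjacently) is only one possible interpretation and is an assumption of this sketch, as are all function names and the toy data.

```python
def contains_unordered(transaction, items):
    """Set view: every item occurs somewhere in the transaction."""
    return set(items) <= set(transaction)


def contains_ordered(transaction, items):
    """Order-preserving view (assumed): items occur in this relative order."""
    remaining = iter(transaction)
    # `item in remaining` consumes the iterator up to the match, so each
    # item must be found after the position of the previous one.
    return all(item in remaining for item in items)


def rule_stats(transactions, body, head, contains):
    """Return (support, confidence) of body -> head under a containment test."""
    n = len(transactions)
    body_count = sum(1 for t in transactions if contains(t, body))
    both_count = sum(1 for t in transactions if contains(t, body + head))
    support = both_count / n
    confidence = both_count / body_count if body_count else 0.0
    return support, confidence


# Hypothetical transactions whose items are stored in order of occurrence.
transactions = [
    ["login", "search", "buy"],
    ["search", "login", "buy"],
    ["login", "search"],
    ["buy", "login", "search"],
]
body, head = ["login", "search"], ["buy"]

unordered = rule_stats(transactions, body, head, contains_unordered)
ordered = rule_stats(transactions, body, head, contains_ordered)
print("unordered (sup, conf):", unordered)
print("ordered   (sup, conf):", ordered)
```

Under the set view the rule holds in three of the four transactions, whereas under the ordered view only the first transaction contains login, search and buy in that sequence, so the same data yields a noticeably weaker rule once order is taken into account.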

IGI Global

Ioannis N. Kouris

Encyclopedia of Data Warehousing and Mining, Second Edition

2011
