Javascript must be enabled to continue!

Subgraph Mining

The amount of available data is increasing very fast. With this data, the desire for data mining is also growing. More and larger databases have to be searched to find interesting (and frequent) elements and connections between them. Most often the data of interest is very complex. It is common to model complex data with the help of graphs consisting of nodes and edges that are often labeled to store additional information. Having a graph database, the main goal is to find connections and similarities between its graphs. Based on these connections and similarities, the graphs can be categorized, clustered or changed according to the application area. Regularly occurring patterns in the form of subgraphs —called fragments in this context—that appear at least in a certain percentage of graphs, are a common method to analyze graph databases. The actual occurrence of a fragment in a database graph is called embedding. Finding the fragments and their embeddings is the goal of subgraph mining described in detail in this chapter. The first published graph mining algorithm, called Subdue, appeared in the mid-1990s and is still used in different application areas and was extended in several ways. (Cook & Holder, 2000). Subdue is based on a heuristic search and does not find all possible fragments and embeddings. It took a few more years before more and faster approaches appeared. In (Helma, Kramer, & de Raedt, 2002) graph databases are mined for simple paths, for a lot of other applications only trees are of interest (Rückert & Kramer, 2004). Also Inductive Logic Programming (Finn et al., 1998) was applied in this area. At the beginning of the new millennium finally more and more and every time faster approaches for general mining of graph databases were developed that were able to find all possible fragments. (Borgelt & Berthold, 2002; Yan & Han, 2002; Kuramochi & Karypis, 2001; Nijssen & Kok, 2004). Several different application areas for graph mining are researched. The most common area is mining molecular databases where the molecules are displayed by their two-dimensional structure. When analyzing molecules it is interesting to find patterns that might explain why a certain set of molecules is useful as a drug against certain diseases (Borgelt & Berthold, 2002). Similar problems occur for protein databases. Here graph data mining can be used to find structural patterns in the primary, secondary and tertiary structure of protein categories (Cook & Holder, 2000). Another application area are web searches (Cook, Manocha, & Holder, 2003). Existing search engines use linear feature matches. Using graphs as underlying data structure, nodes represent pages, documents or document keywords and edges represent links between them. Posing a query as a graph means a smaller graph has to be embedded in the larger one. The graph modeling the data structure can be mined to find similar clusters. Quite new is the application of subgraph mining in optimizing code for embedded devices. With the help of so-called procedural abstraction, the size of pre-compiled binaries can be reduced which is often crucial because of the limited storage capacities of embedded systems. There, subgraph mining helps identifying common structures in the program’s control flow graph which can then be combined (“abstracted”) into a single procedure (Dreweke et.al., 2007).

IGI Global

Ingrid Fischer

Encyclopedia of Data Warehousing and Mining, Second Edition

2011

Title: Subgraph Mining

Description:

The amount of available data is increasing very fast.

With this data, the desire for data mining is also growing.

More and larger databases have to be searched to find interesting (and frequent) elements and connections between them.

Most often the data of interest is very complex.

It is common to model complex data with the help of graphs consisting of nodes and edges that are often labeled to store additional information.

Having a graph database, the main goal is to find connections and similarities between its graphs.

Based on these connections and similarities, the graphs can be categorized, clustered or changed according to the application area.

Regularly occurring patterns in the form of subgraphs —called fragments in this context—that appear at least in a certain percentage of graphs, are a common method to analyze graph databases.

The actual occurrence of a fragment in a database graph is called embedding.

Finding the fragments and their embeddings is the goal of subgraph mining described in detail in this chapter.

The first published graph mining algorithm, called Subdue, appeared in the mid-1990s and is still used in different application areas and was extended in several ways.

(Cook & Holder, 2000).

Subdue is based on a heuristic search and does not find all possible fragments and embeddings.

It took a few more years before more and faster approaches appeared.

In (Helma, Kramer, & de Raedt, 2002) graph databases are mined for simple paths, for a lot of other applications only trees are of interest (Rückert & Kramer, 2004).

Also Inductive Logic Programming (Finn et al.

, 1998) was applied in this area.

At the beginning of the new millennium finally more and more and every time faster approaches for general mining of graph databases were developed that were able to find all possible fragments.

(Borgelt & Berthold, 2002; Yan & Han, 2002; Kuramochi & Karypis, 2001; Nijssen & Kok, 2004).

Several different application areas for graph mining are researched.

The most common area is mining molecular databases where the molecules are displayed by their two-dimensional structure.

When analyzing molecules it is interesting to find patterns that might explain why a certain set of molecules is useful as a drug against certain diseases (Borgelt & Berthold, 2002).

Similar problems occur for protein databases.

Here graph data mining can be used to find structural patterns in the primary, secondary and tertiary structure of protein categories (Cook & Holder, 2000).

Another application area are web searches (Cook, Manocha, & Holder, 2003).

Existing search engines use linear feature matches.

Using graphs as underlying data structure, nodes represent pages, documents or document keywords and edges represent links between them.

Posing a query as a graph means a smaller graph has to be embedded in the larger one.

The graph modeling the data structure can be mined to find similar clusters.

Quite new is the application of subgraph mining in optimizing code for embedded devices.

With the help of so-called procedural abstraction, the size of pre-compiled binaries can be reduced which is often crucial because of the limited storage capacities of embedded systems.

There, subgraph mining helps identifying common structures in the program’s control flow graph which can then be combined (“abstracted”) into a single procedure (Dreweke et.

al.

, 2007).

Back

The diverse demand for inorganic fertilizers has predetermined the intensification of potash mining, which is a raw material for their production. In this regard, it has become nec...

French Technological Development in Nodule Mining

ABSTRACT Since 1971, AFERNOD has studied mining concepts which are adapted to the requirements of commercial exploitation of the nodules deposits together with su...

Potential for increasing the efficiency of design processes for mining the solid mineral deposits based on digitalization and advanced analytics

Purpose. The research purpose is to develop and adapt the existing scientific-methodological, as well as software and information base for managing the geotechnological complexes t...

PENGEMBANGAN MASYARAKAT LINGKAR TAMBANG DALAM PENGUSAHAAN PERTAMBANGAN

Indonesia is a country rich in mining resources. Mining resources include gold, silver, copper, oil and gas, coal and others. There are a large number of companies operating in the...

Efficient Frequent Subgraph Mining for Dynamic Network Graphs Using Golden Dung Graph Hybridization

ABSTRACTFrequent subgraph mining (FSM) is one of the most critical procedures for mining meaningful patterns in large and dynamic graph datasets, common in several applications, su...

The use or abuse of thematic mining information maps

Abstract Thematic and environmental geology mapping has been applied in recent years by the British Geological Survey (BGS) in cities, towns, urban fringes, rural areas a...

Substantiation of resource-saving technology when mining the deposits for the production of crushed-stone products

Purpose. Scientific substantiation of the expedient depth of mining the non-metallic deposits of rocky minerals on the basis of mathematical and statistical methods, which will ens...

ON THE DEVELOPMENT OF A GENERAL METHOD FOR FORECASTING THE DANGEROUS PROPERTIES OF COAL SEAMS

Purpose: to establish a quantitative effect on the dust-generating ability of mine layers of the degree of metamorphic transformations of fossil coals, mining-geological and mining...

Email:
Password:

Email:

Subgraph Mining

Related Results