Javascript must be enabled to continue!

Subgraph Mining

The amount of available data is increasing very fast. With this data, the desire for data mining is also growing. More and larger databases have to be searched to find interesting (and frequent) elements and connections between them. Most often the data of interest is very complex. It is common to model complex data with the help of graphs consisting of nodes and edges that are often labeled to store additional information. Having a graph database, the main goal is to find connections and similarities between its graphs. Based on these connections and similarities, the graphs can be categorized, clustered or changed according to the application area. Regularly occurring patterns in the form of subgraphs —called fragments in this context—that appear at least in a certain percentage of graphs, are a common method to analyze graph databases. The actual occurrence of a fragment in a database graph is called embedding. Finding the fragments and their embeddings is the goal of subgraph mining described in detail in this chapter. The first published graph mining algorithm, called Subdue, appeared in the mid-1990s and is still used in different application areas and was extended in several ways. (Cook & Holder, 2000). Subdue is based on a heuristic search and does not find all possible fragments and embeddings. It took a few more years before more and faster approaches appeared. In (Helma, Kramer, & de Raedt, 2002) graph databases are mined for simple paths, for a lot of other applications only trees are of interest (Rückert & Kramer, 2004). Also Inductive Logic Programming (Finn et al., 1998) was applied in this area. At the beginning of the new millennium finally more and more and every time faster approaches for general mining of graph databases were developed that were able to find all possible fragments. (Borgelt & Berthold, 2002; Yan & Han, 2002; Kuramochi & Karypis, 2001; Nijssen & Kok, 2004). Several different application areas for graph mining are researched. The most common area is mining molecular databases where the molecules are displayed by their two-dimensional structure. When analyzing molecules it is interesting to find patterns that might explain why a certain set of molecules is useful as a drug against certain diseases (Borgelt & Berthold, 2002). Similar problems occur for protein databases. Here graph data mining can be used to find structural patterns in the primary, secondary and tertiary structure of protein categories (Cook & Holder, 2000). Another application area are web searches (Cook, Manocha, & Holder, 2003). Existing search engines use linear feature matches. Using graphs as underlying data structure, nodes represent pages, documents or document keywords and edges represent links between them. Posing a query as a graph means a smaller graph has to be embedded in the larger one. The graph modeling the data structure can be mined to find similar clusters. Quite new is the application of subgraph mining in optimizing code for embedded devices. With the help of so-called procedural abstraction, the size of pre-compiled binaries can be reduced which is often crucial because of the limited storage capacities of embedded systems. There, subgraph mining helps identifying common structures in the program’s control flow graph which can then be combined (“abstracted”) into a single procedure (Dreweke et.al., 2007).

IGI Global

Ingrid Fischer

Encyclopedia of Data Warehousing and Mining, Second Edition

2011

Title: Subgraph Mining

Description:

The amount of available data is increasing very fast.

With this data, the desire for data mining is also growing.

More and larger databases have to be searched to find interesting (and frequent) elements and connections between them.

Most often the data of interest is very complex.

It is common to model complex data with the help of graphs consisting of nodes and edges that are often labeled to store additional information.

Having a graph database, the main goal is to find connections and similarities between its graphs.

Based on these connections and similarities, the graphs can be categorized, clustered or changed according to the application area.

Regularly occurring patterns in the form of subgraphs —called fragments in this context—that appear at least in a certain percentage of graphs, are a common method to analyze graph databases.

The actual occurrence of a fragment in a database graph is called embedding.

Finding the fragments and their embeddings is the goal of subgraph mining described in detail in this chapter.

The first published graph mining algorithm, called Subdue, appeared in the mid-1990s and is still used in different application areas and was extended in several ways.

(Cook & Holder, 2000).

Subdue is based on a heuristic search and does not find all possible fragments and embeddings.

It took a few more years before more and faster approaches appeared.

In (Helma, Kramer, & de Raedt, 2002) graph databases are mined for simple paths, for a lot of other applications only trees are of interest (Rückert & Kramer, 2004).

Also Inductive Logic Programming (Finn et al.

, 1998) was applied in this area.

At the beginning of the new millennium finally more and more and every time faster approaches for general mining of graph databases were developed that were able to find all possible fragments.

(Borgelt & Berthold, 2002; Yan & Han, 2002; Kuramochi & Karypis, 2001; Nijssen & Kok, 2004).

Several different application areas for graph mining are researched.

The most common area is mining molecular databases where the molecules are displayed by their two-dimensional structure.

When analyzing molecules it is interesting to find patterns that might explain why a certain set of molecules is useful as a drug against certain diseases (Borgelt & Berthold, 2002).

Similar problems occur for protein databases.

Here graph data mining can be used to find structural patterns in the primary, secondary and tertiary structure of protein categories (Cook & Holder, 2000).

Another application area are web searches (Cook, Manocha, & Holder, 2003).

Existing search engines use linear feature matches.

Using graphs as underlying data structure, nodes represent pages, documents or document keywords and edges represent links between them.

Posing a query as a graph means a smaller graph has to be embedded in the larger one.

The graph modeling the data structure can be mined to find similar clusters.

Quite new is the application of subgraph mining in optimizing code for embedded devices.

With the help of so-called procedural abstraction, the size of pre-compiled binaries can be reduced which is often crucial because of the limited storage capacities of embedded systems.

There, subgraph mining helps identifying common structures in the program’s control flow graph which can then be combined (“abstracted”) into a single procedure (Dreweke et.

al.

, 2007).

Back

Frequent subgraph mining (FSM) is an essential and challenging graph mining task used in several applications of the modern data science. Some of the FSM algorithms have the object...

A truss‐based approach for densest homogeneous subgraph mining in node‐attributed graphs

AbstractIn a wide range of graph analysis tasks such as community detection and event detection, densest subgraph mining is important and primitive. With the development of social ...

Light at the End of the Tunnel: Mining Justice and Health

The mining industry provides valuable mined commodities and financial support for communities worldwide. Mining has become safer for workers. Significant injustices, however, are c...

CIDER: Counterfactual-Invariant Diffusion-based GNN Explainer for Causal Subgraph Inference

Abstract Inferring causal links or subgraphs corresponding to a specific phenotype or label based solely on measured data is an important yet challenging task, which is als...

Discriminative Subgraph Mining for Protein Classification

Protein classification can be performed by representing 3-D protein structures by graphs and then classifying the corresponding graphs. One effective way to classify such graphs is...

Impact of Mining on Socioeconomic Status in Puno, Peru

This study examines the direct and indirect effects of mining activities on key socioeconomic indicators such as per capita income, the Human Development Index (HDI), and education...

Optimisation of potash mining technology for cell and pillar mining method

The diverse demand for inorganic fertilizers has predetermined the intensification of potash mining, which is a raw material for their production. In this regard, it has become nec...

The Significance of Text Mining in Research: A Comprehensive Review

Text mining has emerged as a pivotal tool in various domains of research, revolutionizing the way scholars and scientists extract valuable insights from vast volumes of textual dat...

Email:
Password:

Email:

Subgraph Mining

Related Results