Search engine for discovering works of Art, research articles, and books related to Art and Culture
ShareThis
Javascript must be enabled to continue!

Subgraph Mining

View through CrossRef
The amount of available data is increasing very fast. With this data, the desire for data mining is also growing. More and larger databases have to be searched to find interesting (and frequent) elements and connections between them. Most often the data of interest is very complex. It is common to model complex data with the help of graphs consisting of nodes and edges that are often labeled to store additional information. Having a graph database, the main goal is to find connections and similarities between its graphs. Based on these connections and similarities, the graphs can be categorized, clustered or changed according to the application area. Regularly occurring patterns in the form of subgraphs —called fragments in this context—that appear at least in a certain percentage of graphs, are a common method to analyze graph databases. The actual occurrence of a fragment in a database graph is called embedding. Finding the fragments and their embeddings is the goal of subgraph mining described in detail in this chapter. The first published graph mining algorithm, called Subdue, appeared in the mid-1990s and is still used in different application areas and was extended in several ways. (Cook & Holder, 2000). Subdue is based on a heuristic search and does not find all possible fragments and embeddings. It took a few more years before more and faster approaches appeared. In (Helma, Kramer, & de Raedt, 2002) graph databases are mined for simple paths, for a lot of other applications only trees are of interest (Rückert & Kramer, 2004). Also Inductive Logic Programming (Finn et al., 1998) was applied in this area. At the beginning of the new millennium finally more and more and every time faster approaches for general mining of graph databases were developed that were able to find all possible fragments. (Borgelt & Berthold, 2002; Yan & Han, 2002; Kuramochi & Karypis, 2001; Nijssen & Kok, 2004). Several different application areas for graph mining are researched. The most common area is mining molecular databases where the molecules are displayed by their two-dimensional structure. When analyzing molecules it is interesting to find patterns that might explain why a certain set of molecules is useful as a drug against certain diseases (Borgelt & Berthold, 2002). Similar problems occur for protein databases. Here graph data mining can be used to find structural patterns in the primary, secondary and tertiary structure of protein categories (Cook & Holder, 2000). Another application area are web searches (Cook, Manocha, & Holder, 2003). Existing search engines use linear feature matches. Using graphs as underlying data structure, nodes represent pages, documents or document keywords and edges represent links between them. Posing a query as a graph means a smaller graph has to be embedded in the larger one. The graph modeling the data structure can be mined to find similar clusters. Quite new is the application of subgraph mining in optimizing code for embedded devices. With the help of so-called procedural abstraction, the size of pre-compiled binaries can be reduced which is often crucial because of the limited storage capacities of embedded systems. There, subgraph mining helps identifying common structures in the program’s control flow graph which can then be combined (“abstracted”) into a single procedure (Dreweke et.al., 2007).
Title: Subgraph Mining
Description:
The amount of available data is increasing very fast.
With this data, the desire for data mining is also growing.
More and larger databases have to be searched to find interesting (and frequent) elements and connections between them.
Most often the data of interest is very complex.
It is common to model complex data with the help of graphs consisting of nodes and edges that are often labeled to store additional information.
Having a graph database, the main goal is to find connections and similarities between its graphs.
Based on these connections and similarities, the graphs can be categorized, clustered or changed according to the application area.
Regularly occurring patterns in the form of subgraphs —called fragments in this context—that appear at least in a certain percentage of graphs, are a common method to analyze graph databases.
The actual occurrence of a fragment in a database graph is called embedding.
Finding the fragments and their embeddings is the goal of subgraph mining described in detail in this chapter.
The first published graph mining algorithm, called Subdue, appeared in the mid-1990s and is still used in different application areas and was extended in several ways.
(Cook & Holder, 2000).
Subdue is based on a heuristic search and does not find all possible fragments and embeddings.
It took a few more years before more and faster approaches appeared.
In (Helma, Kramer, & de Raedt, 2002) graph databases are mined for simple paths, for a lot of other applications only trees are of interest (Rückert & Kramer, 2004).
Also Inductive Logic Programming (Finn et al.
, 1998) was applied in this area.
At the beginning of the new millennium finally more and more and every time faster approaches for general mining of graph databases were developed that were able to find all possible fragments.
(Borgelt & Berthold, 2002; Yan & Han, 2002; Kuramochi & Karypis, 2001; Nijssen & Kok, 2004).
Several different application areas for graph mining are researched.
The most common area is mining molecular databases where the molecules are displayed by their two-dimensional structure.
When analyzing molecules it is interesting to find patterns that might explain why a certain set of molecules is useful as a drug against certain diseases (Borgelt & Berthold, 2002).
Similar problems occur for protein databases.
Here graph data mining can be used to find structural patterns in the primary, secondary and tertiary structure of protein categories (Cook & Holder, 2000).
Another application area are web searches (Cook, Manocha, & Holder, 2003).
Existing search engines use linear feature matches.
Using graphs as underlying data structure, nodes represent pages, documents or document keywords and edges represent links between them.
Posing a query as a graph means a smaller graph has to be embedded in the larger one.
The graph modeling the data structure can be mined to find similar clusters.
Quite new is the application of subgraph mining in optimizing code for embedded devices.
With the help of so-called procedural abstraction, the size of pre-compiled binaries can be reduced which is often crucial because of the limited storage capacities of embedded systems.
There, subgraph mining helps identifying common structures in the program’s control flow graph which can then be combined (“abstracted”) into a single procedure (Dreweke et.
al.
, 2007).

Related Results

A truss‐based approach for densest homogeneous subgraph mining in node‐attributed graphs
A truss‐based approach for densest homogeneous subgraph mining in node‐attributed graphs
AbstractIn a wide range of graph analysis tasks such as community detection and event detection, densest subgraph mining is important and primitive. With the development of social ...
Light at the End of the Tunnel: Mining Justice and Health
Light at the End of the Tunnel: Mining Justice and Health
The mining industry provides valuable mined commodities and financial support for communities worldwide. Mining has become safer for workers. Significant injustices, however, are c...
CIDER: Counterfactual-Invariant Diffusion-based GNN Explainer for Causal Subgraph Inference
CIDER: Counterfactual-Invariant Diffusion-based GNN Explainer for Causal Subgraph Inference
Abstract Inferring causal links or subgraphs corresponding to a specific phenotype or label based solely on measured data is an important yet challenging task, which is als...
Discriminative Subgraph Mining for Protein Classification
Discriminative Subgraph Mining for Protein Classification
Protein classification can be performed by representing 3-D protein structures by graphs and then classifying the corresponding graphs. One effective way to classify such graphs is...
Impact of Mining on Socioeconomic Status in Puno, Peru
Impact of Mining on Socioeconomic Status in Puno, Peru
This study examines the direct and indirect effects of mining activities on key socioeconomic indicators such as per capita income, the Human Development Index (HDI), and education...
Optimisation of potash mining technology for cell and pillar mining method
Optimisation of potash mining technology for cell and pillar mining method
The diverse demand for inorganic fertilizers has predetermined the intensification of potash mining, which is a raw material for their production. In this regard, it has become nec...
The Significance of Text Mining in Research: A Comprehensive Review
The Significance of Text Mining in Research: A Comprehensive Review
Text mining has emerged as a pivotal tool in various domains of research, revolutionizing the way scholars and scientists extract valuable insights from vast volumes of textual dat...
Dynamic Top-K Interesting Subgraph Query on Large-Scale Labeled Graphs
Dynamic Top-K Interesting Subgraph Query on Large-Scale Labeled Graphs
A labeled graph is a special structure with node identification capability, which is often used in information networks, biological networks, and other fields. The subgraph query i...

Back to Top