Javascript must be enabled to continue!
Sampling algorithms for big graph analytics
View through CrossRef
The analysis of large graphs offers new insights into social and other networks, and thus is of increasing interest to marketeers, sociologists, mathematicians and computer scientists. However, the extremely large size of most graphs of interest renders them difficult to analyze because of at least four challenges: lack of memory, restricted access to the full graph, prohibitive computational cost and real-time changes in the graph. This dissertation presents graph sampling as a powerful and attractive approach to meet the above challenges, whereby properties of the full graph are estimated based on an examination of only a small portion of the graph. In this dissertation, we focus on two graph sampling strategies: edge-based sampling and traversal-based sampling. For edge-based sampling, we propose an edge-based sampling framework for big-graph analytics in dynamic graphs. It enhances the traditional model by enabling the use of additional related information. To demonstrate the advantages of our proposed framework, we present a new sampling algorithm which provides an unbiased estimate of the total number of triangles in a fully dynamic graph where both edge additions and deletions are considered. Our algorithm addresses three of the aforementioned challenges; it has low memory and computational costs, and can be applied to dynamic graphs. In particular, it offers a significantly improved performance in real time estimates compared to current state-of-the-art methods. We also propose several traversal-based graph sampling algorithms for the estimation of a micro-structural property (motif statistics) and the estimation of a macro-structural property (the two largest eigenvalues of the graph). All of these algorithms solve the challenges of prohibitive computational and storage costs, and restricted access. For micro-structural property, we develop a new sampling algorithm which estimates the concentration of mo- tifs of any size via random walk. Unlike previous approaches which enumerate subgraphs around the random walk to find motifs, our algorithm achieves its computational efficiency by using a randomized protocol to sample subgraphs in the neighborhood of the nodes visited by the walk. The experimental results show that our algorithm achieves better accuracy and higher precision than previously known algorithms. For macro-structural property, we propose a series of new sampling algorithms which estimate the top eigenvalues of a graph. Unlike previous methods which try to collect a subgraph with the most influential nodes, our algorithms achieve estimates of the two largest eigenvalues by estimating the number of closed walks of a certain length. The experimental results show that our algorithms are much faster and achieve higher accuracy on most graphs than previously known algorithms seeking to address the same challenges.
Title: Sampling algorithms for big graph analytics
Description:
The analysis of large graphs offers new insights into social and other networks, and thus is of increasing interest to marketeers, sociologists, mathematicians and computer scientists.
However, the extremely large size of most graphs of interest renders them difficult to analyze because of at least four challenges: lack of memory, restricted access to the full graph, prohibitive computational cost and real-time changes in the graph.
This dissertation presents graph sampling as a powerful and attractive approach to meet the above challenges, whereby properties of the full graph are estimated based on an examination of only a small portion of the graph.
In this dissertation, we focus on two graph sampling strategies: edge-based sampling and traversal-based sampling.
For edge-based sampling, we propose an edge-based sampling framework for big-graph analytics in dynamic graphs.
It enhances the traditional model by enabling the use of additional related information.
To demonstrate the advantages of our proposed framework, we present a new sampling algorithm which provides an unbiased estimate of the total number of triangles in a fully dynamic graph where both edge additions and deletions are considered.
Our algorithm addresses three of the aforementioned challenges; it has low memory and computational costs, and can be applied to dynamic graphs.
In particular, it offers a significantly improved performance in real time estimates compared to current state-of-the-art methods.
We also propose several traversal-based graph sampling algorithms for the estimation of a micro-structural property (motif statistics) and the estimation of a macro-structural property (the two largest eigenvalues of the graph).
All of these algorithms solve the challenges of prohibitive computational and storage costs, and restricted access.
For micro-structural property, we develop a new sampling algorithm which estimates the concentration of mo- tifs of any size via random walk.
Unlike previous approaches which enumerate subgraphs around the random walk to find motifs, our algorithm achieves its computational efficiency by using a randomized protocol to sample subgraphs in the neighborhood of the nodes visited by the walk.
The experimental results show that our algorithm achieves better accuracy and higher precision than previously known algorithms.
For macro-structural property, we propose a series of new sampling algorithms which estimate the top eigenvalues of a graph.
Unlike previous methods which try to collect a subgraph with the most influential nodes, our algorithms achieve estimates of the two largest eigenvalues by estimating the number of closed walks of a certain length.
The experimental results show that our algorithms are much faster and achieve higher accuracy on most graphs than previously known algorithms seeking to address the same challenges.
Related Results
ecision Farming and Predictive Analytics in Precision Farming and Predictive Analytics in Precision Farming and Predictive Analytics in Precision Farming and Predictive Analytics in Precision Farming and Predictive Analytics in Precision Farming and Predi
ecision Farming and Predictive Analytics in Precision Farming and Predictive Analytics in Precision Farming and Predictive Analytics in Precision Farming and Predictive Analytics in Precision Farming and Predictive Analytics in Precision Farming and Predi
The scope of sensor networks and the Internet of Things spanning rapidly to diversified domains but not limited to sports, health, and business trading. In recent past, the sensors...
BIG DATA ANALYTICS: A REVIEW OF ITS TRANSFORMATIVE ROLE IN MODERN BUSINESS INTELLIGENCE
BIG DATA ANALYTICS: A REVIEW OF ITS TRANSFORMATIVE ROLE IN MODERN BUSINESS INTELLIGENCE
In the dynamic landscape of modern business intelligence, Big Data Analytics has emerged as a transformative force, reshaping the way organizations derive insights from vast and di...
Graph convolutional neural networks for 3D data analysis
Graph convolutional neural networks for 3D data analysis
(English) Deep Learning allows the extraction of complex features directly from raw input data, eliminating the need for hand-crafted features from the classical Machine Learning p...
Impacts of big data on accounting
Impacts of big data on accounting
Big data and data analytics are currently the buzzwords in both academia and industry to become data driven. Big data has been the trending topic in the accounting industry also. B...
A Scalable Data Structure for Efficient Graph Analytics and In-Place Mutations
A Scalable Data Structure for Efficient Graph Analytics and In-Place Mutations
The graph model enables a broad range of analysis, thus graph processing is an invaluable tool in data analytics. At the heart of every graph processing system lies a concurrent gr...
Data Analytics on Graphs Part I: Graphs and Spectra on Graphs
Data Analytics on Graphs Part I: Graphs and Spectra on Graphs
The area of Data Analytics on graphs promises a paradigm shift, as we approach information processing of new classes of data which are typically acquired on irregular but structure...
Bilangan Terhubung Titik Pelangi pada Graf Garis dan Graf Tengah dari Hasil Operasi Comb Graf Bintang C<sub>3</sub> dan Graf Bintang S<sub>n</sub>
Bilangan Terhubung Titik Pelangi pada Graf Garis dan Graf Tengah dari Hasil Operasi Comb Graf Bintang C<sub>3</sub> dan Graf Bintang S<sub>n</sub>
Penelitian ini bertujuan menentukan bilangan terhubung titik pelangi (rainbow vertex connection number) pada graf garis dan graf tengah yang diperoleh dari hasil operasi comb antar...
People Analytics
People Analytics
People analytics refers to the systematic and scientific process of applying quantitative or qualitative data analysis methods to derive insights that shape and inform employee-rel...

