PIPELINING A SKEW-INSENSITIVE PARALLEL JOIN ALGORITHM
Most standard parallel join algorithms handle data skew with a relatively static approach: the way they distribute data (and therefore computation) across nodes depends on a redistribution scheme (hashing or range partitioning) that is fixed before the join begins. In contrast, our approach pre-scans the data in order to choose an efficient join method for each value of the join attribute. Our previous papers have shown this approach to be efficient both in theory and in practice. In this paper we introduce a new pipelined version of our frequency-adaptive join algorithm. Pipelining offers flexible resource allocation strategies while avoiding unnecessary disk input/output of intermediate join results when computing multi-join queries. We present a detailed version of the algorithm and a cost analysis based on the BSP (Bulk Synchronous Parallel) model, showing that the pipelined algorithm achieves noticeable improvements over the sequential parallel version for multi-join queries while guaranteeing perfect balancing properties.
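The two ideas in the abstract, pre-scanning to pick a method per join value and pipelining so that intermediate results are never written out, can be illustrated with a small single-process sketch. This is not the paper's algorithm: the function names (`plan_join`, `build_index`, `probe`), the `threshold` parameter, and the strategy labels are all hypothetical, and the strategy choice shown here is a simplified assumption about how a frequency-based decision might look.

```python
from collections import Counter, defaultdict

def plan_join(r_keys, s_keys, threshold):
    """Pre-scan both inputs and pick a (hypothetical) strategy per join value."""
    r_freq, s_freq = Counter(r_keys), Counter(s_keys)
    plan = {}
    # Only values present on both sides can produce join results.
    for key in r_freq.keys() & s_freq.keys():
        if max(r_freq[key], s_freq[key]) < threshold:
            plan[key] = "hash"         # infrequent value: ordinary hash redistribution
        elif r_freq[key] >= s_freq[key]:
            plan[key] = "replicate_s"  # frequent in R: keep R in place, ship the S tuples
        else:
            plan[key] = "replicate_r"  # frequent in S: keep S in place, ship the R tuples
    return plan

def build_index(rows):
    """Group (key, value) rows by key so they can be probed."""
    index = defaultdict(list)
    for key, value in rows:
        index[key].append(value)
    return index

def probe(stream, index):
    """Lazily join a stream of (key, tuple) pairs against an index.

    Being a generator, it hands each result to the next join as soon as it
    is produced instead of materializing the intermediate relation.
    """
    for key, values in stream:
        for match in index.get(key, ()):
            yield key, values + (match,)

# Toy relations: join value 1 is skewed in R.
R = [(1, "r1"), (1, "r2"), (1, "r3"), (2, "r4")]
S = [(1, "s1"), (2, "s2"), (3, "s3")]
T = [(1, "t1"), (2, "t2")]

plan = plan_join([k for k, _ in R], [k for k, _ in S], threshold=3)

# Pipelined multi-join R join S join T: the output of the first probe feeds the second.
r_stream = ((k, (v,)) for k, v in R)
result = list(probe(probe(r_stream, build_index(S)), build_index(T)))
```

In a real parallel setting the plan would drive how tuples are redistributed across nodes; here it only illustrates that the decision is made per join value after a frequency pre-scan, rather than fixed in advance.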

