
TriJoin: A Time-Efficient and Scalable Three-Way Distributed Stream Join System

Stream join is one of the most fundamental operations in data stream processing applications. Existing distributed stream join systems efficiently support the two-way join, a join operation between two streams. When built on two-way joins, a three-way join must be split into two cascaded two-way joins, where the second two-way join has to wait for the join result transmitted from the first. We show through experiments that such a design incurs prohibitively high processing latency. To solve this problem, we propose TriJoin, a time-efficient three-way distributed stream join system. We design a symmetric wait-free structure based on symmetric tuple partitioning and reused join. TriJoin uses reused join to join each new tuple with the intermediate result of the other two streams and the tuples stored locally. For a new tuple, TriJoin only joins it with the intermediate result to generate the final result without waiting, which greatly reduces the processing latency. In TriJoin, we design two partitioning and storage schemes, one for each of two different forms of three-way stream join. We implement TriJoin and conduct comprehensive experiments to evaluate its performance using real-world traces. Results show that TriJoin reduces the processing latency by up to 68% compared to existing designs.
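The reused-join idea described in the abstract can be illustrated with a minimal single-process sketch. It is not the authors' implementation: it assumes an equi-join of three streams on one shared key, omits windowing, partitioning, and distribution entirely, and the names `ReusedThreeWayJoin`, `STREAMS`, and `insert` are hypothetical. For each stream it keeps the already-joined pairs of the other two streams, so an arriving tuple only probes that intermediate result instead of waiting for a first-stage join.

```python
from collections import defaultdict

# Hypothetical stream names; the abstract does not name the three streams.
STREAMS = ("R", "S", "T")


class ReusedThreeWayJoin:
    """Toy equi-join of three streams on one shared key (no windows, one process)."""

    def __init__(self):
        # stored[X][key] -> payloads received so far from stream X
        self.stored = {name: defaultdict(list) for name in STREAMS}
        # intermediate[X][key] -> joined pairs of the two streams other than X,
        # each pair kept as a dict {stream_name: payload}
        self.intermediate = {name: defaultdict(list) for name in STREAMS}

    def insert(self, stream, key, payload):
        """Process one arriving tuple and return the final (r, s, t) results."""
        others = [name for name in STREAMS if name != stream]

        # 1. Probe the precomputed join of the other two streams.
        results = []
        for pair in self.intermediate[stream][key]:
            row = dict(pair)
            row[stream] = payload
            results.append(tuple(row[name] for name in STREAMS))

        # 2. Extend the intermediate results that future tuples of the
        #    other two streams will probe.
        for target in others:
            partner = next(name for name in others if name != target)
            for partner_payload in self.stored[partner][key]:
                self.intermediate[target][key].append(
                    {stream: payload, partner: partner_payload}
                )

        # 3. Store the new tuple for later arrivals.
        self.stored[stream][key].append(payload)
        return results


if __name__ == "__main__":
    join = ReusedThreeWayJoin()
    join.insert("R", 1, "r1")
    join.insert("S", 1, "s1")
    print(join.insert("T", 1, "t1"))  # the first complete triple for key 1
```

In this sketch each result triple is emitted exactly once, when the last of its three tuples arrives. The cost is extra memory for the stored pairs, which a real system would bound with windows.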

Related Results

Using join.me to help library patrons
Purpose: As the Informatics Librarian at Olivet Nazarene University, my staff and I are often responsible for troubleshooting our patrons' technology issues. My experience with join....
Influence of diurnal variations in stream temperature on streamflow loss and groundwater recharge
We demonstrate that for losing reaches with significant diurnal variations in stream temperature, the effect of stream temperature on streambed seepage is a major factor contributi...
Hydrogeological control of the thermal regime of a sub-alpine headwater stream
Stream thermal regimes are critical to the stability of freshwater habitats. There is growing concern that climate change will result in stream warming due to rising air temperatur...
TinyLFU-Based Semi-Stream Cache Join for Near-Real-Time Data Warehousing
Abstract Semi-stream join is an emerging research problem in the domain of near-real-time data warehousing. A semi-stream join is basically a join between a fast stream (S)...
Wadeable stream habitat monitoring at Chattahoochee River National Recreation Area: 2021 change report
The Southeast Coast Network (SECN) stream habitat monitoring protocol collects data to give park resource managers insight into the status of and trends in stream and near-channel ...
Continental hydrosystem modelling: the concept of nested stream–aquifer interfaces
Abstract. Recent developments in hydrological modelling are based on a view of the interface being a single continuum through which water flows. These coupled hydrological-hydrogeo...
EPD Electronic Pathogen Detection v1
Electronic pathogen detection (EPD) is a non-invasive, rapid, affordable, point-of-care test for COVID-19 resulting from infection with the SARS-CoV-2 virus. EPD scanning techno...
