Search engine for discovering works of Art, research articles, and books related to Art and Culture
ShareThis
Javascript must be enabled to continue!

Lifeline-based global load balancing

View through CrossRef
On shared-memory systems, Cilk-style work-stealing has been used to effectively parallelize irregular task-graph based applications such as Unbalanced Tree Search (UTS). There are two main difficulties in extending this approach to distributed memory. In the shared memory approach, thieves (nodes without work) constantly attempt to asynchronously steal work from randomly chosen victims until they find work. In distributed memory, thieves cannot autonomously steal work from a victim without disrupting its execution. When work is sparse, this results in performance degradation. In essence, a direct extension of traditional work-stealing to distributed memory violates the work-first principle underlying work-stealing. Further, thieves spend useless CPU cycles attacking victims that have no work, resulting in system inefficiencies in multi-programmed contexts. Second, it is non-trivial to detect active distributed termination (detect that programs at all nodes are looking for work, hence there is no work). This problem is well-studied and requires careful design for good performance. Unfortunately, in most existing languages/frameworks, application developers are forced to implement their own distributed termination detection. In this paper, we develop a simple set of ideas that allow work-stealing to be efficiently extended to distributed memory. First, we introduce lifeline graphs: low-degree, low-diameter, fully connected directed graphs. Such graphs can be constructed from k -dimensional hypercubes. When a node is unable to find work after w unsuccessful steals, it quiesces after informing the outgoing edges in its lifeline graph. Quiescent nodes do not disturb other nodes. A quiesced node is reactivated when work arrives from a lifeline and itself shares this work with those of its incoming lifelines that are activated. Termination occurs precisely when computation at all nodes has quiesced. In a language such as X10, such passive distributed termination can be detected automatically using the finish construct -- no application code is necessary. Our design is implemented in a few hundred lines of X10. On the binomial tree described in olivier:08}, the program achieve 87% efficiency on an Infiniband cluster of 1024 Power7 cores, with a peak throughput of 2.37 GNodes/sec. It achieves 87% efficiency on a Blue Gene/P with 2048 processors, and a peak throughput of 0.966 GNodes/s. All numbers are relative to single core sequential performance. This implementation has been refactored into a reusable global load balancing framework. Applications can use this framework to obtain global load balance with minimal code changes. In summary, we claim: (a) the first formulation of UTS that does not involve application level global termination detection, (b) the introduction of lifeline graphs to reduce failed steals (c) the demonstration of simple lifeline graphs based on k-hypercubes, (d) performance with superior efficiency (or the same efficiency but over a wider range) than published results on UTS. In particular, our framework can deliver the same or better performance as an unrestricted random work-stealing implementation, while reducing the number of attempted steals.
Title: Lifeline-based global load balancing
Description:
On shared-memory systems, Cilk-style work-stealing has been used to effectively parallelize irregular task-graph based applications such as Unbalanced Tree Search (UTS).
There are two main difficulties in extending this approach to distributed memory.
In the shared memory approach, thieves (nodes without work) constantly attempt to asynchronously steal work from randomly chosen victims until they find work.
In distributed memory, thieves cannot autonomously steal work from a victim without disrupting its execution.
When work is sparse, this results in performance degradation.
In essence, a direct extension of traditional work-stealing to distributed memory violates the work-first principle underlying work-stealing.
Further, thieves spend useless CPU cycles attacking victims that have no work, resulting in system inefficiencies in multi-programmed contexts.
Second, it is non-trivial to detect active distributed termination (detect that programs at all nodes are looking for work, hence there is no work).
This problem is well-studied and requires careful design for good performance.
Unfortunately, in most existing languages/frameworks, application developers are forced to implement their own distributed termination detection.
In this paper, we develop a simple set of ideas that allow work-stealing to be efficiently extended to distributed memory.
First, we introduce lifeline graphs: low-degree, low-diameter, fully connected directed graphs.
Such graphs can be constructed from k -dimensional hypercubes.
When a node is unable to find work after w unsuccessful steals, it quiesces after informing the outgoing edges in its lifeline graph.
Quiescent nodes do not disturb other nodes.
A quiesced node is reactivated when work arrives from a lifeline and itself shares this work with those of its incoming lifelines that are activated.
Termination occurs precisely when computation at all nodes has quiesced.
In a language such as X10, such passive distributed termination can be detected automatically using the finish construct -- no application code is necessary.
Our design is implemented in a few hundred lines of X10.
On the binomial tree described in olivier:08}, the program achieve 87% efficiency on an Infiniband cluster of 1024 Power7 cores, with a peak throughput of 2.
37 GNodes/sec.
It achieves 87% efficiency on a Blue Gene/P with 2048 processors, and a peak throughput of 0.
966 GNodes/s.
All numbers are relative to single core sequential performance.
This implementation has been refactored into a reusable global load balancing framework.
Applications can use this framework to obtain global load balance with minimal code changes.
In summary, we claim: (a) the first formulation of UTS that does not involve application level global termination detection, (b) the introduction of lifeline graphs to reduce failed steals (c) the demonstration of simple lifeline graphs based on k-hypercubes, (d) performance with superior efficiency (or the same efficiency but over a wider range) than published results on UTS.
In particular, our framework can deliver the same or better performance as an unrestricted random work-stealing implementation, while reducing the number of attempted steals.

Related Results

Zhong-Yong as dynamic balancing between Yin-Yang opposites
Zhong-Yong as dynamic balancing between Yin-Yang opposites
Purpose The purpose of this paper is to comment on Peter Ping Li’s understanding of Zhong-Yong balancing, presented in his article titled “Global implications of the indigenous epi...
Crane Load Moment System For Offshore Crane Operations
Crane Load Moment System For Offshore Crane Operations
Abstract History has shown that dependency upon the crane operator to monitor loads and boom angle or load radius do not allow the margin necessary to perform the...
Implementasi Metode Load Balancing Sebagai Upaya Meningkatkan Kinerja Server
Implementasi Metode Load Balancing Sebagai Upaya Meningkatkan Kinerja Server
The use of the internet turns out to cause problems that must be found a solution. One of them is the service provider's server load which continues to increase. This condition is ...
EVALUASI KINERJA LOAD BALANCING DENGAN ALGORITMA SCHEDULLING NEVER QUEUE
EVALUASI KINERJA LOAD BALANCING DENGAN ALGORITMA SCHEDULLING NEVER QUEUE
Load balancing is used as a technique to handle large loads that cannot be carried out by a single server, so that the server does not experience overload. In handling load sharing...
Generative Adversarial Network for Simulation of Load Balancing Optimization in Mobile Networks
Generative Adversarial Network for Simulation of Load Balancing Optimization in Mobile Networks
<p>The commercial operation of 5G networks is almost ready to be launched, but problems related to wireless environment, load balancing for example, remain. Many load balanci...
Soft Computing Techniques to Analyze the Load Balancing in Cloud Environment
Soft Computing Techniques to Analyze the Load Balancing in Cloud Environment
An emerging method of digital computing known as "cloud computing" has recently gained immense popularity. While there are many benefits to using the cloud over the internet, there...
Balancing Load Outgoing Transformator 2 di Politeknik Negeri Malang
Balancing Load Outgoing Transformator 2 di Politeknik Negeri Malang
In an electric power system, power quality is a major problem. One of the problems is load imbalance. According to IEEE (Institute of electrical and electronic engineer) number 446...
Research on program load spectrum for key components of a metro vehicle body
Research on program load spectrum for key components of a metro vehicle body
This study aims to construct a load spectrum that conforms to the actual working conditions, enabling an accurate fatigue life assessment of key components and enhancing the overal...

Back to Top