Javascript must be enabled to continue!

Lifeline-based global load balancing

On shared-memory systems, Cilk-style work-stealing has been used to effectively parallelize irregular task-graph based applications such as Unbalanced Tree Search (UTS). There are two main difficulties in extending this approach to distributed memory. In the shared memory approach, thieves (nodes without work) constantly attempt to asynchronously steal work from randomly chosen victims until they find work. In distributed memory, thieves cannot autonomously steal work from a victim without disrupting its execution. When work is sparse, this results in performance degradation. In essence, a direct extension of traditional work-stealing to distributed memory violates the work-first principle underlying work-stealing. Further, thieves spend useless CPU cycles attacking victims that have no work, resulting in system inefficiencies in multi-programmed contexts. Second, it is non-trivial to detect active distributed termination (detect that programs at all nodes are looking for work, hence there is no work). This problem is well-studied and requires careful design for good performance. Unfortunately, in most existing languages/frameworks, application developers are forced to implement their own distributed termination detection. In this paper, we develop a simple set of ideas that allow work-stealing to be efficiently extended to distributed memory. First, we introduce lifeline graphs: low-degree, low-diameter, fully connected directed graphs. Such graphs can be constructed from k -dimensional hypercubes. When a node is unable to find work after w unsuccessful steals, it quiesces after informing the outgoing edges in its lifeline graph. Quiescent nodes do not disturb other nodes. A quiesced node is reactivated when work arrives from a lifeline and itself shares this work with those of its incoming lifelines that are activated. Termination occurs precisely when computation at all nodes has quiesced. In a language such as X10, such passive distributed termination can be detected automatically using the finish construct -- no application code is necessary. Our design is implemented in a few hundred lines of X10. On the binomial tree described in olivier:08}, the program achieve 87% efficiency on an Infiniband cluster of 1024 Power7 cores, with a peak throughput of 2.37 GNodes/sec. It achieves 87% efficiency on a Blue Gene/P with 2048 processors, and a peak throughput of 0.966 GNodes/s. All numbers are relative to single core sequential performance. This implementation has been refactored into a reusable global load balancing framework. Applications can use this framework to obtain global load balance with minimal code changes. In summary, we claim: (a) the first formulation of UTS that does not involve application level global termination detection, (b) the introduction of lifeline graphs to reduce failed steals (c) the demonstration of simple lifeline graphs based on k-hypercubes, (d) performance with superior efficiency (or the same efficiency but over a wider range) than published results on UTS. In particular, our framework can deliver the same or better performance as an unrestricted random work-stealing implementation, while reducing the number of attempted steals.

Association for Computing Machinery (ACM)

Vijay A. Saraswat Prabhanjan Kambadur Sreedhar Kodali David Grove Sriram Krishnamoorthy

ACM SIGPLAN Notices

2011

Title: Lifeline-based global load balancing

Description:

On shared-memory systems, Cilk-style work-stealing has been used to effectively parallelize irregular task-graph based applications such as Unbalanced Tree Search (UTS).

There are two main difficulties in extending this approach to distributed memory.

In the shared memory approach, thieves (nodes without work) constantly attempt to asynchronously steal work from randomly chosen victims until they find work.

In distributed memory, thieves cannot autonomously steal work from a victim without disrupting its execution.

When work is sparse, this results in performance degradation.

In essence, a direct extension of traditional work-stealing to distributed memory violates the work-first principle underlying work-stealing.

Further, thieves spend useless CPU cycles attacking victims that have no work, resulting in system inefficiencies in multi-programmed contexts.

Second, it is non-trivial to detect active distributed termination (detect that programs at all nodes are looking for work, hence there is no work).

This problem is well-studied and requires careful design for good performance.

Unfortunately, in most existing languages/frameworks, application developers are forced to implement their own distributed termination detection.

In this paper, we develop a simple set of ideas that allow work-stealing to be efficiently extended to distributed memory.

First, we introduce lifeline graphs: low-degree, low-diameter, fully connected directed graphs.

Such graphs can be constructed from k -dimensional hypercubes.

When a node is unable to find work after w unsuccessful steals, it quiesces after informing the outgoing edges in its lifeline graph.

Quiescent nodes do not disturb other nodes.

A quiesced node is reactivated when work arrives from a lifeline and itself shares this work with those of its incoming lifelines that are activated.

Termination occurs precisely when computation at all nodes has quiesced.

In a language such as X10, such passive distributed termination can be detected automatically using the finish construct -- no application code is necessary.

Our design is implemented in a few hundred lines of X10.

On the binomial tree described in olivier:08}, the program achieve 87% efficiency on an Infiniband cluster of 1024 Power7 cores, with a peak throughput of 2.

37 GNodes/sec.

It achieves 87% efficiency on a Blue Gene/P with 2048 processors, and a peak throughput of 0.

966 GNodes/s.

All numbers are relative to single core sequential performance.

This implementation has been refactored into a reusable global load balancing framework.

Applications can use this framework to obtain global load balance with minimal code changes.

In summary, we claim: (a) the first formulation of UTS that does not involve application level global termination detection, (b) the introduction of lifeline graphs to reduce failed steals (c) the demonstration of simple lifeline graphs based on k-hypercubes, (d) performance with superior efficiency (or the same efficiency but over a wider range) than published results on UTS.

In particular, our framework can deliver the same or better performance as an unrestricted random work-stealing implementation, while reducing the number of attempted steals.

Back

Abstract History has shown that dependency upon the crane operator to monitor loads and boom angle or load radius do not allow the margin necessary to perform the...

Implementasi Metode Load Balancing Sebagai Upaya Meningkatkan Kinerja Server

The use of the internet turns out to cause problems that must be found a solution. One of them is the service provider's server load which continues to increase. This condition is ...

EVALUASI KINERJA LOAD BALANCING DENGAN ALGORITMA SCHEDULLING NEVER QUEUE

Load balancing is used as a technique to handle large loads that cannot be carried out by a single server, so that the server does not experience overload. In handling load sharing...

Generative Adversarial Network for Simulation of Load Balancing Optimization in Mobile Networks

<p>The commercial operation of 5G networks is almost ready to be launched, but problems related to wireless environment, load balancing for example, remain. Many load balanci...

Load Balancing at Fog Nodes using Scheduling Algorithms

Cloud Computing proves to be most predominant innovative field in the area of Information technology. Cloud is best suited for small scale to large scale businesses and personal pu...

Intelligent Load Balancing Techniques in Software Defined Networks: A Survey

In the current technology driven era, the use of devices that connect to the internet has increased significantly. Consequently, there has been a significant increase in internet t...

Clustering based EO with MRF technique for effective load balancing in cloud computing

Purpose Cloud computing (CC) refers to the usage of virtualization technology to share computing resources through the internet. Task scheduling (TS) is used to assign computationa...

Server and network load balancing

Software Defined Networking (SDN), is an emerging networking technology. This thesis aims to develop a new Server and Network Load balancing scheme in content delivery datacenters ...

Email:
Password:

Email:

Lifeline-based global load balancing

Related Results