Optimizing Multi-Tenant DAG Execution Systems for High-Throughput Inference

In large-scale data processing and machine learning systems, Directed Acyclic Graphs (DAGs) serve as the backbone for orchestrating complex workflows with multiple dependent stages. Multi-tenant DAG execution systems are increasingly used to handle concurrent workloads from many users and applications. However, these systems face significant challenges in achieving high-throughput inference, particularly in shared environments where resource contention, scheduling efficiency, and tenant isolation become critical concerns. High-throughput inference is essential in use cases such as real-time recommendation engines, large-scale data processing pipelines, and cloud-based AI services, where latency and throughput are vital to system performance.

This paper addresses the primary challenges of optimizing multi-tenant DAG execution systems for high-throughput inference. We begin by analyzing the limitations of existing frameworks such as Apache Airflow, Luigi, and Prefect in multi-tenant environments, focusing on resource contention, inefficient scheduling, and the lack of dynamic scalability. To tackle these issues, we propose a set of optimization strategies: adaptive resource allocation, tenant-aware scheduling, and hybrid execution models that balance real-time and batch inference. Our first strategy dynamically partitions resources to prevent contention and ensure fair allocation among tenants based on workload priority and expected resource utilization. This is supplemented by intelligent scheduling techniques that leverage cost-based heuristics and priority queues, reducing overall latency and improving throughput. Additionally, we introduce a hybrid execution model that supports both real-time and batch processing pipelines, enabling flexible execution of diverse workload types in the same shared environment.
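As a rough illustration of tenant-aware, cost-based scheduling, the sketch below combines a tenant's priority with an estimated task cost into a single score and serves tasks from a priority queue. The class, its names, the scoring formula, and the weighting are all illustrative assumptions, not the system described in the paper.

```python
import heapq
import itertools
from dataclasses import dataclass, field

@dataclass(order=True)
class Task:
    score: float                       # lower score = scheduled sooner
    seq: int                           # tie-breaker preserving submit order
    name: str = field(compare=False)
    tenant: str = field(compare=False)

class TenantAwareScheduler:
    """Hypothetical cost-based priority scheduler (illustrative only)."""

    def __init__(self, cost_weight=0.5):
        self._heap = []
        self._seq = itertools.count()
        self.cost_weight = cost_weight

    def submit(self, name, tenant, priority, est_cost):
        # High-priority tenants (small `priority`) and cheap tasks run first;
        # the cost term is one simple stand-in for a cost-based heuristic.
        score = priority + self.cost_weight * est_cost
        heapq.heappush(self._heap, Task(score, next(self._seq), name, tenant))

    def next_task(self):
        return heapq.heappop(self._heap) if self._heap else None

sched = TenantAwareScheduler()
sched.submit("featurize", tenant="A", priority=1, est_cost=2.0)
sched.submit("batch-train", tenant="B", priority=3, est_cost=10.0)
sched.submit("infer", tenant="A", priority=1, est_cost=0.5)
first = sched.next_task()  # cheap, high-priority inference task wins
```

The monotonically increasing `seq` field keeps equal-score tasks in submission order, a common way to make a `heapq`-based scheduler deterministic.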
This allows the system to switch dynamically between real-time and batch modes based on workload characteristics, thereby optimizing resource utilization. To further enhance performance, we incorporate memory-aware caching mechanisms that prioritize data locality and reduce redundant data movement between nodes in the DAG. This not only decreases execution time for individual DAG stages but also minimizes I/O overhead, a critical factor in high-throughput systems.

These strategies are integrated into a multi-tenant DAG execution framework designed to support diverse machine learning and data analytics workloads in a cloud-native environment. We evaluate the effectiveness of our optimizations through comprehensive experiments on real-world datasets and synthetic benchmarks, comparing our approach against baseline systems. The results demonstrate significant improvements in throughput, latency, and scalability, validating the proposed techniques for real-world adoption. We also present a case study applying these optimizations to a large-scale AI inference platform, highlighting the practical benefits and potential challenges of deploying such systems in production. Ultimately, this research offers a blueprint for building scalable, efficient, and tenant-aware DAG systems capable of handling diverse and dynamic workloads.
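A minimal sketch of memory-aware caching for intermediate DAG outputs might look like the following: entries are keyed by stage and partition so downstream stages reuse locally cached results instead of refetching them, and least-recently-used entries are evicted when a byte budget is exceeded. The class name, the byte-budget interface, and the LRU eviction policy are assumptions for illustration, not the paper's implementation.

```python
from collections import OrderedDict

class StageOutputCache:
    """Hypothetical byte-budgeted LRU cache for DAG stage outputs."""

    def __init__(self, capacity_bytes):
        self.capacity = capacity_bytes
        self.used = 0
        self._entries = OrderedDict()  # (stage, partition) -> (payload, size)

    def put(self, key, payload, size):
        if key in self._entries:
            self.used -= self._entries.pop(key)[1]
        self._entries[key] = (payload, size)
        self.used += size
        # Evict least-recently-used entries until we fit the budget.
        while self.used > self.capacity and self._entries:
            _, (_, evicted_size) = self._entries.popitem(last=False)
            self.used -= evicted_size

    def get(self, key):
        # A hit marks the entry recently used; a miss means the DAG stage
        # must recompute or fetch the data, paying the I/O overhead the
        # cache is meant to avoid.
        if key not in self._entries:
            return None
        self._entries.move_to_end(key)
        return self._entries[key][0]

cache = StageOutputCache(capacity_bytes=100)
cache.put(("stage1", "part0"), b"x" * 60, 60)
cache.put(("stage2", "part0"), b"y" * 30, 30)
hit = cache.get(("stage1", "part0"))           # keeps stage1 warm
cache.put(("stage3", "part0"), b"z" * 40, 40)  # evicts stage2, the LRU entry
```

In a real system the budget would track actual node memory and the payloads would be references to local data blocks, but the eviction logic carries over unchanged.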
