
Production-Ready AI Inference for Healthcare with Triton, FastAPI, and Kubernetes

Open Engineering Inc
Description:
This document details a robust, production-ready artificial intelligence inference architecture specifically tailored for healthcare and pharmaceutical applications, leveraging Triton, FastAPI, and Kubernetes for efficient and secure deployment.
It outlines the critical components, including a FastAPI Gateway, an optional NLP/CV Preprocessor, and a Triton Inference Server, designed to handle diverse AI models.
The architecture also integrates a Model Registry, CI/CD with GitHub Actions, Kubernetes for orchestration, comprehensive Monitoring, and robust Security measures, including optional PHI de-identification.
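To illustrate where the optional PHI de-identification step sits, here is a toy regex-based scrubber. Real de-identification in a regulated setting requires far more (names, dates, addresses, NLP-based detection); the patterns below are assumptions for illustration only.

```python
# Toy sketch of a PHI de-identification pass. NOT sufficient for HIPAA
# compliance; it only shows the shape of such a step in the request path.
import re

PHI_PATTERNS = [
    (re.compile(r"\b\d{3}-\d{2}-\d{4}\b"), "[SSN]"),          # US SSN
    (re.compile(r"\b\d{3}[-.]\d{3}[-.]\d{4}\b"), "[PHONE]"),  # phone number
    (re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"), "[EMAIL]"),  # email address
]

def deidentify(text: str) -> str:
    """Replace recognized PHI patterns with placeholder tags."""
    for pattern, placeholder in PHI_PATTERNS:
        text = pattern.sub(placeholder, text)
    return text
```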
The system supports various use cases within healthcare and pharma inference, ensuring high availability and scalability.
The architecture uses fixed ports: Triton serves HTTP on 8000, gRPC on 8001, and Prometheus metrics on 8002, while the Preprocessor's container port 8080 is exposed through its Kubernetes Service on port 80, enabling consistent communication within the cluster.
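The port layout above can be expressed as the endpoints a client or monitoring system would target. The host names are placeholders for the in-cluster DNS names, not values from the document.

```python
# Endpoint map for the ports described above; host names are assumed.
TRITON_HTTP = "http://triton:8000"             # KServe v2 HTTP API
TRITON_GRPC = "triton:8001"                    # gRPC inference API
TRITON_METRICS = "http://triton:8002/metrics"  # Prometheus scrape target
PREPROCESSOR = "http://preprocessor:80"        # Service 80 -> container 8080

def infer_url(model: str, version: str = "") -> str:
    """Triton's v2 inference endpoint for a model, optionally versioned."""
    base = f"{TRITON_HTTP}/v2/models/{model}"
    return f"{base}/versions/{version}/infer" if version else f"{base}/infer"
```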
Key Kubernetes files, such as 'k8s.yaml', 'hpa.yaml', and 'preprocessor.yaml', manage deployment, scaling, and preprocessor configuration, respectively, while security protocols are thoroughly documented in 'SECURITY.md', complemented by a visual representation of the architecture in 'architecture.png'.
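As a rough sketch of what two of those manifests might contain, the excerpts below show a Service routing port 80 to the preprocessor container's 8080, and a CPU-based HorizontalPodAutoscaler. All names, replica counts, and thresholds are assumptions, not the repository's actual contents.

```yaml
# Hypothetical excerpt of preprocessor.yaml: Service port 80 routed to
# the container's port 8080, as described above.
apiVersion: v1
kind: Service
metadata:
  name: preprocessor
spec:
  selector:
    app: preprocessor
  ports:
    - port: 80
      targetPort: 8080
---
# Hypothetical excerpt of hpa.yaml: CPU-based autoscaling for the gateway.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: gateway-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: gateway
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
```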
This comprehensive setup ensures optimized performance and reliability for demanding AI workloads in regulated environments.

