Article Overview
LLM-D is introduced as a new open-source framework designed for Kubernetes-native distributed inference of large language models.
- It aims to simplify the deployment and scaling of LLMs while efficiently managing heterogeneous resources such as GPUs and CPU-only nodes.
- LLM-D allows flexible model sharding and pipelining across multiple nodes and GPUs, improving resource utilization (see the first sketch after this list).
- The framework supports multiple inference runtimes, including vLLM and TensorRT-LLM, so it can serve models with different architectures.
- It features a Kubernetes Operator that streamlines lifecycle management of LLM inference services within a cluster (illustrated in the second sketch below).
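
To make the sharding-and-pipelining point concrete, here is a minimal sketch using vLLM's offline API, one of the runtimes the article names. The model name and parallelism degrees are illustrative assumptions; in an LLM-D deployment these values would come from the service configuration rather than hard-coded Python.

```python
# Sketch: combining tensor parallelism (sharding each layer's weights
# across GPUs) with pipeline parallelism (splitting layers into stages).
# Requires 4 GPUs with this configuration; adjust sizes to your hardware.
from vllm import LLM, SamplingParams

llm = LLM(
    model="meta-llama/Llama-3.1-8B-Instruct",  # assumed example model
    tensor_parallel_size=2,    # shard weights across 2 GPUs per stage
    pipeline_parallel_size=2,  # split the layer stack into 2 stages
)

params = SamplingParams(temperature=0.7, max_tokens=64)
outputs = llm.generate(["Explain Kubernetes in one sentence."], params)
print(outputs[0].outputs[0].text)
```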
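
The second sketch shows how an operator-managed service might be created from Python with the official Kubernetes client. The API group `llm-d.example.io`, the kind `LLMInferenceService`, and every field in the spec are hypothetical stand-ins for whatever CRD schema the project actually defines, not its real API.

```python
# Sketch: creating a custom resource for a hypothetical LLM-D-style
# operator to reconcile. The CRD group, kind, and spec fields are
# illustrative assumptions, not the project's actual schema.
from kubernetes import client, config

def deploy_inference_service() -> None:
    config.load_kube_config()  # use the current kubeconfig context
    api = client.CustomObjectsApi()

    # Hypothetical custom resource describing one inference service.
    service = {
        "apiVersion": "llm-d.example.io/v1alpha1",
        "kind": "LLMInferenceService",
        "metadata": {"name": "llama-demo", "namespace": "inference"},
        "spec": {
            "model": "meta-llama/Llama-3.1-8B-Instruct",
            "runtime": "vllm",           # runtime selection, per the article
            "replicas": 2,               # scaled by the operator
            "resources": {"gpusPerReplica": 1},
        },
    }

    api.create_namespaced_custom_object(
        group="llm-d.example.io",
        version="v1alpha1",
        namespace="inference",
        plural="llminferenceservices",
        body=service,
    )

if __name__ == "__main__":
    deploy_inference_service()
```

Once such a resource is applied, the operator pattern implies the controller watches it and drives the cluster toward the declared state, handling pod creation, scaling, and teardown.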