High-throughput inference with in-flight batching and Paged KV cache
Advanced quantization support including FP8, INT4, and FP4 formats
Performance benchmarking for Llama 3, DeepSeek, and Mixtral models
Ready-to-use patterns for Triton Inference Server and trtllm-serve
Multi-GPU scaling via Tensor, Pipeline, and Expert parallelism
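As a minimal sketch of the trtllm-serve pattern mentioned above: `trtllm-serve` exposes an OpenAI-compatible HTTP endpoint, so a deployment can be smoke-tested with a single curl request. The model name and default port 8000 here are assumptions; substitute your own checkpoint.

```shell
# Launch an OpenAI-compatible server (model is an assumed example)
trtllm-serve "meta-llama/Llama-3.1-8B-Instruct"

# From another terminal, send a chat completion request
# (assumes the default listen port, 8000)
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "meta-llama/Llama-3.1-8B-Instruct",
        "messages": [{"role": "user", "content": "Hello"}]
      }'
```

Because the endpoint follows the OpenAI schema, existing OpenAI client libraries can point at it by overriding the base URL.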