01384 GitHub stars
02Multi-modal support for Vision Language Models (VLMs)
03Fast structured generation with JSON schema and regex constraints
04RadixAttention for automatic prefix caching and KV cache reuse
05High-performance serving with tensor parallelism and continuous batching
06OpenAI-compatible API for seamless integration with existing SDKs