关于
The unsloth-inference skill provides specialized guidance for moving fine-tuned models from training into production environments. It automates the selection of optimized inference paths, including 2x faster native execution via specialized Triton kernels and production-grade serving through engines like vLLM and SGLang. By guiding users through weight merging strategies (16-bit or 4-bit) and OpenAI-compatible API setup, this skill ensures that Claude can efficiently help developers deploy high-throughput, low-latency AI endpoints with minimal manual configuration.