Does vllm-mlx support vision and audio models?

Yes, it supports Vision-Language Models (VLMs) and audio processing (STT/TTS) using dedicated MLX sub-libraries which this skill can help you configure.

Can I use this skill to connect Claude Code to a local model?

Yes, this skill provides the specific environment variables and configurations needed to route Claude Code to your local vllm-mlx server.

vllm-mlx is an Apple Silicon native inference server that uses the MLX framework to provide GPU-accelerated LLM services with OpenAI and Anthropic compatible APIs.

How do I fix Out of Memory (OOM) errors on my Mac?

The skill suggests switching to higher quantization levels (like 4-bit or 3-bit models) and provides guidance on adjusting KV cache parameters to fit your available Unified Memory.

vLLM-MLX Expert

Name: vLLM-MLX Expert
Author: zeero

byzeero

•

Ciencia de Datos y ML

Optimizes and manages Apple Silicon native LLM inference servers using the MLX backend with OpenAI and Anthropic API compatibility.

This skill transforms Claude into a specialized consultant for vllm-mlx, a high-performance inference engine tailored specifically for Apple M-series hardware. It provides expert guidance on server configuration, model quantization (4-bit/8-bit), multi-modal support (VLM/Audio), and advanced features like continuous batching for high-throughput requests. Whether you are deploying local reasoning models like DeepSeek-R1 or routing Claude Code to a local inference backend, this skill offers technical implementation patterns, performance tuning tips, and debugging support for the MLX ecosystem.

Características Principales

01Support for LLM, VLM, Audio, and Embedding model architectures

02Apple Silicon optimized server deployment and configuration

03Configuration guides for Reasoning models like Qwen and DeepSeek

04OpenAI and Anthropic SDK integration patterns

05Performance tuning for Continuous Batching and KV Cache

063 GitHub stars

Casos de Uso

01Debugging and developing custom model handlers within the vllm-mlx source code

02Optimizing inference throughput for local AI agents and Claude Code

03Deploying a local, private LLM API on Mac hardware

What are Skills?·How to Install

Install with 🐟 Skill.Fish

npx skillfish add zeero/dotfiles vllm-mlx-expert

For use in Claude.ai and ChatGPT