About
This skill provides patterns for orchestrating GPU-intensive services such as Ollama, Whisper, and ComfyUI on shared hardware. It recovers automatically from out-of-memory (OOM) errors through retry loops, unloads models after a configurable idle period, and offers a lightweight signaling protocol that lets services politely ask one another to release memory. It is particularly useful for developers and AI researchers running local environments where multiple models compete for limited VRAM and should share it without manual intervention.
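The retry and idle-unload ideas above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the skill's actual implementation. `GpuOutOfMemory`, `run_with_oom_retry`, `request_memory`, and `IdleUnloader` are hypothetical names. A real service would map its backend's own OOM error onto `GpuOutOfMemory` and implement `request_memory` with whatever signaling mechanism it uses to ask peers to free VRAM.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class GpuOutOfMemory(Exception):
    """Hypothetical stand-in for a backend's out-of-memory error."""


def run_with_oom_retry(
    job: Callable[[], T],
    request_memory: Callable[[], bool],  # hypothetical: asks peers to free VRAM
    max_attempts: int = 3,
    backoff_s: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run `job`; on OOM, ask peer services to free VRAM, back off, and retry."""
    for attempt in range(1, max_attempts + 1):
        try:
            return job()
        except GpuOutOfMemory:
            # Give up on the last attempt, or if no peer agreed to release memory.
            if attempt == max_attempts or not request_memory():
                raise
            sleep(backoff_s * 2 ** (attempt - 1))  # exponential backoff
    raise RuntimeError("max_attempts must be >= 1")


class IdleUnloader:
    """Unloads a model once it has been idle for at least `idle_timeout_s`."""

    def __init__(self, unload: Callable[[], None], idle_timeout_s: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._unload = unload
        self._timeout = idle_timeout_s
        self._clock = clock
        self._last_used: Optional[float] = None  # None means "not loaded"

    def touch(self) -> None:
        """Record a request: the model is loaded and was just used."""
        self._last_used = self._clock()

    def tick(self) -> bool:
        """Call periodically; returns True if this call unloaded the model."""
        if self._last_used is None:
            return False
        if self._clock() - self._last_used >= self._timeout:
            self._unload()
            self._last_used = None
            return True
        return False
```

The `sleep` and `clock` parameters are injectable so the logic can be tested without real delays. In normal use, the defaults (`time.sleep`, `time.monotonic`) apply. Giving up as soon as `request_memory` returns `False` avoids spinning on retries when no other service is willing to yield VRAM.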