About
This skill provides a framework for sharing limited GPU VRAM across concurrent services such as Ollama, Whisper, and ComfyUI. It covers patterns for catching out-of-memory (OOM) errors, retrying with timed backoff, and configuring aggressive auto-unload of idle models. A lightweight signaling protocol lets services request memory from one another, so disparate AI applications can share the same hardware in local or server-side workflows without a centralized orchestrator.
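The OOM-catch-and-retry pattern described above can be sketched as a small helper. This is a minimal illustration, not the skill's actual implementation: `OutOfMemoryError` here is a stand-in for whatever your backend raises (e.g. PyTorch's `torch.cuda.OutOfMemoryError`), and the `retry_on_oom` name and parameters are hypothetical.

```python
import time


class OutOfMemoryError(RuntimeError):
    """Stand-in for a backend OOM error (e.g. torch.cuda.OutOfMemoryError)."""


def retry_on_oom(fn, attempts=3, base_delay=1.0, backoff=2.0, sleep=time.sleep):
    """Call fn(), retrying with exponential backoff when it raises OOM.

    The delay between attempts gives idle services time to auto-unload
    their models and release VRAM before we try again.
    """
    delay = base_delay
    for attempt in range(1, attempts + 1):
        try:
            return fn()
        except OutOfMemoryError:
            if attempt == attempts:
                raise  # out of retries: surface the OOM to the caller
            sleep(delay)  # wait for another service to free memory
            delay *= backoff


# Example: a call that fails twice with OOM, then succeeds once VRAM frees up.
calls = {"n": 0}

def flaky_inference():
    calls["n"] += 1
    if calls["n"] < 3:
        raise OutOfMemoryError("CUDA out of memory")
    return "ok"

waits = []  # capture the sleep durations instead of actually sleeping
result = retry_on_oom(flaky_inference, attempts=5, base_delay=0.5, sleep=waits.append)
```

Injecting `sleep` as a parameter keeps the helper testable; in production you would leave it as `time.sleep` and tune `attempts`/`base_delay` to how quickly your idle-unload timeouts fire.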