What services does this VRAM manager support?

It includes implementation patterns for PyTorch, Transformers, ComfyUI, Flux, Ollama, and general shell-based GPU commands.

Can it proactively free up memory?

Yes. It includes a signaling protocol in which one service sends a POST request to another service's endpoint, asking it to unload its model if that service is currently idle.
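As a rough illustration of the signaling idea, here is a minimal Python sketch that uses only the standard library. The endpoint path /request-unload, the ModelState class, the 30-second idle threshold, and the 200/409 status convention are all assumptions made for this example; the skill's actual protocol may differ.

```python
import json
import threading
import time
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class ModelState:
    """Tracks whether a (hypothetical) model is loaded and when it was last used."""

    def __init__(self, idle_after_s=30.0):
        self.loaded = True
        self.last_used = time.monotonic()
        self.idle_after_s = idle_after_s
        self.lock = threading.Lock()

    def release_if_idle(self):
        """Unload the model if it is idle; return True when VRAM is free afterwards."""
        with self.lock:
            idle = time.monotonic() - self.last_used >= self.idle_after_s
            if self.loaded and idle:
                # A real service would drop the model and clear the CUDA cache here.
                self.loaded = False
            return not self.loaded


def make_handler(state):
    """Build a request handler bound to one service's ModelState."""

    class UnloadHandler(BaseHTTPRequestHandler):
        def do_POST(self):
            if self.path != "/request-unload":  # hypothetical endpoint name
                self.send_error(404)
                return
            released = state.release_if_idle()
            body = json.dumps({"unloaded": released}).encode()
            # 200: VRAM was released; 409: the service is busy and kept its model.
            self.send_response(200 if released else 409)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

        def log_message(self, *args):
            pass  # keep the example quiet

    return UnloadHandler


def request_unload(base_url, timeout_s=5.0):
    """Politely ask another service to free VRAM; return True if it did."""
    req = urllib.request.Request(base_url + "/request-unload", data=b"", method="POST")
    try:
        with urllib.request.urlopen(req, timeout=timeout_s) as resp:
            return json.load(resp)["unloaded"]
    except urllib.error.HTTPError as err:
        if err.code == 409:  # peer is busy, so the caller should wait and retry
            return False
        raise
```

The service that owns the model would run HTTPServer(("127.0.0.1", port), make_handler(state)).serve_forever() in a background thread, and a service that needs VRAM would call request_unload("http://127.0.0.1:port") before attempting its own model load. Because the request is only honored when the peer is idle, a busy service is never interrupted.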

How do I configure Ollama to work with this pattern?

The skill provides systemd override configurations that set the OLLAMA_KEEP_ALIVE environment variable, so that Ollama releases VRAM shortly after a request completes.
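For orientation, a systemd drop-in of this kind is a short file with a [Service] section and one Environment line. The Python sketch below renders such a file. The unit name ollama.service, the drop-in path, and the 30s value are assumptions for illustration, not values taken from the skill.

```python
from pathlib import Path

# Conventional drop-in location, assuming the unit is named ollama.service.
OVERRIDE_PATH = Path("/etc/systemd/system/ollama.service.d/override.conf")


def render_ollama_override(keep_alive="30s"):
    """Return the text of a systemd drop-in that sets OLLAMA_KEEP_ALIVE.

    keep_alive is passed through unchanged; "30s" is only an example value.
    """
    return (
        "[Service]\n"
        f'Environment="OLLAMA_KEEP_ALIVE={keep_alive}"\n'
    )


if __name__ == "__main__":
    # Print the file so it can be reviewed before being installed as root.
    print(f"# {OVERRIDE_PATH}")
    print(render_ollama_override(), end="")
```

After saving the rendered text to the drop-in path, running systemctl daemon-reload followed by systemctl restart ollama makes systemd apply the new environment. A shorter keep-alive frees VRAM sooner but means the model is reloaded more often.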

How does the OOM retry logic work?

It catches CUDA Out-of-Memory errors, clears the CUDA cache, waits for a specified delay to allow other services to idle out, and retries the model load up to three times.
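The steps described above can be sketched in Python roughly as follows. The function name load_with_oom_retry, its parameters, and the 10-second default delay are hypothetical. "Up to three times" is read here as three retries after the first attempt. The PyTorch import is optional so the sketch also runs on machines without it.

```python
import time

try:
    import torch

    # torch.cuda.OutOfMemoryError exists in recent PyTorch; older versions raise RuntimeError.
    OOM_ERRORS = (getattr(torch.cuda, "OutOfMemoryError", RuntimeError),)

    def clear_cuda_cache():
        """Release cached, unused blocks held by PyTorch's CUDA allocator."""
        if torch.cuda.is_available():
            torch.cuda.empty_cache()

except ImportError:  # lets the sketch run without PyTorch installed
    OOM_ERRORS = (MemoryError,)

    def clear_cuda_cache():
        pass


def load_with_oom_retry(load_fn, max_retries=3, delay_s=10.0,
                        oom_errors=OOM_ERRORS, clear_cache=clear_cuda_cache,
                        sleep=time.sleep):
    """Call load_fn, retrying after out-of-memory errors.

    On each OOM: clear the CUDA cache, wait delay_s so other services
    can idle out and release VRAM, then try again. The last error is
    re-raised once max_retries retries have been used up.
    """
    for attempt in range(max_retries + 1):
        try:
            return load_fn()
        except oom_errors:
            if attempt == max_retries:
                raise
            clear_cache()
            sleep(delay_s)
```

A caller would wrap its own loader, for example load_with_oom_retry(lambda: build_model().to("cuda")), where build_model stands in for whatever constructs the model. The delay matters more than the cache clear: clearing the cache only returns memory held by the current process, while the wait gives other services time to release theirs.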

VRAM & GPU Memory Manager

Author: lawless-m

Data Science & ML

Manages GPU VRAM allocation through OOM retry logic, idle auto-unloading, and cross-service signaling protocols.

This skill provides standardized implementation patterns for managing shared GPU resources across multiple AI services such as Ollama, Whisper, and ComfyUI. It addresses the common Out-of-Memory (OOM) failure with three mechanisms: retry loops that wait between attempts, configurable idle timeouts for model unloading, and a signaling protocol that lets services request VRAM clearance from one another. It is particularly useful for developers who run multiple local AI models on a single GPU and need stable, automated handovers without manual intervention.
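The idle-timeout mechanism can be sketched as a small wrapper that loads a model on first use and unloads it once it has gone unused for a set period. The class name IdleUnloader, its load_fn and unload_fn callbacks, and the 300-second default are assumptions for this example, not the skill's own API.

```python
import threading
import time


class IdleUnloader:
    """Lazily loads a model and unloads it after idle_timeout_s without use."""

    def __init__(self, load_fn, unload_fn, idle_timeout_s=300.0):
        self._load_fn = load_fn
        self._unload_fn = unload_fn
        self._idle_timeout_s = idle_timeout_s
        self._model = None
        self._last_used = 0.0
        self._timer = None
        self._lock = threading.Lock()

    @property
    def is_loaded(self):
        with self._lock:
            return self._model is not None

    def get(self):
        """Return the model, loading it if needed, and restart the idle clock."""
        with self._lock:
            if self._model is None:
                self._model = self._load_fn()
            self._last_used = time.monotonic()
            self._schedule(self._idle_timeout_s)
            return self._model

    def _schedule(self, delay_s):
        # Replace any pending timer; callers must already hold the lock.
        if self._timer is not None:
            self._timer.cancel()
        self._timer = threading.Timer(delay_s, self._unload_if_idle)
        self._timer.daemon = True
        self._timer.start()

    def _unload_if_idle(self):
        with self._lock:
            if self._model is None:
                return
            remaining = self._idle_timeout_s - (time.monotonic() - self._last_used)
            if remaining > 0:
                # The model was used again after this timer was set: wait out the rest.
                self._schedule(remaining)
                return
            self._unload_fn(self._model)  # e.g. delete the model and clear the CUDA cache
            self._model = None
```

One limitation of this sketch: get() hands the model to the caller, so a request that runs longer than the timeout could see the model unloaded underneath it. A production version would track in-flight requests before unloading.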

Key Features

01 Cross-service signaling protocol for polite model unload requests

02 5 GitHub stars

03 Automatic model unloading for idle services to proactively free up VRAM

04 Robust OOM exception handling with configurable retry logic and backoff delays

05 Implementation templates for PyTorch, Transformers, and shell scripts

06 Pre-configured optimization settings for Ollama, ComfyUI, and Flux

Use Cases

01 Automating sequential AI pipelines where different tasks require full GPU access

02 Running multiple local LLMs and image generators (e.g., Ollama + Flux) on a single workstation

03 Optimizing shared GPU environments to prevent manual cache clearing and service restarts

Install with 🐟 Skill.Fish

npx skillfish add lawless-m/claude-skills Vram-GPU-OOM-memory-management
