Idle-based auto-unloading patterns to free VRAM when models are not in use.
Ollama-specific configuration guides for optimized memory residency.
Ready-to-use implementation templates for PyTorch, Transformers, ComfyUI, and Flux.
Automated GPU OOM error detection and retry logic with configurable delays.
Cross-service signaling protocol using REST endpoints to coordinate resource sharing.
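The idle-based auto-unloading pattern above can be sketched as a small wrapper that tracks when a model was last used and moves it off the GPU after a timeout. This is an illustrative sketch, not code from the templates: the class name `IdleUnloader` is hypothetical, and the only assumption about `model` is that it exposes a PyTorch-style `.to(device)` method.

```python
import threading
import time

class IdleUnloader:
    """Unload a model from the GPU after an idle timeout to free VRAM.

    Minimal sketch (hypothetical name): `model` is assumed to expose a
    `.to(device)` method, as PyTorch nn.Module does.
    """

    def __init__(self, model, idle_seconds=300.0):
        self.model = model
        self.idle_seconds = idle_seconds
        self._last_used = time.monotonic()
        self._lock = threading.Lock()
        self._on_gpu = False

    def acquire(self):
        """Move the model to the GPU if needed and mark it as in use."""
        with self._lock:
            if not self._on_gpu:
                self.model.to("cuda")
                self._on_gpu = True
            self._last_used = time.monotonic()
            return self.model

    def maybe_unload(self):
        """Call periodically (e.g. from a timer thread); unloads when idle."""
        with self._lock:
            if self._on_gpu and time.monotonic() - self._last_used >= self.idle_seconds:
                # With PyTorch, follow this with torch.cuda.empty_cache()
                # so the freed blocks are actually returned to the driver.
                self.model.to("cpu")
                self._on_gpu = False
                return True
            return False
```

A background `threading.Timer` or a scheduler loop would typically call `maybe_unload()` every few seconds; every inference request goes through `acquire()` to reset the idle clock.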
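The OOM detection and retry logic can be illustrated with a small helper that catches CUDA out-of-memory errors (which PyTorch raises as `RuntimeError` with "out of memory" in the message), waits a configurable delay, and retries. The function name `retry_on_oom` and the optional `free_vram` callback are hypothetical; this is a sketch of the pattern, not the project's actual implementation.

```python
import time

def retry_on_oom(fn, retries=3, delay_seconds=2.0, free_vram=None):
    """Run fn(), retrying when a CUDA OOM-style RuntimeError occurs.

    Sketch (hypothetical helper): PyTorch surfaces CUDA OOM as a
    RuntimeError whose message contains "out of memory". Non-OOM errors
    propagate immediately. `free_vram` is an optional callback invoked
    between attempts (e.g. to unload other models or empty the cache).
    """
    for attempt in range(retries + 1):
        try:
            return fn()
        except RuntimeError as exc:
            is_oom = "out of memory" in str(exc).lower()
            if not is_oom or attempt == retries:
                raise  # not an OOM, or retries exhausted
            if free_vram is not None:
                free_vram()  # give other services a chance to release VRAM
            time.sleep(delay_seconds)
```

Pairing `free_vram` with the cross-service signaling described below is the natural use: on OOM, ask the other service to release the GPU, wait, then retry.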
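The cross-service signaling protocol can be sketched as a tiny coordinator exposing two REST endpoints: a service POSTs to `/acquire` to request the GPU and to `/release` to give it back, with `409 Conflict` returned while another service holds it. The endpoint paths, the `Coordinator` class, and the JSON shape are all illustrative assumptions, using only Python's standard library.

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical sketch of the coordinator: one GPU, one owner at a time.
STATE = {"gpu_owner": None}
LOCK = threading.Lock()

class Coordinator(BaseHTTPRequestHandler):
    """POST /acquire {"service": name} -> 200 granted or 409 busy.
    POST /release {"service": name} -> 200, frees the GPU if owned."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        body = json.loads(self.rfile.read(length) or b"{}")
        service = body.get("service")
        if self.path == "/acquire":
            with LOCK:
                if STATE["gpu_owner"] is None:
                    STATE["gpu_owner"] = service
                    self._reply(200, {"granted": True})
                else:
                    self._reply(409, {"granted": False,
                                      "owner": STATE["gpu_owner"]})
        elif self.path == "/release":
            with LOCK:
                if STATE["gpu_owner"] == service:
                    STATE["gpu_owner"] = None
                self._reply(200, {"released": True})
        else:
            self._reply(404, {})

    def _reply(self, code, payload):
        data = json.dumps(payload).encode()
        self.send_response(code)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)

    def log_message(self, *args):
        pass  # keep request logging quiet
```

Each participating service would call `/acquire` before loading a model and `/release` after its own idle unload fires; a denied `/acquire` can feed directly into the OOM retry delay described above.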