Configurable parallel decoding and context management
Direct model downloads and execution from Hugging Face
Interactive GGUF model execution via llama-cli
OpenAI-compatible API hosting with llama-server
Session safety monitoring and idle-state verification
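
Since llama-server exposes an OpenAI-compatible HTTP API, clients can talk to it with a standard chat-completions request body. A minimal sketch, assuming a server running locally (the endpoint path follows the OpenAI convention; the port, model name, and temperature below are illustrative assumptions, not values from this document):

```python
import json

# Assumed endpoint for a locally hosted llama-server instance
# (port 8080 is an illustrative choice, not a documented default here).
ENDPOINT = "http://localhost:8080/v1/chat/completions"

def chat_request(prompt: str, model: str = "local-gguf-model") -> dict:
    """Build an OpenAI-compatible chat-completion request body.

    The body could then be POSTed to ENDPOINT as JSON, e.g. with
    urllib.request or any HTTP client.
    """
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.7,
    }

if __name__ == "__main__":
    # Print the request body a client would send; no network call is made.
    print(json.dumps(chat_request("Hello!"), indent=2))
```

Because the request shape matches OpenAI's API, existing OpenAI client libraries can typically be pointed at such a server by overriding only the base URL.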