Which port does Llamafile use by default?

Llamafile defaults to port 8080. This skill helps you manage port configurations and troubleshoot common connection issues associated with local servers.

Llamafile is a cross-platform executable format developed by Mozilla that allows you to run Large Language Models locally using a single file, requiring no cloud dependencies.

Can I use this with the OpenAI Python SDK?

Absolutely. Llamafile serves an OpenAI-compatible API at the /v1 endpoint, allowing you to use existing OpenAI SDKs by simply updating the base URL.

Is an internet connection required to use Llamafile?

Only for the initial download of the binary and model files. Once installed, Llamafile runs entirely offline, making it ideal for air-gapped or privacy-sensitive projects.

Does this skill support GPU acceleration?

Yes, it provides specific configuration patterns for CUDA, Metal, and Vulkan to offload model layers to your GPU for significantly faster inference.

Llamafile Local LLM Manager

Name: Llamafile Local LLM Manager
Author: BbgnsurfTech

byBbgnsurfTech

•

Data Science & ML

Configures and manages Mozilla Llamafile to run high-performance GGUF models locally with an OpenAI-compatible API.

This skill enables Claude to manage local LLM deployments using Mozilla Llamafile, a cross-platform format for running Large Language Models without cloud dependencies. It provides comprehensive guidance for installing binaries, selecting optimized GGUF models, and configuring servers with GPU acceleration for CUDA, Metal, or Vulkan. Whether building air-gapped tools, troubleshooting server connections, or integrating local inference into developer workflows via LiteLLM or the OpenAI SDK, this skill ensures a seamless, offline-first AI environment.

Key Features

01Integrated health monitoring and background process management

02Performance optimization for CPU/GPU thread and layer allocation

033 GitHub stars

04Cross-platform GGUF model execution via Mozilla Llamafile

05OpenAI-compatible HTTP API server configuration and management

06GPU acceleration support for CUDA, Metal, and Vulkan backends

Use Cases

01Reducing API costs by routing simple tasks to local GGUF models

02Building private code review or commit message automation tools

03Setting up air-gapped or offline AI development environments

What are Skills?·How to Install

Install with 🐟 Skill.Fish

npx skillfish add bbgnsurftech/claude-skills-collection llamafile

For use in Claude.ai and ChatGPT

Download Skill