Why should I use Ollama with Claude Code?

Using Ollama with Claude Code allows you to run powerful models locally, which drastically reduces API costs (up to 93%), ensures data privacy, and enables development in offline or air-gapped environments.

How do I switch between local and cloud models?

The skill provides a Provider Factory pattern that uses environment variables to automatically toggle between Ollama and cloud providers like OpenAI or Anthropic based on availability.

Can I use local models for embeddings?

Yes, the skill includes configurations for nomic-embed-text, which is a fast, local model for generating embeddings and performing vector searches without external API calls.

Which models are best for local coding tasks?

For coding, qwen2.5-coder:32b is highly recommended due to its high performance on coding benchmarks. For complex reasoning, deepseek-r1:70b provides GPT-4 level intelligence locally.

How do I optimize performance on Apple Silicon?

The skill recommends setting OLLAMA_MAX_LOADED_MODELS and using specific num_ctx and keep_alive settings to maximize the efficiency of the M-series Unified Memory Architecture.

Ollama Local LLM Inference

Name: Ollama Local LLM Inference
Author: yonatangross

byyonatangross

•

数据科学与机器学习

Integrates local LLM inference via Ollama to reduce API costs and enhance data privacy during development and CI/CD pipelines.

This skill empowers Claude Code to utilize local LLM inference through Ollama, providing a cost-effective and privacy-centric alternative to cloud-based APIs. It offers specialized implementation patterns for model selection (including DeepSeek and Llama), LangChain integration, and performance optimization specifically tuned for hardware like Apple Silicon. Whether you are automating CI/CD pipelines to achieve 93% cost savings, performing high-volume batch processing, or developing in offline environments, this skill ensures a seamless transition between local and cloud providers without sacrificing power or flexibility.

主要功能

01Performance-tuned configurations for Apple Silicon and self-hosted CI runners

028 GitHub stars

03Pre-warming and keep-alive strategies to minimize cold-start latency

04Provider Factory pattern for automatic fallback between local models and cloud APIs

05Seamless LangChain integration for local chat, embeddings, and tool calling

06Support for structured output using Pydantic and local model inference

使用场景

01High-volume batch processing for code analysis and embeddings without API rate limits

02Developing AI-powered applications with privacy-sensitive data in offline environments

03Reducing CI/CD operational costs by 93% using self-hosted local runners and Ollama

What are Skills?·How to Install

Install with 🐟 Skill.Fish

npx skillfish add yonatangross/skillforge-claude-plugin ollama-local

For use in Claude.ai and ChatGPT

Download Skill