About
This skill helps developers cut LLM operating costs and latency by using native prompt caching. It provides production-ready implementation patterns for setting ephemeral cache breakpoints in Anthropic's Claude API and for optimizing the automatic prefix caching used by OpenAI's gpt-4o and o1 models. By caching stable prompt components such as system prompts, tool definitions, and few-shot examples, users can save up to 90% on cached input tokens, making the skill well suited to high-frequency AI applications and long-context workloads.
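On the Anthropic side, a breakpoint is set by attaching `cache_control: {"type": "ephemeral"}` to the last stable content block; everything up to that block is cached. Below is a minimal sketch using the `anthropic` Python SDK, assuming an `ANTHROPIC_API_KEY` in the environment; the system prompt text and model ID are illustrative placeholders:

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

# A stable system prompt. Anthropic only caches prefixes above a minimum
# length (1024 tokens on Sonnet/Opus-class models), so in practice this
# string must be long enough to qualify.
STABLE_SYSTEM_PROMPT = "You are a support agent for Acme Corp. ..."  # assume >= 1024 tokens

response = client.messages.create(
    model="claude-3-5-sonnet-20241022",  # any cache-capable Claude model
    max_tokens=1024,
    system=[
        {
            "type": "text",
            "text": STABLE_SYSTEM_PROMPT,
            # The breakpoint: everything up to and including this block is
            # written to (or read back from) an ephemeral cache with a
            # short TTL that refreshes on each cache hit.
            "cache_control": {"type": "ephemeral"},
        }
    ],
    messages=[{"role": "user", "content": "How do I reset my password?"}],
)

# The usage block reports cache activity for cost verification.
print(response.usage.cache_creation_input_tokens,
      response.usage.cache_read_input_tokens)
```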
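For OpenAI, caching is automatic for prompts of 1024 tokens or more and keys on exact prefix matches, so the optimization is structural rather than API-level: keep stable content at the front of the message list and variable content at the end. A sketch using the `openai` Python SDK, with placeholder prompt contents:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Stable content goes first so consecutive requests share the longest
# possible prefix; OpenAI caches automatically once the prompt reaches
# 1024 tokens, with hits granted in 128-token increments beyond that.
STABLE_SYSTEM_PROMPT = "You are a support agent for Acme Corp. ..."  # assume >= 1024 tokens

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": STABLE_SYSTEM_PROMPT},          # cached prefix
        {"role": "user", "content": "How do I reset my password?"},   # variable suffix
    ],
)

# cached_tokens reports how much of the prompt was served from cache.
print(response.usage.prompt_tokens_details.cached_tokens)
```

The same ordering rule applies to tool definitions and few-shot examples: any request-specific material inserted before them breaks the shared prefix and disables the cache hit.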