Optimizes LLM API expenditures by implementing intelligent model routing, budget guardrails, and efficient prompt caching strategies.
This skill provides a comprehensive framework for managing LLM API costs during application development. It introduces a reusable pipeline pattern that selects the most cost-effective model for each task based on its complexity, routing simpler tasks to Haiku and more complex ones to Sonnet, while tracking cumulative spending in immutable data structures. Combined with retry logic scoped to transient errors and automatic prompt caching for long system messages, this lets developers ship high-quality AI features without exceeding budget limits or wasting tokens on redundant processing.
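The routing-plus-guardrail pattern is easiest to see in code. The sketch below is a minimal illustration, not the skill's actual implementation: the names `route_model` and `BudgetTracker`, the 0.5 complexity threshold, and the model ids and per-million-token prices are all assumptions (verify prices against the current Anthropic price sheet).

```python
from dataclasses import dataclass, replace

# Illustrative per-million-token prices in USD; check the current
# Anthropic price sheet before relying on these numbers.
PRICING = {
    "claude-3-5-haiku-latest":  {"input": 0.80, "output": 4.00},
    "claude-sonnet-4-20250514": {"input": 3.00, "output": 15.00},
}

def route_model(complexity: float) -> str:
    """Route low-complexity tasks to Haiku, complex ones to Sonnet."""
    return "claude-3-5-haiku-latest" if complexity < 0.5 else "claude-sonnet-4-20250514"

@dataclass(frozen=True)
class BudgetTracker:
    """Immutable spend tracker: every charge returns a new instance."""
    limit_usd: float
    spent_usd: float = 0.0

    def charge(self, model: str, input_tokens: int, output_tokens: int) -> "BudgetTracker":
        price = PRICING[model]
        cost = (input_tokens * price["input"] + output_tokens * price["output"]) / 1_000_000
        if self.spent_usd + cost > self.limit_usd:
            raise RuntimeError(
                f"Budget guardrail hit: ${self.spent_usd + cost:.2f} > ${self.limit_usd:.2f}"
            )
        return replace(self, spent_usd=self.spent_usd + cost)

# Usage: the tracker threads through the pipeline without mutation.
tracker = BudgetTracker(limit_usd=25.0)
model = route_model(complexity=0.2)  # -> Haiku for a simple task
tracker = tracker.charge(model, input_tokens=1_200, output_tokens=300)
print(model, f"${tracker.spent_usd:.4f}")
```

Keeping the tracker frozen means a failed or abandoned call never corrupts the running total; the caller simply discards the new instance.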
Key Features
1. Integrated prompt caching for long system instructions (sketched after this list)
2. Smart retry logic limited to transient API failures (also sketched below)
3. Immutable budget tracking with automatic spending guardrails
4. Complexity-based model routing (e.g., Haiku vs. Sonnet)
5. Standardized pricing reference for multi-model orchestration
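Feature 1 maps onto Anthropic's prompt-caching API. Below is a minimal sketch assuming the `anthropic` Python SDK: marking a long, stable system block with `cache_control` lets repeated calls read that prefix from cache at a reduced input-token rate (the API only caches prefixes above a minimum token length). `LONG_SYSTEM_PROMPT` is a placeholder.

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

LONG_SYSTEM_PROMPT = "..."  # placeholder: a long, stable instruction block

response = client.messages.create(
    model="claude-3-5-haiku-latest",
    max_tokens=512,
    # cache_control marks the prefix for reuse; subsequent calls with the
    # same system block are billed at the cheaper cache-read rate.
    system=[{
        "type": "text",
        "text": LONG_SYSTEM_PROMPT,
        "cache_control": {"type": "ephemeral"},
    }],
    messages=[{"role": "user", "content": "Summarize the quarterly report."}],
)
# usage reports cache_creation_input_tokens / cache_read_input_tokens,
# which show whether the cache was written or hit on this call.
print(response.usage)
```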
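For feature 2, retries can be scoped to transient failures by catching only the SDK's rate-limit, connection, and server-error exceptions, so client errors such as bad requests or invalid keys surface immediately. A sketch under the same SDK assumption; the retry count and backoff constants are arbitrary.

```python
import time
import anthropic

# Only these failures are worth retrying; 4xx client errors are not transient.
TRANSIENT_ERRORS = (
    anthropic.RateLimitError,
    anthropic.APIConnectionError,
    anthropic.InternalServerError,
)

def call_with_retry(client: anthropic.Anthropic, *, retries: int = 3, **kwargs):
    """Retry messages.create with exponential backoff on transient errors only."""
    for attempt in range(retries + 1):
        try:
            return client.messages.create(**kwargs)
        except TRANSIENT_ERRORS:
            if attempt == retries:
                raise  # exhausted the budget of retries; propagate the error
            time.sleep(2 ** attempt)  # back off: 1s, 2s, 4s, ...
```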
Use Cases
1. Multi-model architectures balancing speed, cost, and intelligence
2. Production applications requiring strict monthly API budget limits
3. Batch processing of large datasets, where token costs accumulate rapidly