How does it help with API costs?

It includes tools for token counting and per-model cost estimation, plus implementation patterns for prompt caching that reduce spending on repeated input.
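As a rough illustration of per-model cost estimation, the sketch below multiplies token counts by a per-million-token price. It is not taken from the skill itself: the `ModelPricing` type, the `estimate_cost` function, the model name `example-model`, and the prices are all hypothetical placeholders, so real figures should come from the provider's current price list.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class ModelPricing:
    """Price in USD per one million tokens (hypothetical structure)."""
    input_per_million: float
    output_per_million: float


# Placeholder prices for illustration only; these are NOT real provider prices.
PRICING: Dict[str, ModelPricing] = {
    "example-model": ModelPricing(input_per_million=3.00, output_per_million=15.00),
}


def estimate_cost(
    model: str,
    input_tokens: int,
    output_tokens: int,
    pricing: Dict[str, ModelPricing] = PRICING,
) -> float:
    """Return the estimated USD cost of one request."""
    if model not in pricing:
        raise KeyError(f"No pricing configured for model {model!r}")
    price = pricing[model]
    total = (
        input_tokens * price.input_per_million
        + output_tokens * price.output_per_million
    )
    return total / 1_000_000
```

Token counts are passed in rather than computed here, since an exact count depends on the provider's own tokenizer or on the usage figures returned with each API response.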

What does the LLM App Architecture skill do?

It provides standardized patterns and code snippets for building robust, production-ready applications that call LLM APIs such as Anthropic's Claude and OpenAI's models.

Does it handle API rate limits?

Yes, it implements a decorator-based retry mechanism featuring exponential backoff and jitter to safely manage rate limits and transient connection errors.
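A decorator of this kind might be sketched as follows. This is an assumption about the general shape, not the skill's actual code: `retry_with_backoff`, `TransientAPIError`, and every parameter name are hypothetical, and the jitter strategy shown is "full jitter" (a random delay between zero and the exponential cap).

```python
import functools
import random
import time
from typing import Callable, Tuple, Type


class TransientAPIError(Exception):
    """Hypothetical stand-in for a provider's rate-limit or connection error."""


def retry_with_backoff(
    max_attempts: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 30.0,
    retry_on: Tuple[Type[BaseException], ...] = (TransientAPIError,),
    sleep: Callable[[float], None] = time.sleep,
):
    """Retry the wrapped function on selected errors with exponential backoff."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(max_attempts):
                try:
                    return func(*args, **kwargs)
                except retry_on:
                    # Out of attempts: let the last error propagate.
                    if attempt == max_attempts - 1:
                        raise
                    # Cap doubles each attempt: base, 2*base, 4*base, ...
                    cap = min(max_delay, base_delay * (2 ** attempt))
                    # Full jitter spreads retries so clients do not retry in lockstep.
                    sleep(random.uniform(0, cap))
        return wrapper

    return decorator
```

In practice `retry_on` would list the exception classes the chosen SDK raises for rate limits and transient network failures, while errors such as invalid requests would be left to fail immediately.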

Can I use this for real-time applications?

Absolutely. It includes specific patterns for streaming completions to web frameworks like FastAPI, enabling real-time token delivery to front-end interfaces.
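The streaming pattern can be illustrated with a small async generator that turns a token stream into Server-Sent Events. This is a sketch under assumptions: `fake_token_stream` stands in for a real provider stream, `sse_events` is a hypothetical helper, and the `[DONE]` sentinel is just one common convention for signalling the end of a stream.

```python
import asyncio
import json
from typing import AsyncIterator


async def fake_token_stream(text: str) -> AsyncIterator[str]:
    """Stand-in for a provider's streaming response; yields one word at a time."""
    for token in text.split(" "):
        await asyncio.sleep(0)  # yield control, as a real network read would
        yield token


async def sse_events(tokens: AsyncIterator[str]) -> AsyncIterator[str]:
    """Format each token as a Server-Sent Events message."""
    async for token in tokens:
        payload = json.dumps({"token": token})
        yield f"data: {payload}\n\n"
    # Assumed end-of-stream marker so the client knows when to stop listening.
    yield "data: [DONE]\n\n"
```

In a FastAPI route, a generator like `sse_events(...)` would typically be wrapped in `StreamingResponse(..., media_type="text/event-stream")` from `fastapi.responses`, so the browser receives tokens as they are produced.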

Which models does this skill support?

While the examples focus on Anthropic's Claude, the architectural patterns, such as async handling, retries, and streaming, apply to any major LLM provider.

LLM Application Architecture

Author: ricardoroche

Data Science & ML

Implements robust patterns for LLM-powered applications including async calls, streaming, token management, and resilient error handling.

This skill provides a framework for building production-grade applications on top of Large Language Models such as Claude and GPT. It implements the core architectural patterns: asynchronous request handling so that API calls do not block the application, real-time response streaming for better UX, and retry logic with exponential backoff for handling API rate limits. It also offers tools for token counting, cost estimation, and response caching, so that AI-driven features are reliable, performant, cost-efficient, and observable.
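To illustrate the non-blocking request handling described above, here is a minimal sketch that runs several calls concurrently while capping how many are in flight. The names `fake_llm_call` and `run_concurrently` are hypothetical, and the stub simply uppercases its input where a real async SDK call would go.

```python
import asyncio
from typing import Awaitable, Callable, List, Sequence


async def fake_llm_call(prompt: str) -> str:
    """Stand-in for an async API call; sleeps briefly instead of using the network."""
    await asyncio.sleep(0.01)
    return prompt.upper()


async def run_concurrently(
    prompts: Sequence[str],
    call: Callable[[str], Awaitable[str]] = fake_llm_call,
    max_concurrency: int = 3,
) -> List[str]:
    """Run all prompts concurrently with at most max_concurrency in flight."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def bounded(prompt: str) -> str:
        # The semaphore keeps the request rate within what the API tolerates.
        async with semaphore:
            return await call(prompt)

    # gather returns results in the same order as the input prompts.
    return list(await asyncio.gather(*(bounded(p) for p in prompts)))
```

Calling `asyncio.run(run_concurrently(["a", "b", "c", "d"]))` returns the results in input order. The concurrency cap is a design choice: unbounded fan-out tends to trigger the same rate limits that the retry logic then has to absorb.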

Key Features

01 Asynchronous LLM client implementation with comprehensive error handling

02 Streaming response patterns for real-time UI/UX integration

03 Efficient batch processing and prompt caching optimizations

04 Token management tools for usage tracking and cost estimation

05 Resilient retry logic with exponential backoff and jitter
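One way to picture the caching feature is a client-side response cache keyed on the model, prompt, and parameters. Note that this sketch shows application-level response caching, which is a different technique from a provider's server-side prompt caching, and that `ResponseCache` and `get_or_call` are hypothetical names rather than the skill's API.

```python
import hashlib
import json
from typing import Any, Callable, Dict


class ResponseCache:
    """In-memory cache that avoids paying twice for an identical request."""

    def __init__(self) -> None:
        self._store: Dict[str, str] = {}
        self.hits = 0
        self.misses = 0

    @staticmethod
    def _key(model: str, prompt: str, params: Dict[str, Any]) -> str:
        # sort_keys makes the key independent of keyword-argument order.
        payload = json.dumps(
            {"model": model, "prompt": prompt, "params": params}, sort_keys=True
        )
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    def get_or_call(
        self, model: str, prompt: str, call: Callable[..., str], **params: Any
    ) -> str:
        """Return a cached response, or invoke call(model, prompt, **params)."""
        key = self._key(model, prompt, params)
        if key in self._store:
            self.hits += 1
            return self._store[key]
        self.misses += 1
        result = call(model, prompt, **params)
        self._store[key] = result
        return result
```

A cache like this only makes sense for deterministic or low-temperature requests where a repeated answer is acceptable; a production version would likely also need expiry and a size limit.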

Use Cases

01 Building high-performance AI chatbots with streaming capabilities

02 Creating resilient backend services that gracefully handle model API downtime

03 Developing cost-aware LLM agents that track token consumption in real time


Install with 🐟 Skill.Fish

npx skillfish add ricardoroche/ricardos-claude-code llm-app-architecture

For use in Claude.ai and ChatGPT
