Can I customize the rate limits for my specific plan?

Yes. The limiters can be initialized with custom RPM and TPM values to match your Mistral subscription tier.
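As a rough illustration of what per-plan configuration can look like, here is a minimal Python sketch. The `PlanLimits` class and the numbers are hypothetical, chosen for the example; they are not the skill's own API or real Mistral limits.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PlanLimits:
    """Hypothetical container for one plan's limits (not the skill's API)."""
    rpm: int  # requests per minute
    tpm: int  # tokens per minute

    def __post_init__(self) -> None:
        if self.rpm <= 0 or self.tpm <= 0:
            raise ValueError("rpm and tpm must be positive")

    @property
    def min_request_interval(self) -> float:
        """Seconds between requests if they are spaced evenly."""
        return 60.0 / self.rpm

    @property
    def avg_tokens_per_request(self) -> float:
        """Token budget per request when both limits are fully used."""
        return self.tpm / self.rpm


# Placeholder numbers; substitute the values shown for your own subscription.
my_plan = PlanLimits(rpm=120, tpm=600_000)
```

Deriving the request interval and the average token budget from the two limits makes it easy to see which constraint is likely to bind first for a given workload.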

Can I use this for both Chat and Embedding APIs?

Yes, it includes specific implementation patterns and code snippets for both chat completions and batch embedding requests with rate-limit awareness.
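One way to make batch embedding requests rate-aware is to split the inputs so that no single call exceeds a token or item budget. The sketch below is an assumption about how that could be done, not the skill's code: `batch_by_token_budget` and `rough_token_estimate` are hypothetical helpers, and the 4-characters-per-token heuristic is only a crude stand-in for a real tokenizer.

```python
from typing import Callable, Iterable, Iterator


def rough_token_estimate(text: str) -> int:
    """Crude heuristic (about 4 characters per token); not Mistral's tokenizer."""
    return max(1, len(text) // 4)


def batch_by_token_budget(
    texts: Iterable[str],
    max_tokens_per_batch: int,
    max_items_per_batch: int,
    estimate: Callable[[str], int] = rough_token_estimate,
) -> Iterator[list[str]]:
    """Yield batches that stay under both the token and the item budget."""
    batch: list[str] = []
    batch_tokens = 0
    for text in texts:
        cost = estimate(text)
        full = len(batch) >= max_items_per_batch
        over = batch_tokens + cost > max_tokens_per_batch
        if batch and (full or over):
            yield batch
            batch, batch_tokens = [], 0
        batch.append(text)  # an oversized single text still gets its own batch
        batch_tokens += cost
    if batch:
        yield batch
```

Each yielded batch can then be sent as one embedding request, with a limiter consulted between batches.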

How does this skill handle Mistral 429 errors?

It retries with exponential backoff: when the Mistral API returns HTTP 429 ('Too Many Requests'), it waits and then retries the request automatically, lengthening the wait after each failed attempt.
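The retry pattern can be sketched in a few lines of Python. This is a generic illustration rather than the skill's implementation: `RateLimited` is a hypothetical exception that a client wrapper would raise on a 429 response, and the default base delay, cap, and retry count are arbitrary.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RateLimited(Exception):
    """Hypothetical error a client wrapper raises on an HTTP 429 response."""


def backoff_delays(max_retries: int, base: float = 1.0, cap: float = 60.0) -> list[float]:
    """Delay before each retry: base * 2**attempt, capped at `cap` seconds."""
    return [min(cap, base * 2 ** attempt) for attempt in range(max_retries)]


def call_with_backoff(
    fn: Callable[[], T],
    max_retries: int = 5,
    base: float = 1.0,
    cap: float = 60.0,
    jitter: float = 0.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn, retrying on RateLimited; re-raise after the last retry fails."""
    for delay in backoff_delays(max_retries, base, cap):
        try:
            return fn()
        except RateLimited:
            # Optional random jitter spreads out retries from parallel workers.
            sleep(delay + random.uniform(0.0, jitter))
    return fn()  # final attempt; a RateLimited here propagates to the caller
```

With the defaults, the waits are 1, 2, 4, 8, and 16 seconds. Passing `sleep` as a parameter keeps the function testable without real delays.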

How does the token-aware limiter work?

It tracks the estimated and actual number of tokens used within a sliding 60-second window to ensure your application doesn't exceed the Tokens Per Minute (TPM) limit.
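A minimal sketch of such a limiter, assuming a deque of timestamped events and an injectable clock, might look like the following. The class and method names are hypothetical and are not taken from the skill.

```python
import time
from collections import deque
from typing import Callable


class SlidingWindowLimiter:
    """Tracks requests and tokens over the last `window` seconds."""

    def __init__(self, rpm: int, tpm: int, window: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rpm, self.tpm, self.window, self.clock = rpm, tpm, window, clock
        self.events = deque()  # each entry is [timestamp, tokens]

    def _prune(self) -> None:
        """Drop events that have aged out of the window."""
        cutoff = self.clock() - self.window
        while self.events and self.events[0][0] <= cutoff:
            self.events.popleft()

    def tokens_used(self) -> int:
        self._prune()
        return sum(event[1] for event in self.events)

    def try_acquire(self, estimated_tokens: int):
        """Reserve capacity; return the event record or None if over a limit."""
        self._prune()
        if len(self.events) >= self.rpm:
            return None
        if self.tokens_used() + estimated_tokens > self.tpm:
            return None
        event = [self.clock(), estimated_tokens]
        self.events.append(event)
        return event

    @staticmethod
    def record_actual(event, actual_tokens: int) -> None:
        """Replace the estimate with the usage reported in the API response."""
        event[1] = actual_tokens
```

A caller would reserve capacity with an estimate before sending the request, then call `record_actual` once the response reports the real token count, so over-estimates do not waste budget for the rest of the window.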

Does it support different Mistral model tiers?

Yes. It provides pre-configured limits for the Mistral Small, Medium, Large, and Embed models, based on their respective API tier specifications.

Mistral AI Rate Limit Manager

Author: jeremylongshore
GitHub stars: 1,613
Category: API Development

Implements robust rate limiting, exponential backoff, and request throughput optimization for Mistral AI API integrations.

This skill provides Claude Code with specialized patterns for managing Mistral AI's API constraints, ensuring applications remain stable and performant. It includes comprehensive logic for tracking both Requests Per Minute (RPM) and Tokens Per Minute (TPM) across different model tiers including Small, Medium, Large, and Embed. By implementing token-aware rate limiters, handling HTTP 429 'Too Many Requests' errors with exponential backoff, and providing intelligent model-tier routing, it prevents service interruptions and optimizes throughput for developers building Mistral-powered applications.

Key Features

01 Utilization Metrics: Generates status reports to monitor current API usage against tier limits.

02 Exponential Backoff: Automatically handles 429 errors with intelligent retry logic to minimize downtime.

03 Batch Optimization: Safely manages large embedding or chat completion batches with rate awareness.

04 Dual-Constraint Tracking: Monitors both Requests Per Minute (RPM) and Tokens Per Minute (TPM).

05 Model-Tier Routing: Dynamically switches between Mistral models based on current capacity and rate limits.

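To make the utilization-report and model-tier-routing features concrete, here is a small Python sketch. It is an assumption about how such logic could be structured, not the skill's code: `TierUsage`, `pick_model`, and `status_report` are hypothetical names, and the tier labels and limits are placeholders rather than real Mistral values.

```python
from dataclasses import dataclass


@dataclass
class TierUsage:
    """Usage in the current minute for one model (all numbers are placeholders)."""
    model: str
    rpm_limit: int
    tpm_limit: int
    requests_used: int = 0
    tokens_used: int = 0

    def utilization(self) -> float:
        """Fraction of the tighter of the two limits that is consumed."""
        return max(self.requests_used / self.rpm_limit,
                   self.tokens_used / self.tpm_limit)

    def has_room(self, tokens: int) -> bool:
        """True if one more request of `tokens` tokens fits under both limits."""
        return (self.requests_used + 1 <= self.rpm_limit
                and self.tokens_used + tokens <= self.tpm_limit)


def pick_model(tiers: list[TierUsage], tokens: int):
    """Return the first model, in preference order, with room for the request."""
    for tier in tiers:
        if tier.has_room(tokens):
            return tier.model
    return None  # every tier is saturated; the caller should wait


def status_report(tiers: list[TierUsage]) -> list[str]:
    """One line per model showing how much of its tighter limit is used."""
    return [f"{t.model}: {t.utilization():.0%} of limit" for t in tiers]
```

Listing tiers in preference order means the preferred model is used whenever it has capacity, and a cheaper or less loaded one is chosen only when it does not.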

Use Cases

01 Resolving frequent 'Too Many Requests' errors in Claude Code development projects.

02 Building high-throughput production applications using Mistral AI models.

03 Implementing cost-effective model fallback strategies when premium tier limits are reached.


Install with 🐟 Skill.Fish

npx skillfish add jeremylongshore/claude-code-plugins-plus-skills mistral-rate-limits
