01 Utilization Metrics: Generates status reports showing current API usage against tier limits.
02 Exponential Backoff: Automatically retries requests that fail with HTTP 429 (rate limit exceeded), waiting exponentially longer between attempts to minimize downtime.
03 Batch Optimization: Paces large embedding or chat completion batches so they stay within rate limits.
04 Dual-Constraint Tracking: Monitors both Requests Per Minute (RPM) and Tokens Per Minute (TPM).
05 Model-Tier Routing: Dynamically switches between Mistral models based on current capacity and rate limits.
06 1,613 GitHub stars
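The utilization, backoff, and dual-constraint features listed above can be illustrated with a minimal Python sketch. It is not the tool's actual implementation: `DualLimiter`, `RateLimitError`, and `call_with_backoff` are hypothetical names, the 60-second sliding window and jittered delay are assumptions, and `RateLimitError` stands in for whatever exception a real client raises on a 429 response.

```python
import random
import time
from collections import deque


class RateLimitError(Exception):
    """Hypothetical stand-in for an HTTP 429 response from the API client."""


class DualLimiter:
    """Sliding 60-second window over both request count (RPM) and token count (TPM)."""

    def __init__(self, rpm, tpm, clock=time.monotonic):
        self.rpm, self.tpm, self.clock = rpm, tpm, clock
        self.events = deque()  # (timestamp, tokens) for each admitted request

    def _prune(self):
        # Drop requests that have aged out of the 60-second window.
        cutoff = self.clock() - 60.0
        while self.events and self.events[0][0] <= cutoff:
            self.events.popleft()

    def utilization(self):
        """Return the fraction of each limit used in the current window."""
        self._prune()
        used_tokens = sum(t for _, t in self.events)
        return {"rpm": len(self.events) / self.rpm, "tpm": used_tokens / self.tpm}

    def try_acquire(self, tokens):
        """Admit the request only if neither limit would be exceeded."""
        self._prune()
        used_tokens = sum(t for _, t in self.events)
        if len(self.events) + 1 > self.rpm or used_tokens + tokens > self.tpm:
            return False
        self.events.append((self.clock(), tokens))
        return True


def call_with_backoff(fn, max_retries=5, base=1.0, cap=60.0, sleep=time.sleep):
    """Call fn(); on RateLimitError wait base * 2**attempt (capped, jittered) and retry."""
    for attempt in range(max_retries + 1):
        try:
            return fn()
        except RateLimitError:
            if attempt == max_retries:
                raise  # retries exhausted; surface the error to the caller
            delay = min(cap, base * 2 ** attempt)
            sleep(delay * random.uniform(0.5, 1.0))  # jitter spreads out retries
```

A caller would check `try_acquire(estimated_tokens)` before each request and wrap the request itself in `call_with_backoff`. The `clock` and `sleep` parameters are injectable so the logic can be exercised without real waiting.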