Provides automated procedures and standardized protocols for managing Cohere API service outages and performance degradations.
The Cohere Incident Runbook skill equips Claude with specialized knowledge and tools to handle service interruptions within Cohere's AI infrastructure. It covers the full incident response lifecycle: initial triage, endpoint health checks using status.cohere.com APIs, and immediate mitigations such as API key rotation or circuit-breaking logic. Standardized communication templates and post-incident review structures help developers maintain service reliability and keep stakeholders informed during high-pressure system failures.
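As an illustration of the circuit-breaking mitigation mentioned above, here is a minimal sketch in Python. It is not taken from the skill itself: the class name `CircuitBreaker`, the `failure_threshold` and `reset_timeout` parameters, and their default values are all assumptions chosen for the example.

```python
import time
from typing import Callable, Optional


class CircuitBreaker:
    """Hypothetical breaker: opens after N consecutive failures,
    then allows a trial request once a cooldown has elapsed."""

    def __init__(
        self,
        failure_threshold: int = 3,      # assumed default, tune per service
        reset_timeout: float = 30.0,     # assumed cooldown in seconds
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self._clock = clock
        self._failures = 0
        self._opened_at: Optional[float] = None

    def allow_request(self) -> bool:
        # Closed circuit: requests flow normally.
        if self._opened_at is None:
            return True
        # Open circuit: only allow a trial call after the cooldown.
        return self._clock() - self._opened_at >= self.reset_timeout

    def record_success(self) -> None:
        # Any success closes the circuit and resets the failure count.
        self._failures = 0
        self._opened_at = None

    def record_failure(self) -> None:
        self._failures += 1
        if self._failures >= self.failure_threshold:
            # (Re)open the circuit and restart the cooldown.
            self._opened_at = self._clock()
```

A caller would check `allow_request()` before each upstream call, skip the call (or serve a fallback) when it returns `False`, and report the outcome with `record_success()` or `record_failure()`.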
Key Features
01 Built-in communication templates for internal Slack and external status updates
02 Infrastructure-level mitigation commands for Kubernetes and local environments
03 Evidence collection and automated postmortem generation tools
04 Automated triage for Cohere Chat, Embed, and Rerank endpoints
05 Standardized severity level mapping and decision trees for rapid response
06 2,028 GitHub stars
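To make the idea of a severity mapping concrete, the sketch below classifies an incident from two observed signals. The listing does not publish the skill's actual levels or thresholds, so the `Severity` names, the `classify_severity` function, and every cutoff value here are purely illustrative assumptions.

```python
from enum import Enum


class Severity(Enum):
    # Hypothetical levels; the skill's own mapping may differ.
    SEV1 = 1  # full outage
    SEV2 = 2  # major degradation
    SEV3 = 3  # minor degradation
    SEV4 = 4  # no meaningful impact


def classify_severity(
    error_rate: float,
    endpoints_down: int,
    total_endpoints: int = 3,  # e.g. Chat, Embed, Rerank
) -> Severity:
    """Map an error rate (0.0 to 1.0) and a count of failing
    endpoints to a severity level using assumed thresholds."""
    if not 0.0 <= error_rate <= 1.0:
        raise ValueError("error_rate must be between 0.0 and 1.0")
    if endpoints_down >= total_endpoints or error_rate >= 0.5:
        return Severity.SEV1
    if endpoints_down >= 1 or error_rate >= 0.1:
        return Severity.SEV2
    if error_rate >= 0.01:
        return Severity.SEV3
    return Severity.SEV4
```

Keeping the thresholds in one small function makes the decision tree easy to review and adjust after each postmortem.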
Use Cases
01 Responding to sudden 5xx errors or global API outages from Cohere
02 Managing rate-limiting (429) issues by switching from trial to production keys
03 Performing post-incident analysis to improve system resilience and fallback logic
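The rate-limiting use case above can be sketched as a small key-fallback helper. This is an assumption-laden illustration rather than the skill's own code: `call_with_key_fallback` and the `send` callable are hypothetical names, and `send` stands in for whatever function actually performs the request and returns an HTTP status code with a response body.

```python
from typing import Callable, Sequence, Tuple


def call_with_key_fallback(
    send: Callable[[str], Tuple[int, str]],
    api_keys: Sequence[str],
) -> Tuple[int, str]:
    """Try each key in order (e.g. trial first, production second)
    and move to the next key only when the response is HTTP 429."""
    if not api_keys:
        raise ValueError("at least one API key is required")
    status, body = 0, ""
    for key in api_keys:
        status, body = send(key)
        if status != 429:
            # Success or a non-rate-limit error: stop switching keys.
            return status, body
    # Every key was rate limited; return the last 429 response.
    return status, body
```

The helper deliberately falls through only on 429, so a 5xx response is returned to the caller unchanged and can be handled by separate outage logic.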