Diagnoses and resolves hanging or failing UK AISI Inspect AI evaluations running in the Hawk cloud environment.
The Inspect AI Eval Debugger skill provides a specialized diagnostic framework for troubleshooting stalled or failing model evaluations within the UK AISI Inspect ecosystem. It enables Claude to perform deep-dive analysis into Hawk evaluation sets by verifying authentication, monitoring pod health, and inspecting real-time logs for specific error signatures like OOMKilled events, 500 internal server errors, and API retry loops. By combining log analysis with direct API connectivity testing through middleman proxies, this skill helps developers identify whether issues stem from the provider, the proxy, or the evaluation configuration, while providing clear recovery paths to resume progress without data loss.
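As an illustration of the kind of signature matching this involves, the following Python sketch scans a log dump for the error patterns named above. The regexes and the local log path are assumptions for illustration, not the skill's own code.

```python
import re

# Illustrative error signatures to look for in evaluation logs.
# The exact patterns are assumptions, not the skill's internals.
ERROR_SIGNATURES = {
    "oom_killed": re.compile(r"OOMKilled"),
    "server_error": re.compile(r"500 Internal Server Error|HTTP/\d(?:\.\d)? 500"),
    "retry_loop": re.compile(r"retrying in \d+(?:\.\d+)? seconds", re.IGNORECASE),
}

def classify_log(text: str) -> dict[str, int]:
    """Count occurrences of each known error signature in a log dump."""
    return {name: len(pattern.findall(text)) for name, pattern in ERROR_SIGNATURES.items()}

if __name__ == "__main__":
    with open("eval.log") as f:  # hypothetical local log dump
        counts = classify_log(f.read())
    for name, n in counts.items():
        if n:
            print(f"{name}: {n} occurrence(s)")
```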
Key Features
1. Streamlined log retrieval, including support for follow mode and S3 buffer access (see the first sketch after this list)
2. Real-time evaluation status monitoring and pod state analysis
3. Automated identification of common error patterns such as OOMKilled events and rate limits (as in the signature scan above)
4. Integrated API testing via curl to isolate middleman and provider issues (see the connectivity probe below)
5. Guided recovery workflows to delete and restart stuck evaluation sets with resume support (see the recovery sketch below)
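A minimal sketch of the log-retrieval feature, assuming a kubectl-managed runner pod and an S3 bucket that buffers evaluation logs. The pod name, namespace, bucket, and object key are hypothetical placeholders.

```python
import subprocess
import boto3

NAMESPACE = "inspect-evals"             # assumed namespace
POD = "hawk-eval-runner-0"              # hypothetical pod name
BUCKET = "hawk-eval-logs"               # hypothetical S3 buffer bucket
KEY = "eval-sets/12345/buffer.log"      # hypothetical object key

def follow_pod_logs() -> None:
    """Stream live pod logs in follow mode, mirroring `kubectl logs -f`."""
    subprocess.run(["kubectl", "logs", "-f", POD, "-n", NAMESPACE], check=True)

def fetch_s3_buffer() -> str:
    """Pull the buffered log object from S3 when the pod is already gone."""
    obj = boto3.client("s3").get_object(Bucket=BUCKET, Key=KEY)
    return obj["Body"].read().decode("utf-8")
```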
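For the API-isolation step, the skill issues curl requests; the sketch below is a Python rendering of the same probe against an assumed OpenAI-compatible middleman endpoint. The URL, environment variable, and model name are placeholders to substitute with real values.

```python
import os
import requests

# Hypothetical middleman proxy endpoint and credentials.
PROXY_URL = "https://middleman.example.internal/v1/chat/completions"
HEADERS = {"Authorization": f"Bearer {os.environ['MIDDLEMAN_API_KEY']}"}
PAYLOAD = {
    "model": "gpt-4o",  # placeholder model name
    "messages": [{"role": "user", "content": "ping"}],
    "max_tokens": 1,
}

# A 2xx here while the eval still fails points at the evaluation
# configuration; a 5xx here points at the proxy or the upstream provider.
resp = requests.post(PROXY_URL, headers=HEADERS, json=PAYLOAD, timeout=30)
print(resp.status_code, resp.text[:200])
```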
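A recovery sketch under two assumptions: the stuck runner is an ordinary Kubernetes pod that can be deleted with kubectl, and the evaluation can be resumed from its existing Inspect log via Inspect's standard `inspect eval-retry` entry point, which reuses completed samples rather than starting over. The pod name, namespace, and log path are hypothetical.

```python
import subprocess

EVAL_SET_POD = "hawk-eval-runner-0"  # hypothetical stuck pod
NAMESPACE = "inspect-evals"          # assumed namespace
LOG_FILE = "logs/2024-06-01T12-00-00_task_abc123.eval"  # hypothetical log

# Step 1: remove the stuck pod so its controller can reschedule it.
subprocess.run(["kubectl", "delete", "pod", EVAL_SET_POD, "-n", NAMESPACE], check=True)

# Step 2: retry the evaluation from its existing log so already-completed
# samples are preserved instead of being re-run.
subprocess.run(["inspect", "eval-retry", LOG_FILE], check=True)
```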
Use Cases
1. Analyzing memory exhaustion and pod restarts in cloud-based Inspect runners (see the pod-inspection sketch after this list)
2. Troubleshooting evaluations that are frozen or stuck in a retry loop
3. Debugging 500 errors and API timeouts in large-scale model testing runs
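To make the first use case concrete, here is a sketch using the official Kubernetes Python client to find containers whose last termination reason was OOMKilled; the namespace is an assumed placeholder.

```python
from kubernetes import client, config

NAMESPACE = "inspect-evals"  # assumed namespace for the eval runner pods

config.load_kube_config()  # local kubeconfig; in-cluster config also works
v1 = client.CoreV1Api()

# Flag containers that were last terminated by the OOM killer, along with
# their restart counts, to surface memory-exhaustion patterns.
for pod in v1.list_namespaced_pod(NAMESPACE).items:
    for status in pod.status.container_statuses or []:
        terminated = status.last_state.terminated
        if terminated and terminated.reason == "OOMKilled":
            print(f"{pod.metadata.name}: container {status.name} was OOMKilled "
                  f"(restarts: {status.restart_count})")
```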