Diagnoses and resolves hanging or failing AI model evaluations using Hawk and the UK AISI Inspect framework.
The Inspect AI Evaluation Debugger is a specialized skill designed to troubleshoot stalled or failing model evaluation runs in the cloud. It provides a structured diagnostic workflow for the Hawk platform, enabling users to verify authentication, monitor evaluation status, and analyze logs for specific error patterns like rate limits, OOM errors, and API proxy failures. By facilitating direct API testing and providing instructions for buffer-based recovery, this skill helps developers ensure that long-running AI evaluations reach completion even when faced with infrastructure instability or provider-side errors.
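As a rough illustration of the log-analysis step, the sketch below scans an evaluation log for the failure signatures described above (rate limits, 500-class errors, timeouts, OOM kills). The log file name, the exact regex patterns, and the category labels are illustrative assumptions, not part of the Hawk or Inspect APIs.

```python
# Minimal sketch of log pattern matching, assuming a plain-text evaluation log.
# Patterns and the log path are illustrative, not Hawk/Inspect internals.
import re
from pathlib import Path

FAILURE_PATTERNS = {
    "rate_limit": re.compile(r"429|rate limit", re.IGNORECASE),
    "server_error": re.compile(r"500 Internal Server Error|502|503", re.IGNORECASE),
    "timeout": re.compile(r"timed? ?out|ReadTimeout", re.IGNORECASE),
    "oom": re.compile(r"OOMKilled|out of memory", re.IGNORECASE),
}

def classify_failures(log_path: str) -> dict[str, int]:
    """Count occurrences of each known failure signature in the log."""
    counts = {name: 0 for name in FAILURE_PATTERNS}
    for line in Path(log_path).read_text(errors="ignore").splitlines():
        for name, pattern in FAILURE_PATTERNS.items():
            if pattern.search(line):
                counts[name] += 1
    return counts

if __name__ == "__main__":
    print(classify_failures("eval-run.log"))  # hypothetical log file name
```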
Key Features
1. Automated log pattern matching for 500 errors and timeout detection
2. Real-time evaluation status monitoring and pod state analysis
3. Memory exhaustion (OOMKilled) diagnosis and resolution guidance
4. Evaluation resumption and recovery management using S3 buffer synchronization (see the sketch after this list)
5. Direct API proxy testing via Middleman to isolate connectivity issues
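For the buffer-based recovery feature, a minimal sketch of the S3 synchronization step is shown below: it pulls an evaluation buffer down from S3 so an interrupted run can be inspected or resumed locally. The bucket name, prefix, and destination directory are illustrative assumptions; the actual Hawk buffer layout and resume command may differ.

```python
# Rough sketch of buffer-based recovery: download every object under the
# evaluation buffer prefix to a local directory. Bucket, prefix, and
# destination are placeholders, not the real Hawk buffer layout.
from pathlib import Path
import boto3

def sync_eval_buffer(bucket: str, prefix: str, dest: str) -> None:
    """Mirror the S3 evaluation buffer into a local directory."""
    s3 = boto3.client("s3")
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        for obj in page.get("Contents", []):
            target = Path(dest) / Path(obj["Key"]).relative_to(prefix)
            target.parent.mkdir(parents=True, exist_ok=True)
            s3.download_file(bucket, obj["Key"], str(target))

if __name__ == "__main__":
    # Hypothetical bucket and run prefix for a stalled evaluation.
    sync_eval_buffer("my-eval-bucket", "eval-buffers/run-1234/", "./recovered-buffer")
```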
Use Cases
1. Troubleshooting evaluations that are stuck in retry loops or hanging indefinitely
2. Diagnosing 500 Internal Server Error responses during large-scale model testing runs
3. Verifying whether evaluation failures are caused by proxy issues or provider rate limits (a minimal probe is sketched below)
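To help separate proxy-side failures from provider rate limits, a single small request against the proxy can be informative. The sketch below assumes the Middleman proxy exposes an OpenAI-compatible chat completions endpoint; the base URL, model name, and API key environment variables are placeholders for your deployment, not documented Middleman settings.

```python
# Minimal connectivity probe, assuming an OpenAI-compatible
# /chat/completions endpoint behind the proxy. All names are placeholders.
import os
import requests

PROXY_BASE_URL = os.environ.get("PROXY_BASE_URL", "https://middleman.example.internal/v1")

def probe_proxy(model: str = "gpt-4o-mini") -> None:
    """Send one tiny request and report whether the failure looks proxy- or provider-side."""
    resp = requests.post(
        f"{PROXY_BASE_URL}/chat/completions",
        headers={"Authorization": f"Bearer {os.environ['PROXY_API_KEY']}"},
        json={"model": model, "messages": [{"role": "user", "content": "ping"}], "max_tokens": 1},
        timeout=30,
    )
    if resp.status_code == 429:
        print("Provider rate limit: back off and retry later.")
    elif resp.status_code >= 500:
        print(f"Proxy or upstream failure ({resp.status_code}): check the proxy's health.")
    else:
        resp.raise_for_status()
        print("Proxy reachable; the evaluation failure is likely elsewhere.")

if __name__ == "__main__":
    probe_proxy()
```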