Diagnoses and resolves stalled, hanging, or failing AI evaluations within the UK AISI Inspect framework.
The Inspect AI Evaluation Debugger is a specialized diagnostic tool for researchers and engineers running UK AISI Inspect evaluations in cloud environments. It provides a structured methodology for identifying why evaluations are hanging or failing, covering everything from authentication verification and pod log analysis to deep-dive API connectivity testing. By recognizing specific error patterns such as OOMKills, token-limit-exceeded errors, and Middleman proxy failures, this skill enables users to quickly recover stuck runs using S3 buffer resumes and targeted infrastructure fixes.
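To make the triage step concrete, here is a minimal sketch of the first diagnostic move: pulling an .eval log from the S3 buffer and checking whether the run finished, errored, or was never finalized. It assumes the `inspect-ai` Python package (with `s3fs` for S3 URIs) is installed; the bucket path is a hypothetical placeholder.

```python
# Minimal triage sketch: read the header of an .eval log from the S3
# buffer and report whether the run finished, errored, or is still open.
# Assumes the `inspect-ai` package (with s3fs for S3 URIs) is installed;
# the bucket path below is a hypothetical placeholder.
from inspect_ai.log import read_eval_log

LOG_URI = "s3://my-eval-buffer/logs/2024-06-01T12-00-00_task_abc123.eval"  # hypothetical

log = read_eval_log(LOG_URI, header_only=True)  # the header is enough for triage

print(f"task:   {log.eval.task}")
print(f"model:  {log.eval.model}")
print(f"status: {log.status}")  # "started" | "success" | "cancelled" | "error"

if log.status == "error" and log.error is not None:
    # The error message often names the failure class directly
    # (e.g. token limits exceeded, upstream 500s from the proxy).
    print(log.error.message)
elif log.status == "started":
    # A log stuck at "started" with no recent writes usually means the
    # pod died (e.g. OOMKilled) before the run could finalize the log.
    print("Run never finalized -- check pod status and consider a retry.")
```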
Key Features
1. Automated status and log analysis for Hawk/Inspect AI evaluations
2. Workflow guidance for S3 buffer access and .eval log extraction
3. Detection of high HTTP retry counts indicating API instability
4. Direct API connectivity testing through Middleman and model provider endpoints
5. Pattern matching for common errors such as OOMKills and Pod UID mismatches (see the sketch after this list)
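As an illustration of the pattern-matching and retry-detection features above, the following sketch scans captured pod log text (for example, the output of `kubectl logs <pod>`) for common failure signatures. The regexes and the retry threshold are illustrative assumptions, not the skill's authoritative pattern set.

```python
# Sketch of the error-pattern scan, run over text captured with
# `kubectl logs <pod>`. The regexes below are illustrative guesses at
# the relevant log lines, not an exhaustive or authoritative set.
import re
import sys
from collections import Counter

PATTERNS = {
    "oom_kill": re.compile(r"OOMKilled|Out of memory|oom-kill", re.IGNORECASE),
    "http_retry": re.compile(r"retrying.*(?:429|500|502|503)", re.IGNORECASE),
    "pod_uid_mismatch": re.compile(r"pod uid mismatch", re.IGNORECASE),
    "token_limit": re.compile(r"token limit|context length exceeded", re.IGNORECASE),
}

RETRY_ALERT_THRESHOLD = 20  # heuristic: more retries than this suggests API instability


def scan_pod_log(text: str) -> Counter:
    """Count occurrences of each known failure signature in a pod log."""
    hits = Counter()
    for line in text.splitlines():
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                hits[name] += 1
    return hits


if __name__ == "__main__":
    hits = scan_pod_log(sys.stdin.read())
    for name, count in hits.most_common():
        print(f"{name}: {count}")
    if hits["http_retry"] > RETRY_ALERT_THRESHOLD:
        print("High retry count -- check Middleman/provider connectivity.")
```

Piped from a log dump (`kubectl logs <pod> | python scan_log.py`), this gives a quick first read on whether a stall is memory-, API-, or scheduling-related.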
Use Cases
1. Diagnosing HTTP 500 Internal Server Error responses during large-scale model testing
2. Troubleshooting "stuck" or "frozen" evaluations that are not progressing
3. Recovering and resuming evaluations after pod memory exhaustion (see the sketch after this list)
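For the recovery case, here is a minimal sketch of an S3 buffer resume, assuming the `inspect-ai` package: `eval_retry` replays a failed run from its existing log rather than starting over. The log path is a hypothetical placeholder, and passing `max_connections` rests on the assumption that `eval_retry` accepts the same generation options as `eval()`.

```python
# Sketch of an S3 buffer resume after a pod OOMKill: retry the failed
# run from its existing .eval log so finished samples are not re-run.
# Assumes the `inspect-ai` package; the log path is a hypothetical
# placeholder, and max_connections is assumed to be accepted by
# eval_retry (mirroring eval()'s generation options).
from inspect_ai import eval_retry

LOG_URI = "s3://my-eval-buffer/logs/2024-06-01T12-00-00_task_abc123.eval"  # hypothetical

# Lower concurrency on the retry so the resumed pod stays within its
# (presumably increased) memory limit and puts less load on the API.
eval_retry(
    LOG_URI,
    max_connections=4,
)
```

The CLI form, `inspect eval-retry <log-file>`, takes the same log path and is often the quicker option when working directly from a terminal against the bucket.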