Diagnoses and resolves hanging or failed evaluations in METR Hawk and UK AISI Inspect AI environments.
The debug-stuck-eval skill provides troubleshooting guidance for AI evaluations that have stalled, timed out, or hit persistent errors. It gives developers concrete commands to verify authentication, monitor pod states, analyze error patterns such as OOMKilled terminations or repeated API retries, and test the model API directly through the Middleman auth proxy. It is aimed at teams running AI safety and capability benchmarks who need to keep evaluation pipelines reliable without losing in-progress work.
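For example, when an eval appears stalled, a quick way to rule out model-access problems is to call the model API directly through the auth proxy. The sketch below is illustrative only: the proxy URL, environment variable names, and model identifier are placeholders, not the skill's actual configuration.

```python
import os
import requests

# Hypothetical values -- substitute your deployment's proxy URL, token, and model.
PROXY_URL = os.environ.get(
    "MIDDLEMAN_PROXY_URL",
    "https://middleman.example.internal/v1/chat/completions",
)
API_TOKEN = os.environ["MIDDLEMAN_API_TOKEN"]

def check_model_api(model: str = "gpt-4o", timeout: float = 30.0) -> None:
    """Send a one-token request through the proxy to confirm auth and connectivity."""
    resp = requests.post(
        PROXY_URL,
        headers={"Authorization": f"Bearer {API_TOKEN}"},
        json={
            "model": model,
            "messages": [{"role": "user", "content": "ping"}],
            "max_tokens": 1,
        },
        timeout=timeout,
    )
    print(f"HTTP {resp.status_code}")
    # A 401/403 points at auth, 429 at rate limits, 5xx at the upstream provider.
    resp.raise_for_status()

if __name__ == "__main__":
    check_model_api()
```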
Key Features
01. Automated error pattern identification for Inspect AI logs (see the first sketch after this list)
02. Comprehensive status checks for evaluation sets and pod health
03. S3 buffer access instructions for data recovery and resumption (see the second sketch after this list)
04. Step-by-step recovery workflows for restarting stuck evaluations
05. Direct API connectivity testing via the Middleman auth proxy
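As a sketch of the first feature, Inspect AI logs can be scanned programmatically for recurring failure signatures. This assumes the inspect_ai.log.read_eval_log API and a local log directory; the pattern list is illustrative, not the skill's actual rule set.

```python
import re
from pathlib import Path

from inspect_ai.log import read_eval_log

# Illustrative error signatures often seen in stuck or failed evals.
PATTERNS = {
    "oom": re.compile(r"OOMKilled", re.IGNORECASE),
    "rate_limit": re.compile(r"\b429\b|rate limit", re.IGNORECASE),
    "server_error": re.compile(r"\b500\b|internal server error", re.IGNORECASE),
    "token_limit": re.compile(r"context length|max.?tokens", re.IGNORECASE),
}

def scan_logs(log_dir: str = "./logs") -> None:
    """Report status and matched error patterns for each eval log in a directory."""
    for path in sorted(Path(log_dir).glob("*.eval")):
        log = read_eval_log(str(path), header_only=True)
        message = log.error.message if log.error else ""
        hits = [name for name, pat in PATTERNS.items() if pat.search(message)]
        print(f"{path.name}: status={log.status} patterns={hits or 'none'}")

if __name__ == "__main__":
    scan_logs()
```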
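Similarly, the S3 buffer access feature involves confirming what sample data is recoverable before resuming; a minimal boto3 listing, with the bucket and prefix as placeholders, might look like:

```python
import boto3

# Placeholder bucket and prefix -- replace with your eval set's buffer location.
BUCKET = "my-eval-buffer"
PREFIX = "eval-sets/2024-06-01/"

def list_buffer_objects(bucket: str = BUCKET, prefix: str = PREFIX) -> None:
    """List buffered objects so you can see what is recoverable before resuming."""
    s3 = boto3.client("s3")
    paginator = s3.get_paginator("list_objects_v2")
    total = 0
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        for obj in page.get("Contents", []):
            total += 1
            print(f"{obj['Key']}  {obj['Size']} bytes  {obj['LastModified']}")
    print(f"{total} buffered objects under s3://{bucket}/{prefix}")

if __name__ == "__main__":
    list_buffer_objects()
```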
Use Cases
01. Diagnosing 500 Internal Server errors and token limit issues in model API calls
02. Investigating memory-related crashes and pod restarts in evaluation environments (see the sketch after this list)
03. Troubleshooting "eval not progressing" or "samples not completing" errors
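For the memory-crash scenario above, pod restarts and OOMKilled terminations can be surfaced with the Kubernetes Python client; the namespace below is a placeholder for wherever your evaluation pods run.

```python
from kubernetes import client, config

def report_pod_health(namespace: str = "inspect-evals") -> None:
    """Flag pods that have restarted or were OOM-killed in the given namespace."""
    config.load_kube_config()  # use config.load_incluster_config() when running inside the cluster
    v1 = client.CoreV1Api()
    for pod in v1.list_namespaced_pod(namespace).items:
        for cs in pod.status.container_statuses or []:
            terminated = cs.last_state.terminated
            reason = terminated.reason if terminated else None
            if cs.restart_count > 0 or reason == "OOMKilled":
                print(
                    f"{pod.metadata.name}/{cs.name}: "
                    f"restarts={cs.restart_count} last_reason={reason}"
                )

if __name__ == "__main__":
    report_pod_health()
```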