Troubleshoots and resolves stuck or failing UK AISI Inspect AI evaluations on the Hawk platform.
The debug-stuck-eval skill provides a specialized diagnostic toolkit for researchers and developers using the UK AISI Inspect AI framework. It streamlines the process of identifying why evaluations hang or fail by analyzing pod states, log patterns, and sample completion status. The skill guides users through verifying authentication, checking retry loops, testing API connectivity through the Middleman proxy, and implementing recovery steps like S3 buffer-aware restarts to ensure evaluation continuity without data loss.
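The sample-completion check described above can be sketched as a small script. Everything here is illustrative: the skill's real interface to Inspect AI eval logs is not shown in this listing, so the status dictionary and state names below are assumptions.

```python
# Minimal sketch of sample-level progress inspection for a stuck eval.
# The log structure (a dict of sample id -> status) is a hypothetical
# stand-in for whatever the real Inspect AI eval log contains.

def find_stuck_samples(sample_statuses, done_states=("success", "error")):
    """Return ids of samples that have not reached a terminal state."""
    return [sid for sid, status in sample_statuses.items()
            if status not in done_states]

statuses = {
    "sample-001": "success",
    "sample-002": "running",   # still in flight -> candidate for "stuck"
    "sample-003": "retrying",  # repeated API retries, worth inspecting
    "sample-004": "error",
}
print(find_stuck_samples(statuses))  # ['sample-002', 'sample-003']
```

A check like this separates an evaluation that is genuinely frozen (no samples progressing) from one that is slowly grinding through retries.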
Key Features
1. Sample-level progress tracking to identify malformed responses
2. Detection of common error patterns including OOMKilled pods and API retries
3. Direct API connectivity testing via Middleman and provider endpoints
4. Step-by-step recovery workflows for restarting stuck evaluations
5. Automated status and log analysis for Hawk/Inspect evaluation sets
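The error-pattern detection in the feature list could look something like the following. The skill's actual pattern list is not published here, so these regexes are illustrative guesses at common failure signatures (OOMKilled pods, API retry loops, 500 responses):

```python
import re

# Hypothetical signatures for common failure modes; the real skill's
# pattern set may differ. Each regex is matched against raw pod log lines.
ERROR_PATTERNS = {
    "oom": re.compile(r"OOMKilled"),
    "api_retry": re.compile(r"Retrying request"),
    "server_error": re.compile(r"500 Internal Server Error"),
}

def scan_log(lines):
    """Count occurrences of each known error pattern in pod log lines."""
    counts = {name: 0 for name in ERROR_PATTERNS}
    for line in lines:
        for name, pattern in ERROR_PATTERNS.items():
            if pattern.search(line):
                counts[name] += 1
    return counts

log = [
    "pod eval-worker-3 terminated: OOMKilled",
    "Retrying request (attempt 4/10) after 500 Internal Server Error",
    "sample-017 completed",
]
print(scan_log(log))  # {'oom': 1, 'api_retry': 1, 'server_error': 1}
```

Counting patterns rather than stopping at the first match helps distinguish a one-off transient error from a sustained retry loop.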
Use Cases
1. Investigating high retry counts and latency in long-running tasks
2. Diagnosing why an AI evaluation set is frozen or not progressing
3. Troubleshooting 500 Internal Server errors in LLM API requests
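For the first use case, flagging samples with high retry counts or latency reduces to a simple threshold scan. The record fields and threshold values here are illustrative assumptions, not part of the skill's documented interface:

```python
# Sketch of flagging problem samples in a long-running eval. A sample is
# suspect if it has retried more than max_retries times or exceeded
# max_latency_s seconds; both thresholds are made-up defaults.

def flag_problem_samples(records, max_retries=5, max_latency_s=120.0):
    """Return ids of samples whose retries or latency exceed thresholds."""
    return [r["id"] for r in records
            if r["retries"] > max_retries or r["latency_s"] > max_latency_s]

records = [
    {"id": "s1", "retries": 2, "latency_s": 14.0},
    {"id": "s2", "retries": 9, "latency_s": 30.0},   # retry storm
    {"id": "s3", "retries": 1, "latency_s": 600.0},  # hung request
]
print(flag_problem_samples(records))  # ['s2', 's3']
```

Samples flagged this way are the natural starting point for the deeper checks above: pull their pod logs, look for OOMKilled or 500-error patterns, and test API connectivity before restarting.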