Diagnoses and resolves hanging or failing Inspect AI evaluations within Hawk cloud environments.
The debug-stuck-eval skill provides a specialized diagnostic framework for developers running UK AISI's Inspect AI framework in the cloud. It streamlines troubleshooting for evaluations that have stalled, timed out, or returned persistent 500 errors by providing structured workflows for checking pod health, analyzing retry patterns in logs, and testing API connectivity through Middleman proxies. Whether the problem is an OOM error, a token limit, or a malformed API response, the skill supplies the specific Hawk CLI commands and curl tests needed to identify the root cause and safely resume evaluation runs from S3 buffers.
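The triage workflow described above can be sketched as a small log classifier. This is a minimal illustration, not part of the skill itself: the pattern strings are assumptions and should be matched against your actual Inspect AI, pod, and provider log lines before relying on them.

```shell
#!/bin/sh
# Rough triage of a stalled eval from its log tail.
# Pattern strings below are illustrative assumptions, not exact
# Inspect AI or provider output.
classify_log() {
  log="$(cat)"
  case "$log" in
    *OOMKilled*)                   echo "memory-exhaustion" ;;
    *"maximum context length"*)    echo "token-limit" ;;
    *"500 Internal Server Error"*) echo "provider-5xx" ;;
    *"Retrying request"*)          echo "transient-retries" ;;
    *)                             echo "unknown" ;;
  esac
}

# Example: pipe the last lines of an eval log through the classifier.
printf 'openai retry: 500 Internal Server Error\n' | classify_log  # -> provider-5xx
```

In practice you would feed in the tail of the real log (for example via `tail -n 200 eval.log | classify_log`) rather than an inline string.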
Key Features
1. Step-by-step diagnostic checklist for Hawk cloud authentication and pod status
2. Connectivity testing scripts for Middleman proxies and direct model providers
3. Safe recovery procedures to restart stuck evaluations using S3 buffer resumes
4. Resource monitoring guidance to identify OOMKilled pods and memory exhaustion
5. Log pattern identification for OpenAI SDK retries and Inspect-specific errors
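As one example of the resource-monitoring step, the sketch below filters OOMKilled pods out of a `kubectl get pods`-style listing. The inline sample stands in for real output; with cluster access you would pipe the actual listing (from kubectl or the equivalent Hawk CLI command, which may differ) into the function.

```shell
#!/bin/sh
# Sketch: pick out pods whose STATUS column reads OOMKilled from a
# `kubectl get pods`-style listing (NAME READY STATUS RESTARTS AGE).
find_oom_pods() {
  awk '$3 == "OOMKilled" { print $1 }'
}

# Sample listing standing in for real `kubectl get pods` output:
cat <<'EOF' | find_oom_pods
eval-runner-0   1/1   Running     0   3h
eval-runner-1   0/1   OOMKilled   4   3h
eval-runner-2   1/1   Running     0   3h
EOF
# -> eval-runner-1
```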
Use Cases
1. Troubleshooting AI evaluations that are hanging or frozen at specific sample counts
2. Identifying whether a bottleneck is caused by the model provider, the proxy, or the cloud infrastructure
3. Investigating 500 Internal Server Error responses and API instability during large-scale runs
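To separate provider, proxy, and infrastructure failures, a connectivity probe along these lines can help. `MIDDLEMAN_URL`, the `/v1/models` path, and the auth header are assumptions to adapt to your deployment; the status interpretations are a rough heuristic, not an exhaustive diagnosis.

```shell
#!/bin/sh
# Sketch: interpret the HTTP status returned by a probe of an
# OpenAI-compatible Middleman proxy.
interpret_status() {
  case "$1" in
    200)     echo "proxy and provider reachable" ;;
    401|403) echo "auth problem: check API key / token" ;;
    5??)     echo "server-side failure: check proxy and provider health" ;;
    000)     echo "no connection: check network path to the proxy" ;;
    *)       echo "unexpected status $1" ;;
  esac
}

# Probe (curl prints only the status code; curl reports 000 when the
# connection itself failed). Endpoint and header are assumptions:
# code=$(curl -s -o /dev/null -w '%{http_code}' \
#   -H "Authorization: Bearer $API_KEY" "$MIDDLEMAN_URL/v1/models")
# interpret_status "$code"
interpret_status 503  # -> server-side failure: check proxy and provider health
```

A 000 result points at the network path or the proxy being down, while a 5xx with a reachable proxy suggests looking at the model provider next.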