Diagnoses and resolves hanging or failed evaluations in METR Hawk and UK AISI Inspect AI environments.
The debug-stuck-eval skill provides troubleshooting guidance for AI evaluations that have stalled, timed out, or hit persistent errors. It gives developers concrete commands to verify authentication, monitor pod states, analyze error patterns such as OOMKilled terminations or repeated API retries, and test the model API directly through the Middleman auth proxy. It is aimed at teams running AI safety and capability benchmarks who need to keep evaluation pipelines reliable without losing in-progress work.
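For example, when an eval appears stalled, a quick way to rule out model-access problems is to call the model API directly through the auth proxy. The sketch below is illustrative only: the proxy URL, environment variable names, and model identifier are placeholders, not the skill's actual configuration.

```python
import os
import requests

# Hypothetical values -- substitute your deployment's proxy URL, token, and model.
PROXY_URL = os.environ.get(
    "MIDDLEMAN_PROXY_URL",
    "https://middleman.example.internal/v1/chat/completions",
)
API_TOKEN = os.environ["MIDDLEMAN_API_TOKEN"]

def check_model_api(model: str = "gpt-4o", timeout: float = 30.0) -> None:
    """Send a one-token request through the proxy to confirm auth and connectivity."""
    resp = requests.post(
        PROXY_URL,
        headers={"Authorization": f"Bearer {API_TOKEN}"},
        json={
            "model": model,
            "messages": [{"role": "user", "content": "ping"}],
            "max_tokens": 1,
        },
        timeout=timeout,
    )
    print(f"HTTP {resp.status_code}")
    # A 401/403 points at auth, 429 at rate limits, 5xx at the upstream provider.
    resp.raise_for_status()

if __name__ == "__main__":
    check_model_api()
```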
Key Features
01. Automated error pattern identification for Inspect AI logs (see the first sketch after this list)
02. Comprehensive status checks for evaluation sets and pod health
03. S3 buffer access instructions for data recovery and resumption (see the second sketch after this list)
04. Step-by-step recovery workflows for restarting stuck evaluations
05. Direct API connectivity testing via the Middleman auth proxy
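As a sketch of the first feature, Inspect AI logs can be scanned programmatically for recurring failure signatures. This assumes the inspect_ai.log.read_eval_log API and a local log directory; the pattern list is illustrative, not the skill's actual rule set.

```python
import re
from pathlib import Path

from inspect_ai.log import read_eval_log

# Illustrative error signatures often seen in stuck or failed evals.
PATTERNS = {
    "oom": re.compile(r"OOMKilled", re.IGNORECASE),
    "rate_limit": re.compile(r"\b429\b|rate limit", re.IGNORECASE),
    "server_error": re.compile(r"\b500\b|internal server error", re.IGNORECASE),
    "token_limit": re.compile(r"context length|max.?tokens", re.IGNORECASE),
}

def scan_logs(log_dir: str = "./logs") -> None:
    """Report status and matched error patterns for each eval log in a directory."""
    for path in sorted(Path(log_dir).glob("*.eval")):
        log = read_eval_log(str(path), header_only=True)
        message = log.error.message if log.error else ""
        hits = [name for name, pat in PATTERNS.items() if pat.search(message)]
        print(f"{path.name}: status={log.status} patterns={hits or 'none'}")

if __name__ == "__main__":
    scan_logs()
```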
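Similarly, the S3 buffer access feature involves confirming what sample data is recoverable before resuming; a minimal boto3 listing, with the bucket and prefix as placeholders, might look like:

```python
import boto3

# Placeholder bucket and prefix -- replace with your eval set's buffer location.
BUCKET = "my-eval-buffer"
PREFIX = "eval-sets/2024-06-01/"

def list_buffer_objects(bucket: str = BUCKET, prefix: str = PREFIX) -> None:
    """List buffered objects so you can see what is recoverable before resuming."""
    s3 = boto3.client("s3")
    paginator = s3.get_paginator("list_objects_v2")
    total = 0
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        for obj in page.get("Contents", []):
            total += 1
            print(f"{obj['Key']}  {obj['Size']} bytes  {obj['LastModified']}")
    print(f"{total} buffered objects under s3://{bucket}/{prefix}")

if __name__ == "__main__":
    list_buffer_objects()
```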
Use Cases
01. Diagnosing 500 Internal Server errors and token limit issues in model API calls
02. Investigating memory-related crashes and pod restarts in evaluation environments (see the sketch after this list)
03. Troubleshooting "eval not progressing" or "samples not completing" errors
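For the memory-crash scenario above, pod restarts and OOMKilled terminations can be surfaced with the Kubernetes Python client; the namespace below is a placeholder for wherever your evaluation pods run.

```python
from kubernetes import client, config

def report_pod_health(namespace: str = "inspect-evals") -> None:
    """Flag pods that have restarted or were OOM-killed in the given namespace."""
    config.load_kube_config()  # use config.load_incluster_config() when running inside the cluster
    v1 = client.CoreV1Api()
    for pod in v1.list_namespaced_pod(namespace).items:
        for cs in pod.status.container_statuses or []:
            terminated = cs.last_state.terminated
            reason = terminated.reason if terminated else None
            if cs.restart_count > 0 or reason == "OOMKilled":
                print(
                    f"{pod.metadata.name}/{cs.name}: "
                    f"restarts={cs.restart_count} last_reason={reason}"
                )

if __name__ == "__main__":
    report_pod_health()
```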