Troubleshoots and resolves hung or failing UK AISI Inspect AI evaluations running in the Hawk cloud environment.
The Inspect AI Evaluation Debugger is a specialized skill designed to monitor, diagnose, and recover stuck model evaluations within the METR Hawk ecosystem. It streamlines the debugging process by identifying common failure patterns such as API rate limits, memory exhaustion (OOMKilled), and authentication errors. By providing structured workflows for log analysis, direct API testing via middleman proxies, and S3-backed buffer management, it helps large-scale AI benchmarking runs resume efficiently and reach completion.
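As a rough sketch of what pod-state monitoring can look like, the snippet below lists evaluation pods and surfaces terminated-container reasons such as OOMKilled. It assumes kubeconfig access to the Hawk cluster and uses the official Kubernetes Python client; the namespace, label selector, and function name are illustrative placeholders, not actual Hawk values.

```python
# Hypothetical sketch: report pod state for evaluation runs.
# Namespace and label selector are placeholders for the real Hawk configuration.
from kubernetes import client, config


def report_eval_pods(namespace: str = "hawk-evals", selector: str = "app=inspect-eval") -> None:
    # Use the local kubeconfig; inside the cluster, load_incluster_config() would apply instead.
    config.load_kube_config()
    v1 = client.CoreV1Api()
    pods = v1.list_namespaced_pod(namespace, label_selector=selector)
    for pod in pods.items:
        # Collect terminated-container reasons (e.g. OOMKilled, Error) if any are present.
        reasons = [
            cs.state.terminated.reason
            for cs in (pod.status.container_statuses or [])
            if cs.state and cs.state.terminated
        ]
        print(f"{pod.metadata.name}: phase={pod.status.phase} reasons={reasons or 'none'}")
```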
Key Features
1. Real-time evaluation status tracking and pod state monitoring
2. Direct API connectivity testing through middleman authentication proxies
3. Log streaming and sample completion status analysis
4. Recovery workflows for resuming evaluations from S3-stored buffers
5. Automated error pattern recognition for 400, 500, and OOM errors (see the sketch after this list)
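One way error-pattern recognition along these lines could be implemented is with a small set of regular expressions applied to streamed pod or runner logs. The labels and patterns below are illustrative examples, not the skill's actual rules.

```python
# Hypothetical sketch: classify common failure signatures found in log lines.
import re
from collections import Counter

PATTERNS = {
    "rate_limit": re.compile(r"\b429\b|rate limit", re.IGNORECASE),
    "client_400": re.compile(r"\b400\b|bad request", re.IGNORECASE),
    "server_500": re.compile(r"\b500\b|internal server error", re.IGNORECASE),
    "oom":        re.compile(r"OOMKilled|out of memory", re.IGNORECASE),
    "auth":       re.compile(r"\b401\b|\b403\b|unauthorized|forbidden", re.IGNORECASE),
}


def classify_log(lines: list[str]) -> Counter:
    # Count how many log lines match each failure category.
    counts: Counter = Counter()
    for line in lines:
        for label, pattern in PATTERNS.items():
            if pattern.search(line):
                counts[label] += 1
    return counts
```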
Use Cases
1. Troubleshooting transient 500 Internal Server Error responses and API timeouts
2. Identifying the root cause of AI model evaluations that have stopped progressing
3. Verifying authentication and connectivity between Hawk runners and model providers (see the sketch after this list)
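To verify authentication and connectivity end to end, a single low-cost request can be sent through the middleman proxy. The sketch below assumes an OpenAI-compatible chat-completions endpoint; the base URL, environment variable, and model name are placeholders for whatever the Hawk runner is actually configured to use.

```python
# Hypothetical sketch: one-token request through the middleman proxy to check
# credentials and connectivity. Base URL, env var, and model are placeholders.
import os

import requests


def check_provider(base_url: str = "https://middleman.example.internal/v1") -> None:
    resp = requests.post(
        f"{base_url}/chat/completions",
        headers={"Authorization": f"Bearer {os.environ['MIDDLEMAN_API_KEY']}"},
        json={
            "model": "gpt-4o-mini",
            "messages": [{"role": "user", "content": "ping"}],
            "max_tokens": 1,
        },
        timeout=30,
    )
    print(resp.status_code, resp.text[:200])
    resp.raise_for_status()
```

In general, a 401 or 403 here points at proxy credentials, while repeated 500s or timeouts suggest a provider-side or proxy-side issue rather than a problem with the evaluation itself.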