About
The ML Training Debugger is a specialized skill for Claude Code designed to automate the diagnostic process for problematic machine learning training runs. It spawns a specialist agent to systematically analyze training logs, loss curves, model architecture, and gradient statistics to pinpoint the root causes of issues such as exploding gradients, mode collapse, or architecture imbalances. By providing evidence-based recommendations and actionable fixes, this skill helps researchers and engineers stabilize training pipelines and reduce the time spent on manual trial-and-error debugging.