Error categorization using H&K frameworks for precise failure diagnosis
Zero-shot evaluation metrics mapped against predefined strategy targets
Detailed behavioral test analysis for JSON validity and classification accuracy
Automated context gathering from R targets and YAML codebooks
Ablation ranking to identify the most impactful codebook components
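The behavioral checks and ablation ranking listed above can be sketched as follows. This is a minimal illustration under stated assumptions, not the tool's own implementation. It assumes each model output is a JSON string carrying a hypothetical `label` field and that invalid JSON counts as a wrong prediction. Codebook components are ranked by how much the score drops when each one is removed. The function names and the example component names are hypothetical.

```python
import json
from typing import Dict, List, Optional, Tuple


def parse_label(raw: str, field: str = "label") -> Optional[str]:
    """Return the label from a JSON model output, or None if the output is
    not a valid JSON object or lacks the (hypothetical) label field."""
    try:
        obj = json.loads(raw)
    except json.JSONDecodeError:
        return None
    if not isinstance(obj, dict) or field not in obj:
        return None
    return str(obj[field])


def behavioral_scores(outputs: List[str], gold: List[str]) -> Dict[str, float]:
    """Compute the JSON validity rate and classification accuracy.
    Outputs that fail to parse are counted as incorrect predictions."""
    if len(outputs) != len(gold):
        raise ValueError("outputs and gold must have the same length")
    if not gold:
        raise ValueError("need at least one example")
    labels = [parse_label(o) for o in outputs]
    valid = sum(lbl is not None for lbl in labels)
    correct = sum(lbl == g for lbl, g in zip(labels, gold))
    n = len(gold)
    return {"json_validity": valid / n, "accuracy": correct / n}


def rank_ablations(full_score: float,
                   ablated: Dict[str, float]) -> List[Tuple[str, float]]:
    """Rank codebook components by the score drop observed when each one is
    removed; the largest drop (most impactful component) comes first."""
    drops = {name: full_score - score for name, score in ablated.items()}
    return sorted(drops.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    outputs = ['{"label": "positive"}', "not json", '{"label": "negative"}', '{"other": 1}']
    gold = ["positive", "negative", "positive", "neutral"]
    print(behavioral_scores(outputs, gold))
    print(rank_ablations(0.80, {"definitions": 0.62, "examples": 0.75, "exclusions": 0.79}))
```

Separating validity from accuracy keeps formatting failures distinguishable from genuine misclassifications, which fits the error-categorization goal. Ranking by score drop is only one simple ablation criterion; the actual tool may weigh components differently.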