About
The Promptfoo Evaluation skill enables developers to systematically test, compare, and refine LLM prompts within their Claude Code environment. By integrating the open-source Promptfoo CLI, this skill assists in configuring evaluation matrices, defining custom Python assertions, and implementing LLM-as-judge rubrics to ensure high-quality, consistent model outputs. It is particularly useful for teams needing to benchmark different models, validate few-shot examples, or monitor response quality across complex prompt iterations before production deployment.
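As a rough sketch of the custom-assertion piece: Promptfoo can delegate grading to a Python file that exposes a `get_assert` function. The example below is illustrative only; the file name `custom_assert.py` and the JSON/summary criteria are assumptions for demonstration, not part of this skill, and the exact `context` fields may vary by Promptfoo version.

```python
# custom_assert.py — a minimal custom assertion for Promptfoo (illustrative).
# Promptfoo invokes get_assert(output, context) for python-type assertions
# that reference this file; it may return a bool, a float, or a result dict.
import json
from typing import Any, Dict, Union


def get_assert(output: str, context: Dict[str, Any]) -> Union[bool, float, Dict[str, Any]]:
    """Pass if the model output is valid JSON with a concise, non-empty 'summary' field."""
    try:
        data = json.loads(output)
    except json.JSONDecodeError:
        return {"pass": False, "score": 0.0, "reason": "Output is not valid JSON"}

    summary = data.get("summary", "")
    if not isinstance(summary, str) or not summary.strip():
        return {"pass": False, "score": 0.0, "reason": "Missing or empty 'summary' field"}

    # Score scales with brevity: summaries under ~50 words get full credit.
    word_count = len(summary.split())
    score = 1.0 if word_count <= 50 else max(0.0, 1.0 - (word_count - 50) / 100)
    return {"pass": score >= 0.5, "score": score, "reason": f"Summary has {word_count} words"}
```

In a `promptfooconfig.yaml`, a file like this is typically wired up as an assertion of `type: python` whose value points at the file (e.g. `file://custom_assert.py`); check the Promptfoo documentation for the exact schema your installed version expects.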