About
This skill streamlines the evaluation of Large Language Model (LLM) outputs by integrating the Promptfoo framework directly into the development workflow. It enables developers to run automated regression tests, compare prompt versions side by side, and conduct security red-teaming to detect jailbreaks or PII leaks. By providing a structured way to implement CI/CD quality gates, it ensures that only high-quality, safe, and cost-effective prompts reach production, replacing subjective spot-checks with objective, data-driven metrics for both RAG pipelines and standalone prompts.
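As a minimal sketch of what such an evaluation looks like, the hypothetical `promptfooconfig.yaml` below compares two prompt versions against a shared test case using deterministic, model-graded, and cost assertions. The prompts, provider, variable names, and thresholds are illustrative placeholders, not part of this skill:

```yaml
# promptfooconfig.yaml — illustrative sketch; prompts, provider,
# and thresholds below are assumptions for demonstration only.
prompts:
  - "Summarize the following support ticket: {{ticket}}"
  - "You are a support analyst. Summarize this ticket in one sentence: {{ticket}}"

providers:
  - openai:gpt-4o-mini

tests:
  - vars:
      ticket: "App crashes on login after the 2.3 update."
    assert:
      # Deterministic check: output must mention the reported symptom.
      - type: icontains
        value: "crash"
      # Model-graded check: an LLM judges the output against a rubric.
      - type: llm-rubric
        value: "Accurately summarizes the ticket without inventing details"
      # Cost check: fail if the completion costs more than the threshold (USD).
      - type: cost
        threshold: 0.01
```

Running `promptfoo eval` against a config like this produces a pass/fail matrix for each prompt-provider pair, and the command exits with a non-zero status when assertions fail, which is what makes it usable as a CI/CD quality gate. The `promptfoo redteam` subcommands can then layer adversarial tests (for example, jailbreak and PII probes) on top of the same configuration.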