Overview
This skill provides a framework for evaluating LLM outputs before they reach production, built around standardized quality gates and automated evaluation pipelines. It implements established evaluation patterns such as LLM-as-judge scoring with cost-effective models, RAGAS metrics for validating RAG systems, and hallucination detection. Integrated into your development workflow, these tools let you automate quality assurance, run batch evaluations over golden datasets, and hold AI-driven features to a consistent quality bar.
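As a rough illustration of a batch evaluation over a golden dataset feeding a quality gate, here is a minimal sketch. It is not this skill's actual API: `GoldenExample`, `token_overlap_judge`, `run_quality_gate`, and the `0.8` threshold are all hypothetical names and values chosen for the example. The judge is a simple token-overlap placeholder so the snippet runs offline; in practice it would presumably be replaced by an LLM-as-judge call or a RAGAS metric.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Callable, Sequence


@dataclass(frozen=True)
class GoldenExample:
    """One entry of a golden dataset (hypothetical shape)."""
    question: str
    reference: str


def token_overlap_judge(answer: str, reference: str) -> float:
    """Placeholder judge: fraction of reference tokens present in the answer.

    Stands in for an LLM-as-judge or RAGAS score; returns a value in [0, 1].
    """
    ref_tokens = set(reference.lower().split())
    if not ref_tokens:
        return 0.0
    ans_tokens = set(answer.lower().split())
    return len(ref_tokens & ans_tokens) / len(ref_tokens)


def run_quality_gate(
    examples: Sequence[GoldenExample],
    generate: Callable[[str], str],
    judge: Callable[[str, str], float],
    threshold: float = 0.8,  # assumed pass mark, not taken from the skill
) -> dict:
    """Score every golden example and pass only if the mean meets the threshold."""
    scores = [judge(generate(ex.question), ex.reference) for ex in examples]
    avg = mean(scores) if scores else 0.0
    # An empty dataset never passes: there is nothing to vouch for the system.
    return {"scores": scores, "mean": avg, "passed": bool(scores) and avg >= threshold}
```

A CI step could call `run_quality_gate` with the real generation function and fail the build when `passed` is false. Keeping the judge as a plain callable means the scoring method can be swapped without touching the gate logic.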