About
This skill integrates Evidently.ai into the Claude Code workflow to provide a framework for assessing and improving LLM performance. It enables developers to implement automated quality checks through text descriptors, set up LLM-as-a-judge evaluators for qualitative criteria, and run automated prompt tuning. Whether you're measuring RAG accuracy, comparing model variants, or monitoring production quality, this skill provides the tools to move from subjective assessment to data-driven LLM development within your Jupyter environment.
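As a quick illustration of the descriptor-based workflow, here is a minimal sketch of running deterministic text descriptors over a column of model responses with Evidently's `TextEvals` preset. It is not the skill's exact code: Evidently's import paths have changed across releases (this follows the 0.4.x layout), and the `response` column name and sample data are placeholders.

```python
import pandas as pd

# Evidently 0.4.x-style imports; newer releases reorganize these modules.
from evidently.report import Report
from evidently.metric_preset import TextEvals
from evidently.descriptors import Sentiment, TextLength

# Placeholder data: in practice, these would be your logged LLM outputs.
data = pd.DataFrame(
    {
        "response": [
            "Sure! Here is a concise summary of the document.",
            "I cannot help with that request.",
        ]
    }
)

# Run automated quality checks (sentiment, length) over each response.
report = Report(
    metrics=[
        TextEvals(
            column_name="response",
            descriptors=[Sentiment(), TextLength()],
        )
    ]
)
report.run(reference_data=None, current_data=data)
report.show()  # renders the report inline in a Jupyter notebook
```

For the LLM-as-a-judge evaluators mentioned above, Evidently also ships descriptors that call an external model (for example, built-in judges for refusals or toxicity); those require provider API credentials and their names vary by version, so check the Evidently docs for the release you have installed.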