Overview
This skill provides tools for testing LLM-based applications through response mocking, quality metric evaluation, and asynchronous timeout handling. It helps developers replace flaky, expensive live API tests with deterministic unit and integration tests, using frameworks such as DeepEval and VCR.py. It also automates the validation of structured outputs and RAG pipelines, so that AI-driven features can be checked for accuracy, faithfulness, and reliability before they reach production.
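As a rough illustration of the mocking and timeout ideas mentioned above, the following minimal sketch uses only the Python standard library rather than DeepEval or VCR.py. The `summarize` function, the `client.complete` method, and the JSON shape with a `summary` field are all hypothetical names chosen for this example; they are not part of this skill or of any real LLM SDK.

```python
import asyncio
import json
from unittest.mock import AsyncMock


async def summarize(client, text, timeout=1.0):
    """Call a hypothetical async LLM client and validate its JSON reply."""
    # Bound the call so a hung API request fails fast instead of stalling the test.
    raw = await asyncio.wait_for(client.complete(f"Summarize: {text}"), timeout=timeout)
    data = json.loads(raw)
    # Structured-output check: the reply must contain a 'summary' field.
    if "summary" not in data:
        raise ValueError("missing 'summary' field")
    return data["summary"]


def run_mocked_success():
    # The mock stands in for the live API, so the result is deterministic.
    client = AsyncMock()
    client.complete.return_value = '{"summary": "short text"}'
    return asyncio.run(summarize(client, "long text"))


def run_mocked_timeout():
    # Simulate a slow backend to exercise the timeout path.
    async def slow(prompt):
        await asyncio.sleep(0.5)
        return '{"summary": "late"}'

    client = AsyncMock()
    client.complete.side_effect = slow
    try:
        asyncio.run(summarize(client, "long text", timeout=0.01))
    except asyncio.TimeoutError:
        return "timed out"
    return "completed"


if __name__ == "__main__":
    print(run_mocked_success())
    print(run_mocked_timeout())
```

Because the client is injected as a parameter, the same function can be exercised against a mock in unit tests and against a recorded or live client in integration tests.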