Introduction
Bedrock AgentCore Evaluations enables developers to move from subjective assessment to metric-based quality assurance for AI agents. This skill provides 13 built-in evaluators—covering dimensions such as correctness, safety, and tool selection accuracy—and also supports custom LLM-as-judge scoring for domain-specific requirements. By integrating directly with Amazon Bedrock, it enables rigorous pre-production validation and continuous production monitoring, helping agents stay reliable, helpful, and aligned with safety standards throughout their lifecycle.
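To illustrate the LLM-as-judge pattern mentioned above, here is a minimal, hypothetical sketch: it builds a rubric-style prompt for a judge model, then normalizes the judge's 1–5 rating into a 0–1 score with a pass/fail threshold. The function names, rubric wording, and stubbed judge are illustrative assumptions, not the AgentCore Evaluations API; in practice the judge call would go to a model (for example, via Amazon Bedrock).

```python
from dataclasses import dataclass

@dataclass
class EvalResult:
    metric: str
    score: float   # normalized to the 0.0-1.0 range
    passed: bool

def build_judge_prompt(metric: str, question: str, answer: str) -> str:
    """Assemble a rubric-style prompt asking the judge model for a 1-5 rating.
    (Illustrative prompt shape, not the AgentCore rubric format.)"""
    return (
        f"Rate the following answer for {metric} on a scale of 1 to 5.\n"
        f"Question: {question}\n"
        f"Answer: {answer}\n"
        "Reply with only the number."
    )

def parse_score(reply: str, metric: str = "correctness",
                threshold: float = 0.8) -> EvalResult:
    """Normalize the judge's 1-5 rating to 0-1 and apply a pass threshold."""
    raw = int(reply.strip())
    score = (raw - 1) / 4  # map 1..5 onto 0.0..1.0
    return EvalResult(metric, score, score >= threshold)

# Stub standing in for a real judge-model call (e.g. a Bedrock invocation).
def stub_judge(prompt: str) -> str:
    return "5"

prompt = build_judge_prompt("correctness", "What is 2+2?", "4")
result = parse_score(stub_judge(prompt))
print(result.score, result.passed)  # 1.0 True
```

The key design point is separating prompt construction from score parsing, so the same normalization and thresholding logic can back both built-in and custom evaluators.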