What AWS permissions are required?

You need IAM permissions for Bedrock AgentCore control and runtime operations, as well as CloudWatch permissions if you intend to set up proactive monitoring and alerts.

What are the 13 built-in evaluators?

The evaluators cover Correctness, Helpfulness, Tool Selection Accuracy, Tool Parameter Accuracy, Safety, Faithfulness, Goal Success Rate, Context Relevance, Coherence, Conciseness, Stereotype Harm, Maliciousness, and Self-Harm.

Does this replace standard unit testing?

No, this skill is for evaluating AI behavior and output quality. You should still use tools like pytest or Jest for testing your underlying application code.

Can I create my own evaluation metrics?

Yes, you can define custom LLM-as-judge evaluators using models like Claude 3 Sonnet to measure domain-specific criteria such as brand tone, technical accuracy, or regulatory compliance.
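As a rough illustration of the LLM-as-judge pattern, the sketch below builds a grading prompt for one criterion and parses a JSON verdict into a normalized score. It is not the skill's or AgentCore's actual API: `RUBRIC_TEMPLATE`, `build_judge_prompt`, `parse_judge_output`, and the 1-5 scale are all hypothetical, and the call to the judge model itself is deliberately left out.

```python
import json
import re
from dataclasses import dataclass

# Hypothetical rubric; the wording and the 1-5 scale are assumptions.
RUBRIC_TEMPLATE = """You are grading an AI agent's reply.
Criterion: {criterion}
User request: {request}
Agent reply: {reply}
Respond with JSON only: {{"score": <integer 1-5>, "reason": "<one sentence>"}}"""


@dataclass
class JudgeResult:
    score: float  # normalized to the range 0.0-1.0
    reason: str


def build_judge_prompt(criterion: str, request: str, reply: str) -> str:
    """Fill the rubric with one interaction to be graded."""
    return RUBRIC_TEMPLATE.format(criterion=criterion, request=request, reply=reply)


def parse_judge_output(text: str) -> JudgeResult:
    """Extract the first JSON object from the judge's reply and normalize the score."""
    match = re.search(r"\{.*\}", text, re.DOTALL)
    if match is None:
        raise ValueError("judge output contained no JSON object")
    data = json.loads(match.group(0))
    raw = int(data["score"])
    if not 1 <= raw <= 5:
        raise ValueError(f"score {raw} is outside the 1-5 scale")
    # Map 1..5 onto 0.0..1.0 so scores are comparable across rubrics.
    return JudgeResult(score=(raw - 1) / 4, reason=str(data.get("reason", "")))


if __name__ == "__main__":
    prompt = build_judge_prompt(
        "brand tone", "Where is my order?", "It ships tomorrow. Thanks for waiting!"
    )
    print(prompt)
    # In practice the prompt would be sent to the judge model; a canned reply stands in here.
    print(parse_judge_output('{"score": 4, "reason": "Friendly and on-brand."}'))
```

Keeping prompt construction and output parsing separate from the model call makes both halves easy to test without AWS credentials.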

Bedrock AgentCore Evaluations

Name: Bedrock AgentCore Evaluations
Author: adaptationio

Security & Testing

Automates AI agent quality testing and monitoring using built-in metrics and custom LLM-as-judge evaluation patterns.

Bedrock AgentCore Evaluations helps developers move from subjective assessment to metric-based quality assurance for AI agents. The skill provides 13 built-in evaluators, covering dimensions such as correctness, safety, and tool selection accuracy, and supports custom LLM-as-judge scoring for domain-specific requirements. Because it integrates directly with Amazon Bedrock, it supports both pre-production validation and continuous production monitoring, so agents can be checked for reliability, helpfulness, and safety throughout their lifecycle.

Key Features

01 0 GitHub stars

02 Pre-production batch testing for agent validation

03 Custom LLM-as-judge patterns for domain-specific quality metrics

04 Continuous production monitoring with CloudWatch integration

05 13 built-in evaluators including Correctness, Safety, and Helpfulness

06 Detailed scoring for tool selection and parameter accuracy

Use Cases

01 Validating agent performance against test datasets before deployment

02 Automating quality alerts and performance dashboards for AI workflows

03 Monitoring live production interactions for hallucinations or safety violations
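The first use case, validating against a test dataset before deployment, might look like the sketch below. It assumes evaluator scores have already been collected as numbers between 0 and 1; `summarize_batch`, `gate`, and the threshold values are hypothetical and are not part of the skill or of any AWS SDK.

```python
from statistics import mean


def summarize_batch(results, thresholds):
    """Average each evaluator's scores over a batch and compare to a minimum.

    results:    list of dicts mapping evaluator name -> score in [0, 1]
    thresholds: dict mapping evaluator name -> minimum acceptable mean
    """
    summary = {}
    for name, minimum in thresholds.items():
        scores = [r[name] for r in results if name in r]
        avg = mean(scores) if scores else 0.0
        # An evaluator with no scores at all is treated as a failure.
        summary[name] = {"mean": avg, "passed": bool(scores) and avg >= minimum}
    return summary


def gate(summary):
    """Return True only if every evaluator met its threshold."""
    return all(item["passed"] for item in summary.values())


if __name__ == "__main__":
    # Illustrative scores for two test cases; the numbers are made up.
    batch = [
        {"Correctness": 0.9, "Safety": 1.0},
        {"Correctness": 0.7, "Safety": 1.0},
    ]
    report = summarize_batch(batch, {"Correctness": 0.75, "Safety": 0.95})
    print(report)
    print("deploy" if gate(report) else "block")
```

A gate like this can run in CI so that a drop in any tracked metric blocks the release rather than surfacing later in production.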

Install with 🐟 Skill.Fish

npx skillfish add adaptationio/skrillz bedrock-agentcore-evaluations

For use in Claude.ai and ChatGPT
