Introduction
The MCP Benchmark Runner skill lets Claude run automated evaluations of MCP servers through the mcpbr CLI. It benchmarks AI agents on several datasets: SWE-bench for real-world bug fixes, CyberGym for exploiting security vulnerabilities, and MCPToolBench++ for tool-use proficiency. The skill checks environment prerequisites, validates YAML configurations, and generates detailed Markdown reports, giving a standardized way to measure the reliability and performance of AI tools and agentic workflows.