About
This skill provides a streamlined interface for running SWE-bench Lite evaluations via the Model Context Protocol Benchmark Runner (mcpbr). It automates benchmarking MCP servers against real-world software engineering tasks and ships with sensible defaults for sample size, reporting, and verbosity. Developers building or refining AI agents and MCP servers can use it to obtain quantifiable performance metrics, baseline comparisons, and detailed diagnostic logs for improving their agents' problem-solving capabilities.
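As a rough illustration, an invocation might look like the sketch below. The subcommand and flag names (`--benchmark`, `--sample-size`, `--report`, `--verbose`) are assumptions made for this example, not confirmed mcpbr options; consult the mcpbr documentation for the actual interface.

```bash
# Hypothetical invocation sketch: the subcommand and flag names below are
# assumptions for illustration only, not confirmed mcpbr options.
#   --benchmark    selects the benchmark suite to run
#   --sample-size  caps the number of tasks evaluated
#   --report       sets the output path for the results report
#   --verbose      enables detailed diagnostic logging
mcpbr run \
  --benchmark swe-bench-lite \
  --sample-size 25 \
  --report results/report.json \
  --verbose
```

In a setup like this, the skill's pre-configured defaults would cover the sample size, report destination, and verbosity, so a bare run with no flags would still produce a usable evaluation.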