Automates high-precision performance benchmarking with strict execution rules to ensure reliable baselines and regression detection.
The Performance Benchmarker skill for Claude Code provides a framework for establishing performance baselines and detecting regressions without the noise introduced by parallel execution or short-run variance. It enforces strictly sequential execution, requires a minimum 10-second warmup period, and mandates at least 60 seconds of measured execution for standard runs to achieve statistical significance. By standardizing output into machine-readable JSON delimited by explicit start and end markers, it lets Claude reliably parse performance data, flag anomalies, and manage re-runs during the optimization process.
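A minimal sketch of what marker-delimited metric extraction could look like. The exact marker strings (PERF_METRICS_START/PERF_METRICS_END) and the metric field names here are illustrative assumptions, not the skill's documented format:

```python
import json
import re

def extract_metrics(output: str) -> dict:
    """Pull the JSON payload found between the (assumed) PERF_METRICS markers."""
    match = re.search(
        r"PERF_METRICS_START\s*(\{.*?\})\s*PERF_METRICS_END",
        output,
        re.DOTALL,
    )
    if match is None:
        raise ValueError("no PERF_METRICS block found in benchmark output")
    return json.loads(match.group(1))

# Hypothetical benchmark log with a marker-delimited JSON block embedded in it.
raw = """
... ordinary benchmark log lines ...
PERF_METRICS_START
{"p50_ms": 12.4, "p99_ms": 31.0, "throughput_rps": 8421, "duration_s": 60.2}
PERF_METRICS_END
"""
metrics = extract_metrics(raw)
```

Anchoring the JSON between fixed markers means the parser can ignore arbitrary log noise around it, which is what makes the output machine-readable in practice.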
Key Features
- Automated anomaly detection and re-run management
- Standardized JSON metrics extraction via PERF_METRICS markers
- Strict 60-second minimum duration safety checks for standard benchmarks
- Sequential execution enforcement to prevent resource contention
- Mandatory 10-second warmup phase for stable measurements
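The execution rules above can be sketched as a simple sequential runner. This is an illustration of the stated constraints (10-second warmup, 60-second minimum measurement, one workload at a time), not the skill's actual implementation; the function name and injectable `clock` parameter are assumptions made for the sketch:

```python
import time

WARMUP_SECONDS = 10.0        # mandatory warmup before any measurement
MIN_DURATION_SECONDS = 60.0  # standard runs must measure at least this long

def run_benchmark(workload, duration_s=MIN_DURATION_SECONDS,
                  warmup_s=WARMUP_SECONDS, clock=time.monotonic):
    """Run a single workload sequentially: warmup first, then timed samples.

    Configurations that violate the safety rules are rejected before the
    workload is ever executed. `clock` is injectable so tests can run fast.
    """
    if duration_s < MIN_DURATION_SECONDS:
        raise ValueError(f"standard runs require >= {MIN_DURATION_SECONDS}s "
                         f"of measurement, got {duration_s}s")
    if warmup_s < WARMUP_SECONDS:
        raise ValueError(f"warmup must be >= {WARMUP_SECONDS}s, got {warmup_s}s")

    # Warmup phase: execute the workload but discard timings.
    deadline = clock() + warmup_s
    while clock() < deadline:
        workload()

    # Measurement phase: strictly one run at a time, no parallelism.
    samples = []
    deadline = clock() + duration_s
    while clock() < deadline:
        start = clock()
        workload()
        samples.append(clock() - start)
    return samples
```

Failing fast on a too-short run is the important design choice: a 5-second measurement silently accepted would produce a baseline that later regression checks cannot trust.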
Use Cases
- Conducting binary search optimizations to pinpoint bottleneck sources
- Establishing high-precision performance baselines for new features
- Validating performance regressions before merging critical code changes
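One way the binary-search use case could work is bisecting an ordered history of commits to find the first one whose benchmark regresses past a baseline threshold. The commit IDs, timings, and threshold below are hypothetical, and this assumes the regression is monotonic once introduced:

```python
def find_regression(commits, is_regressed):
    """Binary search for the first commit whose benchmark regresses.

    `commits` must be ordered oldest to newest, and `is_regressed` must be
    monotonic: once a commit regresses, every later commit regresses too.
    """
    lo, hi = 0, len(commits) - 1
    while lo < hi:
        mid = (lo + hi) // 2
        if is_regressed(commits[mid]):
            hi = mid          # regression is at mid or earlier
        else:
            lo = mid + 1      # regression was introduced after mid
    return commits[lo]

# Hypothetical data: baseline p99 was ~30 ms; flag anything above 33 ms.
timings = {"a1": 29.8, "b2": 30.1, "c3": 34.9, "d4": 35.2}
order = ["a1", "b2", "c3", "d4"]
first_bad = find_regression(order, lambda c: timings[c] > 33.0)
# first_bad == "c3"
```

Each probe here is one full benchmark run, which is exactly why the skill's sequential-execution and minimum-duration rules matter: a noisy probe can send the bisection down the wrong half of the history.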