What kind of risks does it flag during the design phase?

It identifies common pitfalls such as novelty effects, seasonal confounds, multiple testing issues, and network effects that could invalidate your results.
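Multiple testing is the most mechanical of these risks to correct for. The sketch below is an illustration, not necessarily what Experiment Designer does internally. It applies Holm's step-down procedure, a standard way to control the family-wise error rate when several metrics or variants are tested at once. The function name and the default alpha of 0.05 are assumptions.

```python
def holm_bonferroni(p_values, alpha=0.05):
    """Holm's step-down procedure (illustrative sketch).

    Returns one boolean per input p-value: True if that null hypothesis is
    rejected while controlling the family-wise error rate at `alpha`.
    """
    m = len(p_values)
    # Visit hypotheses from the smallest p-value to the largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    rejected = [False] * m
    for rank, idx in enumerate(order):
        # The k-th smallest p-value (0-based rank) is compared with alpha / (m - rank).
        if p_values[idx] <= alpha / (m - rank):
            rejected[idx] = True
        else:
            # Step-down rule: once one test fails, all larger p-values are retained too.
            break
    return rejected


print(holm_bonferroni([0.01, 0.04, 0.03, 0.20]))  # → [True, False, False, False]
```

Three of those four p-values fall below an uncorrected 0.05 threshold, but only the smallest survives the correction. That gap is the kind of false win that checking many metrics at once invites.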

How does Experiment Designer calculate the required sample size?

It uses the Minimum Detectable Effect (MDE) you provide and the current baseline value of your metric to determine the sample size (n) needed per variant for adequate statistical power, then uses your available daily traffic to estimate how long the test must run.
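For a conversion-rate metric, the textbook version of this calculation can be sketched as follows. This assumes a two-sided two-proportion z-test with the unpooled-variance approximation, an absolute (not relative) MDE, and an even traffic split. The function names and the defaults (alpha = 0.05, 80% power) are illustrative assumptions, not Experiment Designer's documented interface.

```python
import math
from statistics import NormalDist


def sample_size_per_variant(baseline, mde_abs, alpha=0.05, power=0.80):
    """Per-variant n for a two-sided two-proportion z-test (unpooled variance).

    baseline: current conversion rate of the control, e.g. 0.10
    mde_abs:  absolute minimum detectable effect, e.g. 0.01 for +1 percentage point
    """
    p1, p2 = baseline, baseline + mde_abs
    if mde_abs == 0 or not (0 < p1 < 1 and 0 < p2 < 1):
        raise ValueError("need a non-zero MDE with baseline and baseline + MDE in (0, 1)")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_beta = NormalDist().inv_cdf(power)           # quantile for the target power
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return math.ceil((z_alpha + z_beta) ** 2 * variance / mde_abs ** 2)


def run_time_days(n_per_variant, daily_traffic, num_variants=2):
    """Days to fill every variant if daily_traffic eligible users are split evenly."""
    return math.ceil(n_per_variant * num_variants / daily_traffic)


n = sample_size_per_variant(baseline=0.10, mde_abs=0.01)
print(n, run_time_days(n, daily_traffic=2_000))
```

For a 10% baseline and a 1-percentage-point MDE this gives roughly 14,750 users per variant; at 2,000 eligible users per day across two variants, that is about 15 days. Halving the MDE roughly quadruples the required sample, which is why the MDE is usually the most consequential input.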

Can this skill help if my experiment results are inconclusive?

Yes. It examines confidence intervals and possible confounding factors to recommend whether you should iterate on the hypothesis or run a follow-up test.
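One way to turn a confidence interval into an iterate-or-follow-up recommendation is sketched below. It assumes a conversion-rate metric, a normal-approximation (Wald) interval for the difference between variants, and a team-chosen minimum practically significant effect. The three-way rule and all names are hypothetical, not the skill's documented logic.

```python
import math
from statistics import NormalDist


def diff_ci(conv_a, n_a, conv_b, n_b, confidence=0.95):
    """Wald interval for the absolute difference in conversion rate, B minus A."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    se = math.sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    diff = p_b - p_a
    return diff - z * se, diff + z * se


def recommend(ci, min_practical_effect):
    """Hypothetical decision rule comparing the CI with a practical-significance bar."""
    low, high = ci
    if low >= min_practical_effect:
        return "ship"     # even the pessimistic bound clears the practical bar
    if high < min_practical_effect:
        return "kill"     # even the optimistic bound falls short of the bar
    return "iterate"      # the interval straddles the bar: inconclusive


ci = diff_ci(1000, 10_000, 1100, 10_000)
print(ci, recommend(ci, min_practical_effect=0.005))
```

Here the interval (roughly +0.15 to +1.85 percentage points) excludes zero, so the lift is statistically significant, but it also includes effects smaller than the 0.5-point bar. The rule therefore returns "iterate" rather than "ship".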

Does it account for the 'peeking problem'?

Yes. The results-interpretation phase includes a validation step that confirms the test ran for its full planned duration and flags the results if the test was stopped early.
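A minimal version of such a validation step might look like this, assuming the planned duration and the required per-variant sample size were recorded at design time. The function name and its inputs are hypothetical.

```python
from datetime import date


def validate_fixed_horizon(start, stopped, planned_days, n_per_variant, required_n):
    """Return warnings if a fixed-horizon test looks like it was stopped early."""
    flags = []
    days_run = (stopped - start).days
    if days_run < planned_days:
        flags.append(f"ran {days_run} of {planned_days} planned days")
    smallest = min(n_per_variant)
    if smallest < required_n:
        flags.append(f"smallest variant has {smallest} users, {required_n} required")
    return flags


print(validate_fixed_horizon(date(2024, 3, 1), date(2024, 3, 8), 14, [5200, 5150], 14_749))
```

A non-empty result means the reported p-value was computed under a fixed-horizon assumption that no longer holds, so it should not be taken at face value.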

Experiment Designer

Name: Experiment Designer
Author: mohitagw15856

GitHub stars: 295
Category: Data Science and ML

Designs statistically rigorous A/B tests and interprets experiment results to drive data-driven product decisions.

The Experiment Designer skill helps product teams move from gut-feeling hypotheses to data-backed decisions by generating rigorous experiment frameworks. It calculates required sample sizes, estimates run times, and flags potential design risks such as novelty effects and sample ratio mismatches. Beyond the initial design, the skill interprets statistical results, distinguishing statistical significance from practical significance, to produce clear, defensible recommendations on whether to ship, iterate on, or kill a feature.

Key Features

01 Statistical and practical significance assessment for raw test results

02 Automated sample size and run time calculations based on MDE, baseline metrics, and traffic

03 Standardized output for 'Ship, Iterate, or Kill' decision frameworks

04 Comprehensive risk flagging for novelty effects, seasonal confounds, and peeking problems

05 Structured hypothesis generation focusing on specific changes and measurable outcomes
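The structured hypothesis format described above, a specific change tied to a measurable outcome, can be pictured as a small record type. The field names and the "If we ..., then ..., because ..." template are a common convention and an assumption here, not the skill's documented output format.

```python
from dataclasses import dataclass


@dataclass
class Hypothesis:
    change: str      # the specific product change being tested
    metric: str      # the primary metric expected to move
    direction: str   # "increase" or "decrease"
    mde: str         # smallest change worth detecting, e.g. "1 percentage point"
    rationale: str   # why the change should move the metric

    def statement(self) -> str:
        """Render the hypothesis as a single falsifiable sentence."""
        return (f"If we {self.change}, then {self.metric} will {self.direction} "
                f"by at least {self.mde}, because {self.rationale}.")


h = Hypothesis("shorten the signup form to three fields", "signup completion rate",
               "increase", "1 percentage point", "fewer fields reduce friction")
print(h.statement())
```

Forcing every field to be filled in makes vague hypotheses ("the new form will be better") impossible to write down.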

Use Cases

01 Planning a new feature rollout to determine the traffic and duration needed for a valid test

02 Interpreting A/B test data to defend product decisions to engineering leads and data scientists

03 Validating existing experiment results for integrity issues such as sample ratio mismatch (SRM)
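A sample ratio mismatch check is usually a chi-square goodness-of-fit test on the observed assignment counts. The sketch below handles the two-variant case, where the test has one degree of freedom and the p-value can be computed with the standard library. The 0.001 alarm threshold is a commonly used convention, assumed here, and the function name is hypothetical.

```python
import math


def srm_check(n_control, n_treatment, expected_ratio=0.5, threshold=0.001):
    """Chi-square goodness-of-fit test (1 df) for a two-variant sample ratio mismatch."""
    total = n_control + n_treatment
    expected_control = total * expected_ratio
    expected_treatment = total * (1 - expected_ratio)
    chi2 = ((n_control - expected_control) ** 2 / expected_control
            + (n_treatment - expected_treatment) ** 2 / expected_treatment)
    # With 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return p_value, p_value < threshold


print(srm_check(50_000, 51_500))
```

A 50,000 versus 51,500 split looks close to even, but its p-value is on the order of 10^-6, far below the threshold, so the assignment mechanism should be investigated before any result from that test is trusted.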
