Stance

Test constantly, but test rigorously. Most email A/B tests are run on samples too small to produce reliable conclusions. Say so when it is true.

The volume reality

What detecting a small shift actually costs

At a 2% baseline campaign click rate, you need roughly 80,000 sends per cell to detect a 10% relative shift (2.0% to 2.2%) at 95% confidence and 80% power, and around 315,000 per cell to detect a 5% relative shift (2.0% to 2.1%).
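The figures above can be reproduced with the standard normal-approximation formula for comparing two proportions. What follows is a minimal sketch, assuming a two-sided test and equal cell sizes; the function name sends_per_cell and its parameters are illustrative, not taken from any particular tool.

```python
from math import ceil
from statistics import NormalDist


def sends_per_cell(baseline, relative_lift, alpha=0.05, power=0.80):
    """Approximate sends needed in EACH cell of a two-cell test.

    Assumes a two-sided z-test on two proportions with equal cell sizes.
    baseline      -- control click rate, e.g. 0.02 for 2%
    relative_lift -- smallest relative shift worth detecting, e.g. 0.10
    """
    p1 = baseline
    p2 = baseline * (1 + relative_lift)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 at 95% confidence
    z_beta = NormalDist().inv_cdf(power)           # about 0.84 at 80% power
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil((z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2)


print(sends_per_cell(0.02, 0.10))  # 10% relative shift
print(sends_per_cell(0.02, 0.05))  # 5% relative shift
```

Both results land close to the 80,000 and 315,000 figures quoted above. Halving the detectable shift roughly quadruples the required volume, because the shift enters the formula squared.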

The platform intermediation effects people argue about are usually smaller than even a 5% relative shift. If your list is in the tens of thousands or low six figures, most elaborate tests will give you a wide confidence interval around zero. See "volume thresholds" and "sample size and power".
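To make "a wide confidence interval around zero" concrete, here is a minimal sketch that computes a Wald (normal-approximation) interval for the difference in click rate between two cells. The counts are hypothetical, chosen only to illustrate a list of about 50,000 split evenly; the function name diff_ci is likewise an assumption.

```python
from math import sqrt
from statistics import NormalDist


def diff_ci(clicks_a, sends_a, clicks_b, sends_b, confidence=0.95):
    """Wald confidence interval for (rate_b - rate_a), as absolute rates."""
    rate_a = clicks_a / sends_a
    rate_b = clicks_b / sends_b
    # Standard error of the difference between two independent proportions.
    se = sqrt(rate_a * (1 - rate_a) / sends_a + rate_b * (1 - rate_b) / sends_b)
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    diff = rate_b - rate_a
    return diff - z * se, diff + z * se


# Hypothetical counts: 25,000 sends per cell, 2.0% vs 2.1% observed click rate.
low, high = diff_ci(500, 25_000, 525, 25_000)
print(f"{low * 100:+.2f} to {high * 100:+.2f} percentage points")
```

With these made-up counts the interval runs from roughly -0.15 to +0.35 percentage points. The observed 5% relative lift is therefore compatible with a modest loss, no effect, or a much larger gain, which is the distribution-not-verdict situation this section describes.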

What rigour requires

  • A real holdout, and the discipline to trust it over the dashboard.
  • Comfort with a distribution rather than a verdict. This is the harder half, and it is a hiring and culture problem, not a software one.