Quick and Dirty Sample Size Calculation

Sample Size Calculation for a Two-Sample Z-Test

Under the Null Hypothesis

Under the Alternative Hypothesis

$$\frac{r-\delta}{\sigma \sqrt{\frac{2}{n}}} = -z_{1-\beta}$$
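
As a sketch of how this condition leads to the final formula, assume $r$ is the two-sided critical value for the observed difference under the null hypothesis, i.e. $r = z_{1-\alpha/2}\,\sigma\sqrt{2/n}$. This definition of $r$ is an assumption, chosen to be consistent with the final expression for $n$. Substituting it into the condition above gives:

$$z_{1-\alpha/2} - \frac{\delta}{\sigma \sqrt{\frac{2}{n}}} = -z_{1-\beta} \quad \Longrightarrow \quad \delta = (z_{1-\alpha/2} + z_{1-\beta})\, \sigma \sqrt{\frac{2}{n}}$$

Squaring both sides and solving for $n$ then isolates the sample size per group.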

Final Step

$$n = \frac{2\sigma^2}{\delta^2} (z_{1-\alpha/2} + z_{1-\beta})^2$$

$$n \approx \frac{16\sigma^2}{\delta^2}$$
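
The constant 16 can be checked numerically. The sketch below assumes the conventional choices $\alpha = 0.05$ (two-sided) and power $1-\beta = 0.80$, and treats $n$ as the sample size per group; the function names are illustrative, not taken from any library.

```python
from statistics import NormalDist


def sample_size_per_group(sigma, delta, alpha=0.05, power=0.80):
    """Per-group n for a two-sided two-sample z-test with equal variances.

    alpha=0.05 and power=0.80 are assumed defaults, not values fixed by the text.
    """
    std_normal = NormalDist()                    # standard normal distribution
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)  # z_{1 - alpha/2}
    z_beta = std_normal.inv_cdf(power)           # z_{1 - beta}
    return 2 * sigma**2 / delta**2 * (z_alpha + z_beta) ** 2


def rule_of_thumb(sigma, delta):
    """Quick approximation: n ~ 16 * sigma^2 / delta^2."""
    return 16 * sigma**2 / delta**2


# With sigma = delta = 1, the result is just the multiplier in front of sigma^2 / delta^2.
exact = sample_size_per_group(sigma=1.0, delta=1.0)
approx = rule_of_thumb(sigma=1.0, delta=1.0)
print(round(exact, 2), approx)
```

Under these assumptions the exact multiplier is roughly 15.7, which the rule of thumb rounds up to 16, so the shortcut is slightly conservative.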

Examples

These examples are from Larsen et al. (2024).

These examples illustrate the crucial caveat: “while the sample size may be large enough to assume normality, it is not always sufficient to achieve the desired statistical power.”

To analyze both scenarios, we use our derived rule of thumb:

$$n \approx \frac{16\sigma^2}{\delta^2}$$

The Common Parameters

Before splitting into the two scenarios, let’s identify the constants shared by both:


Scenario 1: The Small Startup

The goal is to detect a $5\%$ relative change in average revenue.


Scenario 2: The Massive Enterprise

The goal is to detect a $0.02\%$ relative change in average revenue (which represents a material \$10 million shift for a \$50 billion company).

The Core Takeaway: The Quadratic Penalty

These examples highlight the mathematical reality of the $\delta^2$ term in the denominator.

Because the Minimum Detectable Effect ($\delta$) is squared, the required sample size does not scale linearly with the effect size; it scales with its inverse square. If you want to detect an effect that is $250$ times smaller (dropping from $5\%$ to $0.02\%$), you do not need $250$ times as many users. With everything else held fixed, you need $250^2$ ($62{,}500$) times as many.
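
The arithmetic behind this penalty can be sketched in a few lines. The helper name below is hypothetical, and the calculation assumes $\sigma$ and mean revenue are held fixed so that only the relative effect size changes.

```python
def relative_sample_size(mde_large, mde_small):
    """Factor by which n grows when the detectable effect shrinks.

    Assumes sigma and the baseline mean are identical in both cases,
    so n is proportional to 1 / delta^2.
    """
    return (mde_large / mde_small) ** 2


# 5% versus 0.02% relative change, expressed as fractions.
ratio = relative_sample_size(0.05, 0.0002)
print(round(ratio))
```

The effect shrinks by a factor of 250, and the required sample size grows by that factor squared.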

Note that any change can have a negative impact, so large companies cannot risk big changes. As a result, the new features their engineering teams ship are usually small changes, and data scientists must be able to detect small effects, because at that scale even a small effect can move millions of dollars.

This demonstrates why massive tech companies cannot simply rely on standard A/B testing to detect highly impactful but relatively tiny changes in conversion or revenue, forcing them to look beyond standard two-sample z-tests toward variance reduction techniques (like CUPED) or alternative experimental designs.

References