# Sample Size Calculation

How do we decide how long a test should run, or in our terms, how many observations we need per group? This question is relevant because it is generally advised to decide on a sample size before starting an experiment. While many A/B testing guides attempt to provide general advice, the reality is that it varies case by case. A common approach for overcoming this problem is referred to as power analysis.

## Power Analysis

We perform power analysis to determine the required sample size, and it involves the following inputs:

1. Effect size (calculated via lift): the minimum size of the effect that we want to detect in a test; for example, a 5% increase in conversion rates.

   1. For testing differences in `means`, after selecting a suitable minimum detectable effect (MDE) of interest, we convert it into a standardized effect size known as `Cohen's d`, defined as the difference between the two means divided by the pooled standard deviation:

      Cohen's d = (µB - µA) / stdev\_pooled
   2. For differences in proportions, a common effect size to use is `Cohen's h`, calculated using the formula:

      Cohen's h = 2 arcsin(sqrt(p1)) - 2 arcsin(sqrt(p2))

   A general rule of thumb:

   * 0.2 corresponds to a small effect,
   * 0.5 to a medium effect,
   * 0.8 to a large effect.

2. Significance level (predetermined): the alpha value, i.e., the probability of a false positive (Type I error); 5% is typical.

3. Power (predetermined): the probability of detecting an effect when one truly exists; 80% is a common choice.
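
As a quick sketch of how these standardized effect sizes are computed (the group means, pooled standard deviation, and conversion rates below are made-up illustrative values):

```python
import math

# Hypothetical group means and pooled standard deviation
mean_a, mean_b, stdev_pooled = 10.0, 10.5, 2.0
cohens_d = (mean_b - mean_a) / stdev_pooled  # (10.5 - 10.0) / 2.0 = 0.25

# Cohen's h for a lift from a 10% to a 15% conversion rate
p1, p2 = 0.15, 0.10
cohens_h = 2 * math.asin(math.sqrt(p1)) - 2 * math.asin(math.sqrt(p2))

print(f"Cohen's d: {cohens_d:.2f}")  # Cohen's d: 0.25
print(f"Cohen's h: {cohens_h:.4f}")  # Cohen's h: 0.1519
```

By the rule of thumb above, both of these would count as small effects.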

Keep in mind that if we change any of the above inputs, the required sample size also changes.

More power, a smaller significance level, or detecting a smaller effect all lead to a larger sample size.

```python
"""
The power functions require standardized minimum effect difference. 
To get this, we can use the proportion_effectsize function by inputting our baseline 
and desired minimum conversion rates.
"""
from statsmodels.stats.proportion import proportion_effectsize
from statsmodels.stats.power import zt_ind_solve_power

# calculate standardized minimum effect difference. 
std_effect = proportion_effectsize(0.15, 0.1)

# calculate sample size with alpha=.05, and varying powers
sz1 = zt_ind_solve_power(effect_size=std_effect, nobs1=None, alpha=.05, power=.80)
sz2 = zt_ind_solve_power(effect_size=std_effect, nobs1=None, alpha=.05, power=.90)

print(f"{sz1:.2f}")
print(f"{sz2:.2f}")

"""
680.35
910.80

Note that increasing power requires more samples.
"""
```
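
The same pattern shows the other two levers. This sketch reuses the functions above, first tightening alpha and then shrinking the minimum detectable effect (the 12% target rate is a made-up illustrative value); both changes push the required sample size up:

```python
from statsmodels.stats.proportion import proportion_effectsize
from statsmodels.stats.power import zt_ind_solve_power

# Baseline: alpha=.05, power=.80, detecting a lift from 10% to 15%
es = proportion_effectsize(0.15, 0.10)
n_base = zt_ind_solve_power(effect_size=es, nobs1=None, alpha=0.05, power=0.80)

# Tighten the significance level from 5% to 1%, holding power fixed
n_small_alpha = zt_ind_solve_power(effect_size=es, nobs1=None, alpha=0.01, power=0.80)

# Shrink the minimum detectable effect: a lift from 10% to only 12%
es_small = proportion_effectsize(0.12, 0.10)
n_small_mde = zt_ind_solve_power(effect_size=es_small, nobs1=None, alpha=0.05, power=0.80)

print(f"baseline:    {n_base:.2f}")  # same 680.35 as above
print(f"alpha=0.01:  {n_small_alpha:.2f}")
print(f"smaller MDE: {n_small_mde:.2f}")
```

A smaller significance level and a smaller effect each require noticeably more observations per group than the baseline.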

## Effect Size, Sample Size, and Power

Below is a graph of test power with varying sample and effect sizes:

<figure><img src="https://483934582-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F332OmkaCCBc9TZFZXfnO%2Fuploads%2FijxeWeeWMaApWij205Ml%2Fpower.png?alt=media&#x26;token=80124313-d460-427b-81ae-dadb01c4eee0" alt=""><figcaption></figcaption></figure>

Code to produce the image above:

```python
import numpy as np
import matplotlib.pyplot as plt
from statsmodels.stats.power import TTestIndPower

# Sample Size and Effect Size
sample_sizes = np.arange(5, 100)
effect_sizes = np.array([0.2, 0.5, 0.8])

# Create results object for t-test analysis
res = TTestIndPower()

# Plot the power analysis
res.plot_power(dep_var='nobs', nobs=sample_sizes, effect_size=effect_sizes)
plt.show()
```
