Multiple Testing

When conducting multiple hypothesis tests, we must adjust the p-values (or the significance level) to account for the number of tests being performed, in order to control the Type I error rate. Although there is no universally accepted solution for multiple testing, some common approaches are

  • Bonferroni correction

  • Šidák correction

  • Step-based approach

  • Tukey’s procedure

  • Dunnett’s correction

Bonferroni Correction

The Bonferroni correction is the most conservative and straightforward approach among all the adjustments:

alpha_corrected = alpha / m
where
alpha: the desired/initial significance level
m: the number of hypothesis tests

Ex: with 20 tests and an initial alpha = 0.05, the Bonferroni correction evaluates each individual test at the alpha_corrected = 0.05 / 20 = 0.0025 significance level.
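The arithmetic above can be checked directly. As a comparison, the snippet also sketches the Šidák correction from the list above, which is exact under independence and slightly less conservative than Bonferroni:

```python
m = 20        # number of hypothesis tests
alpha = 0.05  # desired family-wise significance level

# Bonferroni: divide alpha evenly across the m tests
alpha_bonferroni = alpha / m
print(round(alpha_bonferroni, 6))  # 0.0025

# Šidák: solve 1 - (1 - alpha_corrected)^m = alpha for alpha_corrected
alpha_sidak = 1 - (1 - alpha) ** (1 / m)
print(round(alpha_sidak, 6))  # 0.002561
```

Note that the Šidák threshold is marginally larger than the Bonferroni one, which is why Bonferroni is described as the most conservative choice.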

Type I Error Rate and Bonferroni Correction

Rate = 1 - [ (1 - significance level) ^ number of tests ]

This is the probability of making at least one Type I error across that many independent tests.

sign_level = 0.05          # per-test significance level
num_tests = [10, 30, 60]
for m in num_tests:
    # probability of at least one false positive across m independent tests
    err_rate = 1 - ((1 - sign_level) ** m)
    print(f"Error Rate for {m} Tests: {round(err_rate, 4)}")
"""
Error Rate for 10 Tests: 0.4013
Error Rate for 30 Tests: 0.7854
Error Rate for 60 Tests: 0.9539
"""

The probability of encountering at least one Type I error grows rapidly with the number of tests. This is where the Bonferroni correction plays a crucial role: although conservative, it effectively controls the family-wise error rate, mitigating the risk of a false positive.
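To see the effect, we can rerun the error-rate calculation above with each test conducted at the Bonferroni-corrected level alpha / m instead of alpha:

```python
alpha = 0.05
for m in [10, 30, 60]:
    # family-wise error rate when each of m independent tests
    # is run at the Bonferroni-corrected level alpha / m
    fwer = 1 - (1 - alpha / m) ** m
    print(f"FWER for {m} tests at alpha/m: {round(fwer, 4)}")
    # stays just under 0.05 in every case
```

Instead of climbing toward 1, the family-wise error rate now stays bounded just below the desired alpha = 0.05 regardless of the number of tests.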

# Bonferroni with statsmodels
from statsmodels.stats.multitest import multipletests

pvals = [.01, .05, .10, .50, .99]
reject, p_adjusted, _, _ = multipletests(pvals, alpha=.05, method='bonferroni')

print(reject)      # [ True False False False False]
print(p_adjusted)  # [0.05 0.25 0.5  1.   1.  ]

The Bonferroni correction effectively adjusted the p-values for the five hypothesis tests. After correction, only the first test remained significant at alpha = 0.05.
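The adjusted p-values can also be reproduced by hand, since the Bonferroni method simply multiplies each p-value by the number of tests and caps the result at 1. A quick sketch with NumPy:

```python
import numpy as np

pvals = np.array([.01, .05, .10, .50, .99])

# multiply each p-value by the number of tests, capping at 1
manual = np.minimum(pvals * len(pvals), 1.0)
print(manual)  # matches the statsmodels output above
```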
