In simple terms, a confidence interval is a range of values that we are fairly sure contains the true value of an unknown population parameter. It has an associated confidence level, which indicates how often the interval would include this value if the sampling process were repeated. For example, a 95% confidence level means that if we drew 100 samples and built an interval from each one, we would expect about 95 of those intervals to capture the true population parameter.
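The "95 times out of 100" reading can be checked with a small simulation. The sketch below is only illustrative: the population mean, standard deviation, sample size, and number of trials are all made-up values, and the population is assumed to be normal with a known standard deviation.

```python
import numpy as np
import scipy.stats as st

rng = np.random.default_rng(0)  # fixed seed so the run is repeatable

# Hypothetical population parameters (assumptions for this demo only)
true_mean, sigma = 50.0, 10.0
n, trials = 40, 10_000          # sample size and number of repeated samples

z_crit = st.norm.ppf(0.975)     # critical z-score for a 95% confidence level

# Draw all samples at once: one row per repeated experiment
samples = rng.normal(true_mean, sigma, size=(trials, n))
means = samples.mean(axis=1)
margin = z_crit * sigma / np.sqrt(n)   # sigma is treated as known here

# Fraction of intervals that contain the true mean
covered = (means - margin <= true_mean) & (true_mean <= means + margin)
coverage = covered.mean()
print(f"Coverage over {trials} intervals: {coverage:.3f}")
```

The printed fraction should land close to 0.95, which is what the confidence level promises about the procedure, not about any single interval.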
1. Calculating Confidence Intervals
1.1. Mean
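For means, we take the sample mean and then add and subtract a margin of error: the critical value for our confidence level (a z-score when σ is known or the sample is large, n ≥ 30; a t-score when σ is unknown and the sample is small) multiplied by the standard deviation over the square root of the number of samples. The standard deviation here is σ when it is known, and the sample standard deviation s otherwise.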
When σ is Known
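$$\bar{x} \pm z_{\alpha/2} \frac{\sigma}{\sqrt{n}}$$
The equation simply tells us that the confidence interval is centered at the sample mean $\bar{x}$ and extends $z_{\alpha/2}$ standard errors ($\sigma/\sqrt{n}$) to each side of $\bar{x}$. For a 95% confidence level, $z_{\alpha/2} \approx 1.96$.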
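As a rough sketch of the known-σ case, the helper below builds the interval from the sample mean, the critical z-score, and σ over the square root of n. The function name `z_interval_known_sigma`, the measurements, and the value σ = 2.0 are all hypothetical and chosen only for illustration.

```python
import numpy as np
import scipy.stats as st

def z_interval_known_sigma(data, sigma, confidence=0.95):
    """Confidence interval for the mean when the population sigma is known."""
    data = np.asarray(data, dtype=float)
    n = data.size
    mean = data.mean()
    # Two-sided critical z-score, e.g. about 1.96 for 95% confidence
    z_crit = st.norm.ppf(1 - (1 - confidence) / 2)
    margin = z_crit * sigma / np.sqrt(n)
    return mean - margin, mean + margin

# Hypothetical measurements; sigma = 2.0 is assumed to be known in advance
data = [9.8, 10.4, 10.1, 9.6, 10.9, 10.2, 9.9, 10.5]
low, high = z_interval_known_sigma(data, sigma=2.0)
print(low, high)
```

Raising `confidence` widens the interval, because the critical z-score grows while the standard error stays the same.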
When σ is Unknown and Sample Size n ≥ 30:
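We first calculate the sample standard deviation:
$$s = \sqrt{\frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2}$$
Then, compute the confidence interval, using s in place of σ:
$$\bar{x} \pm z_{\alpha/2} \frac{s}{\sqrt{n}}$$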
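For the large-sample case, a minimal sketch might look like the following. The data are simulated (an exponential sample of size 200, picked arbitrarily), so the numbers themselves carry no meaning; the point is only the two steps of estimating s and then plugging it into the z-based interval.

```python
import numpy as np
import scipy.stats as st

rng = np.random.default_rng(42)
# Hypothetical, deliberately non-normal data with a large sample size
sample = rng.exponential(scale=3.0, size=200)

n = sample.size
x_bar = sample.mean()
s = sample.std(ddof=1)          # sample standard deviation (n - 1 in the denominator)
se = s / np.sqrt(n)             # estimated standard error of the mean

z_crit = st.norm.ppf(0.975)     # 95% confidence level
ci_z = (x_bar - z_crit * se, x_bar + z_crit * se)

# Cross-check with scipy's normal interval
ci_check = st.norm.interval(0.95, loc=x_bar, scale=se)
print(ci_z)
print(ci_check)
```

Note `ddof=1`: NumPy's default (`ddof=0`) divides by n rather than n - 1 and would slightly understate s.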
When σ is Unknown and Sample Size n < 30:
We rely on Student's t-distribution with n - 1 degrees of freedom:
$$\bar{x} \pm t_{\alpha/2,\, n-1} \frac{s}{\sqrt{n}}$$
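To see why the sample size matters, it can help to compare critical values directly. The short sketch below (the sample sizes are arbitrary) prints the t critical value for several n next to the z critical value for the same 95% confidence level.

```python
import scipy.stats as st

confidence = 0.95
q = 1 - (1 - confidence) / 2      # upper quantile for a two-sided interval

z_crit = st.norm.ppf(q)
print(f"z critical value: {z_crit:.4f}")

# t critical values for a few illustrative sample sizes
t_crits = {}
for n in (5, 10, 30, 100, 1000):
    t_crits[n] = st.t.ppf(q, df=n - 1)
    print(f"n = {n:>4}: t critical value = {t_crits[n]:.4f}")
```

The t critical value is always larger than the z value, which widens the interval to account for the extra uncertainty in estimating σ from a small sample, and the two converge as n grows.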
Example in Python
import scipy.stats as st
import numpy as np

# Method 1 - Manual
# Sample data with n=10 (1 to 10)
n = 10
a = range(1, n + 1)

# Mean
m = np.mean(a)

# Standard deviation
s = np.std(a, ddof=1)

# Critical t-score since the sample size is 10, with alpha = 0.05
alpha = 0.05
dof = n - 1
t_crit = st.t.ppf(1 - alpha / 2, dof)

# Confidence interval
# s / np.sqrt(n) is called the standard error of the mean
ci_manual = (m - t_crit * s / np.sqrt(n), m + t_crit * s / np.sqrt(n))
print(ci_manual)
# (3.3341494102783162, 7.665850589721684)

# Method 2 - Using scipy's interval method
ci_scipy = st.t.interval(1 - alpha, dof, loc=m, scale=st.sem(a))
print(ci_scipy)
# (3.3341494102783162, 7.665850589721684)
1.2. Proportions
For proportions, we take the sample proportion and add and subtract the z-score times the square root of the sample proportion times its complement, over the number of samples:
$$\hat{p} \pm z_{\alpha/2} \sqrt{\frac{\hat{p}(1 - \hat{p})}{n}}$$
Example in Python
import numpy as np
import scipy.stats as st
from statsmodels.stats.proportion import proportion_confint

# Sample data
n = 100  # Sample size
x = 60  # Number of successes (favorable responses)
p_hat = x / n  # Sample proportion

# Confidence level
alpha = 0.05  # 95% confidence level

# Standard error for proportion
se = np.sqrt(p_hat * (1 - p_hat) / n)

# Critical z-score for 95% confidence level
z_crit = st.norm.ppf(1 - alpha / 2)

# Method 1 - Manual
ci_manual = (p_hat - z_crit * se, p_hat + z_crit * se)

# Method 2 - Using scipy's interval method
ci_scipy = st.norm.interval(1 - alpha, loc=p_hat, scale=se)

# Method 3 - Using statsmodels' proportion_confint method
ci_statsmodels = proportion_confint(x, n, alpha)

print("Manual CI:", ci_manual)
print("Scipy CI:", ci_scipy)
print("Statsmodels CI:", ci_statsmodels)
# Manual CI: (0.5039817664728938, 0.6960182335271061)
# Scipy CI: (0.5039817664728938, 0.6960182335271061)
# Statsmodels CI: (0.5039817664728937, 0.6960182335271062)