Power Analysis For Sample Size Using Python

Power Analysis For Sample Size Using Python

Power analysis is a statistical analysis based on significance level, effect size, statistical power, and sample size. We can use it to calculate any one of the four values given the other three. Power analysis and sample size calculation are common interview questions for data analyst, data scientist, and machine learning engineer positions.

In this tutorial, we will cover

  • What are significance level, effect size, statistical power, and sample size?
  • How to calculate the sample size for hypothesis testing (AB testing) using Python?
  • How to calculate the statistical power?
  • How to calculate the minimum detectable effect size?
  • How to calculate the significance level?
  • How do sample size, significance level, and effect size impact the statistical power?

Resources for this post:

  • More video tutorials on Statistics
  • More blog posts on Statistics
  • Click here for the Colab notebook
  • If you prefer the video version of the tutorial, watch the video below on YouTube
Power Analysis for Sample Size Calculation in Python – GrabNGoInfo.com

Let’s get started!

Step 0: What are significance level, effect size, statistical power, and sample size

Power analysis is commonly used for sample size calculations in hypothesis testing. This step will talk about the four components of power analysis: significance level, effect size, statistical power, and sample size.

  • The significance level is also called alpha. It is the probability of rejecting the null hypothesis when it is true. Alpha tells us about the false positive rate or type I error. A commonly used alpha value is 0.05.
  • Effect size is the magnitude of the difference between the null hypothesis and the alternative hypothesis.
    • Other factors being the same, we need a larger sample size to detect a smaller effect size.
    • The more significant the effect size is, the less likely the difference between treatment and control is due to random error. Conversely, when there is more variability in the dataset, the difference is more likely due to random error.
    • Cohen’s d measures the distance between the mean of treatment and control group while considering the standard deviation. Usually, the Cohen’s d value of 0.2 is considered to be a small effect size, 0.5 a medium effect size, and 0.8 a large effect size.
    • The effect size needs to be meaningful from a business perspective.
  • Statistical power is the probability of correctly rejecting the null hypothesis. Or we can say it’s the probability of detecting an effect if it exists. Power equals one minus beta, where beta is the false negative rate or type II error. 0.8 is a commonly used value for power.
  • The sample size is the minimum sample size needed per group.

Step 1: Import Libraries

In the first step, we will import libraries for the power analysis.

numpy is for data calculation. TTestIndPower is imported from statsmodels for power analysis of a two-sample independent t-test. matplotlib is for visualization.

# Data calculation
import numpy as np

# Power analysis
from statsmodels.stats.power import TTestIndPower

# Visualization
import matplotlib.pyplot as plt

Step 2: Calculate Sample Size

In step 2, we will calculate the sample size needed for a two-sample t-test. To calculate the sample size, we need to provide effect size, alpha, and power values.

Effect size refers to the standardized effect size. The difference between the treatment and control group means is divided by the pooled standard deviation. The effect_size value needs to be a positive number. We use the value 0.2 for the effect size, which is a small effect size value.

alpha is the significance level, and power is the statistical power. We use the commonly used value of 0.05 for alpha and 0.8 for power.

We are doing a two-sided hypothesis testing, so alternative = 'two-sided'.

# Initiate the power analysis
power_analysis = TTestIndPower()

# Calculate sample size
sample_size = power_analysis.solve_power(effect_size = 0.2, alpha = 0.05, power = 0.8, alternative = 'two-sided')

# Print results
print('The sample size needed for each group is', round(sample_size))

The power analysis gave us a sample size of 393. That means we need at least 393 samples in each AB testing group to detect a slight difference with 80% power and a 5% significance level.

The sample size needed for each group is 393

Step 3: Calculate Power

In step 3, we will calculate the power using effect size, significance level, and sample size.

We use the sample size 393, which was calculated from the previous step, for the power calculation.

ratio = 1 means the number of samples in the control and treatment groups are the same.

# Initiate the power analysis
power_analysis = TTestIndPower()

# Calculate power
power = power_analysis.power(effect_size = 0.2, alpha = 0.05, nobs1 = 393, ratio = 1, alternative = 'two-sided')

# Print results
print('The power for the hypothesis testing is', round(power, 2))

We get the power of 0.8, which is consistent with the previous results in the sample size calculation.

The power for the hypothesis testing is 0.8

Step 4: Calculate Effect Size

There are different ways of calculating effect size. Cohen’s d is one of the most widely used methods for effect size calculation. In step 4, we will calculate effect size using Cohen’s d’s equation.

Cohen’s d’s equation needs the mean, standard deviation, and sample sizes for the treatment and control group. It first calculates the pooled standard deviation and then uses the mean difference divided by the pooled standard deviation.

# Input parameters
mu1 = 2.1 # Group 1 average value
mu2 = 1.9 # Group 2 average value
s1 = 0.6 # Group 1 standard deviation
s2 = 0.5 # Group 2 standard deviation
n1 = 400 # Group 1 sample size
n2 = 400 # Group 2 sample size

# Calculate the pooled standard deviation
s = np.sqrt(((n1 - 1) * s1 + (n2 - 1) * s2) / (n1 + n2 - 2))

# Calculate the Cohen's d effect size
d = (mu1 - mu2) / s

# Print results
print('The effect size for the hypothesis testing is', round(d, 2))

We get the effect size of 0.27, which is a relatively small Cohen’s d value.

The effect size for the hypothesis testing is 0.27

Step 5: Calculate Significance Level

In step 5, we will calculate the significance level given the other three values.

# Initiate the power analysis
power_analysis = TTestIndPower()

# Calculate power
alpha = power_analysis.solve_power(effect_size = 0.2, power = 0.8, nobs1 = 393, ratio = 1, alternative = 'two-sided')

# Print results
print('The significance level for the hypothesis testing is', round(alpha, 2))

We get a significance level of 0.05, which is consistent with the previous results.

The significance level for the hypothesis testing is 0.05

Step 6: Sample Size Vs Statistical Power

In step 6, we will examine how statistical power changes with sample size changes.

dep_var specifies the measure for the x-axis. dep_var='nobs' means that we are using sample size as x-axis and power as the y-axis.

nobs is the value of sample size. We used the value of 5 to 800 to plot the graph.

effect_size is the effect size value for the power analysis. We used the widely used Cohen’s d value for small, medium, and large effect sizes. Each effect_size has one curve on the graph.

alpha is the significance level. We used 0.05 for this plot.

# Initiate the power analysis
power_analysis = TTestIndPower()

# Visualization
power_analysis.plot_power(dep_var='nobs',
                          nobs=np.arange(5, 800),
                          effect_size=np.array([0.2, 0.5, 0.8]),
                          alpha=0.05,
                          title='Sample Size vs. Statistical Power')
plt.show()
Sample Size vs. Statistical Power - GrabNGoInfo
Sample Size vs. Statistical Power – GrabNGoInfo

From the graph, we can see that

  • Statistical power increases with the sample size.
  • To achieve the same statistical power, we need more samples to detect a smaller effect size.

Step 7: Effect Size Vs Statistical Power

In step 7, we will examine how statistical power changes with effect size.

The 'effect_size' is used as x-axis. We plot the curves for three sample sizes, 50, 100, and 500.

# Initiate the power analysis
power_analysis = TTestIndPower()

# Visualization
power_analysis.plot_power(dep_var='effect_size',
                          nobs=np.array([50, 100, 500]),
                          effect_size=np.array([0.2, 0.5, 0.8]),
                          alpha=0.05,
                          title='Effect Size vs. Statistical Power')
plt.show()
Effect Size vs. Statistical Power - GrabNGoInfo
Effect Size vs. Statistical Power – GrabNGoInfo

From the graph, we can see that

  • Statistical power increases with effect size.
  • When the effect size is small, a larger sample size is needed to achieve a good power value.
  • As effect size increases, the impact of sample size on power is smaller.

Step 8: Significance Level Vs Statistical Power

In step 8, we will examine how statistical power changes with the significance level.

'alpha' is used as the x-axis. We plot the curves for the alpha range from 0.01 to 0.30. The effect_size is fixed at 0.5.

# Initiate the power analysis
power_analysis = TTestIndPower()

# Visualization
power_analysis.plot_power(dep_var='alpha',
                          nobs=np.array([50, 100, 500]),
                          effect_size=0.5,
                          alpha=np.arange(0.01,0.30, 0.01),
                          title='Significance Level vs. Statistical Power')
plt.show()
Statistical Level vs. Statistical Power - GrabNGoInfo
Statistical Level vs. Statistical Power – GrabNGoInfo

From the graph, we can see that

  • Statistical power increases with significance level.
  • When alpha is small, a larger sample size is needed to achieve a good power value.
  • As alpha increases, the impact of sample size on power is smaller.

Summary

In this tutorial, we talked about how to use statistical power analysis for hypothesis testing. The method can be used to calculate the minimum sample size needed for AB testing. We covered

  • What are significance level, effect size, statistical power, and sample size?
  • How to calculate the sample size for hypothesis testing (AB testing) using Python?
  • How to calculate the statistical power?
  • How to calculate the minimum detectable effect size?
  • How to calculate the significance level?
  • How do sample size, significance level, and effect size impact the statistical power?

For more information about data science and machine learning, please check out my YouTube channel and Medium Page or follow me on LinkedIn.

Recommended Tutorials


References

statsmodels documentation on power analysis

Leave a Comment

Your email address will not be published. Required fields are marked *