OLS Treatment Effects Estimation Using Python Package Causal Inference

Estimate treatment effects using ordinary least squares (OLS) in Python


CausalInference is a Python package for causal analysis. It provides functionality such as propensity score trimming, covariate matching, ordinary least squares (OLS) treatment effects estimation, subclassification, and inverse probability weighting.

In this tutorial, we will talk about how to do ordinary least squares (OLS) treatment effects estimation. Other functionalities will be introduced in future tutorials.

Resources for this post:

OLS Treatment Effects Estimation Using Python Package Causal Inference – GrabNGoInfo.com

Let’s get started!

Step 1: Install and Import Libraries

In step 1, we will install and import libraries.

Firstly, let’s install dowhy for dataset creation and causalinference for ordinary least squares (OLS) treatment effects estimation.

# Install dowhy
!pip install dowhy

# Install causal inference
!pip install causalinference

After the installation is completed, we can import the libraries.

  • The datasets module is imported from dowhy for dataset creation.
  • pandas and numpy are imported for data processing.
  • CausalModel is imported from the causalinference package for ordinary least squares (OLS) treatment effects estimation.
# Package to create synthetic data for causal inference
from dowhy import datasets

# Data processing
import pandas as pd
import numpy as np

# Causal inference
from causalinference import CausalModel

Step 2: Create Dataset

In step 2, we will create a synthetic dataset for the causal inference.

  • Firstly, we set a random seed using np.random.seed to make the dataset reproducible.
  • Then a dataset with the true causal impact of 10, four confounders, 10,000 samples, a binary treatment variable, and a continuous outcome variable is created.
  • After that, we create a dataframe from the data. In the dataframe, the columns W0, W1, W2, and W3 are the four confounders, v0 is the treatment indicator, and y is the outcome.
# Set random seed
np.random.seed(42)

# Create a synthetic dataset
data = datasets.linear_dataset(
    beta=10,
    num_common_causes=4,
    num_samples=10000,
    treatment_is_binary=True,
    outcome_is_binary=False)

# Create Dataframe
df = data['df']

# Take a look at the data
df.head()
Causal Inference Data — GrabNGoInfo.com

Next, let’s rename v0 to treatment, rename y to outcome, and convert the boolean values to 0 and 1.

# Rename columns
df = df.rename({'v0': 'treatment', 'y': 'outcome'}, axis=1)

# Convert the boolean treatment indicator to 1 and 0
df['treatment'] = df['treatment'].astype(int)

# Take a look at the data
df.head()
Causal Inference Data — GrabNGoInfo.com

Step 3: Raw Difference

In step 3, we will initiate CausalModel and print the raw data summary statistics. CausalModel takes three arguments:

  • Y is the observed outcome.
  • D is the treatment indicator.
  • X is the covariates matrix.

CausalModel takes NumPy arrays as inputs, so .values is used to pass the underlying arrays of the dataframe columns.

# Run causal model
causal = CausalModel(
    Y = df['outcome'].values,
    D = df['treatment'].values,
    X = df[['W0', 'W1', 'W2', 'W3']].values)

# Print summary statistics
print(causal.summary_stats)
Python CausalInference raw balance and difference — GrabNGoInfo.com

causal.summary_stats prints out the raw summary statistics. The output shows that:

  • There are 2,269 units in the control group and 7,731 units in the treatment group.
  • The average outcome for the treatment group is 13.94, and the average outcome for the control group is -2.191. So the raw difference between the treatment and the control group is 16.132, which is much higher than the actual treatment effect of 10.
  • Nor-diff is the normalized difference, i.e., the standardized mean difference (SMD) of each covariate between the treatment group and the control group. An SMD greater than 0.1 indicates that the covariate is imbalanced between the two groups. We can see that most of the covariates have an SMD greater than 0.1.
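To make the Nor-diff column concrete, here is a minimal sketch of how a standardized mean difference can be computed by hand. The helper name normalized_diff and the use of the sample-variance convention (ddof=1) are assumptions for illustration; the toy arrays are not the tutorial's data.

```python
import numpy as np

def normalized_diff(x_t, x_c):
    """Standardized mean difference: difference in group means divided by
    the square root of the average of the two group sample variances."""
    pooled_sd = np.sqrt((x_t.var(ddof=1) + x_c.var(ddof=1)) / 2)
    return (x_t.mean() - x_c.mean()) / pooled_sd

# Toy covariate values for the treated and control groups
x_treated = np.array([0.0, 1.0, 2.0])   # mean 1, sample variance 1
x_control = np.array([-1.0, 0.0, 1.0])  # mean 0, sample variance 1

print(normalized_diff(x_treated, x_control))  # 1.0, well above the 0.1 threshold
```

A value this large signals that the covariate distributions differ substantially between the two groups, so a raw comparison of outcomes would be confounded.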

Step 4: Treatment Effects Estimation Using Ordinary Least Squares (OLS)

In step 4, we will talk about how to measure the treatment effects using the ordinary least squares (OLS) model.

est_via_ols() runs the ordinary least squares (OLS) estimation. The adj argument controls which features enter the regression.

  • adj=0 includes only the treatment variable in the model, so it reproduces the raw difference between the treatment and the control group.
  • adj=1 includes the treatment variable and the covariates as model predictors. Internally, the covariates are de-meaned (each covariate value minus its mean) before entering the model.
  • adj=2 is the default value. It includes the treatment variable, the de-meaned covariates, and the interactions between the treatment variable and the de-meaned covariates as model predictors. Including the interaction terms implies that the treatment effect may differ across individuals [2].

causal.estimates prints out the treatment effects estimation results.

  • ATE is the average treatment effect
  • ATC is the average treatment effect on the control
  • ATT is the average treatment effect on the treated

To learn more about the average treatment effect (ATE), the average treatment effect on the control (ATC), the average treatment effect on the treated (ATT), and how the values are calculated, please check out my previous tutorial ATE vs CATE vs ATT vs ATC for Causal Inference.
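The three estimands can be illustrated with a toy example in which the potential outcomes (which are never both observable in real data) are known by construction:

```python
import numpy as np

# Hypothetical potential outcomes for six units (illustrative values only)
y0 = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])      # outcome without treatment
y1 = y0 + np.array([10, 10, 10, 12, 12, 12])        # outcome with treatment
d  = np.array([0, 0, 0, 1, 1, 1])                   # treatment indicator

effect = y1 - y0             # individual treatment effects
ate = effect.mean()          # average over everyone
att = effect[d == 1].mean()  # average over the treated units
atc = effect[d == 0].mean()  # average over the control units

print(ate, att, atc)  # 11.0 12.0 10.0
```

Here ATT differs from ATC because the units that received treatment happen to have larger individual effects, which is exactly the situation the adj=2 interaction terms are designed to accommodate.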

# OLS treatment estimation adj=0
causal.est_via_ols(adj=0)
print('adj=0', causal.estimates)

# OLS treatment estimation adj=1
causal.est_via_ols(adj=1)
print('adj=1', causal.estimates)

# OLS treatment estimation adj=2
causal.est_via_ols(adj=2)
print('adj=2', causal.estimates)
Treatment Effects Estimation Using Ordinary Least Squares (OLS) – GrabNGoInfo.com

Comparing the ordinary least squares (OLS) treatment effects results, we can see that:

  • adj=0 gives us the average treatment effects (ATE) of 16.132, which is exactly the same as the raw differences we calculated in step 3.
  • adj=1 gives us the average treatment effects (ATE) of 10, which is the same as the true causal impact.
  • adj=2 produces the true causal impact value of 10 for the average treatment effects (ATE), the average treatment effect on the control (ATC), and the average treatment effect on the treated (ATT).

From the results, we can see that using ordinary least squares (OLS) with covariate adjustment to estimate the treatment effects greatly improves the accuracy of the causal effect estimation.
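As a sanity check on the adjustment logic, the adj=0 and adj=1 regressions can be replicated with plain NumPy on a freshly simulated dataset. The confounder W, its coefficients, and the sample size below are illustrative assumptions, not the tutorial's data; only the true treatment effect of 10 mirrors the tutorial's beta=10.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 5000

# One confounder W drives both treatment assignment and the outcome;
# the true treatment effect is 10
W = rng.normal(size=(n, 1))
p = 1 / (1 + np.exp(-W[:, 0]))           # treatment more likely when W is high
D = (rng.random(n) < p).astype(float)
Y = 10 * D + 5 * W[:, 0] + rng.normal(size=n)

# adj=0 equivalent: regress Y on [1, D] only -- recovers the biased raw difference
X0 = np.column_stack([np.ones(n), D])
beta0 = np.linalg.lstsq(X0, Y, rcond=None)[0]

# adj=1 equivalent: regress Y on [1, D, W - mean(W)] with de-meaned covariates
Wc = W - W.mean(axis=0)
X1 = np.column_stack([np.ones(n), D, Wc])
beta1 = np.linalg.lstsq(X1, Y, rcond=None)[0]

print(beta0[1])  # biased: noticeably above 10
print(beta1[1])  # close to the true effect of 10
```

Because treated units tend to have higher W, and W raises the outcome, the unadjusted coefficient absorbs that confounding; adding the de-meaned covariate removes it, matching the pattern seen in the adj=0 versus adj=1 results above.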

For more information about data science and machine learning, please check out my YouTube channel and Medium Page or follow me on LinkedIn.


References

[1] DoWhy Documentation

[2] CausalInference Documentation
