One-to-one Matching on Confounders Using Python Package Causal Inference

Bias-adjusted one-to-one and one-to-many matching on confounders in Python

One-to-one matching on confounders takes each unit in the treatment group and finds the most similar unit in the non-treatment group, where similarity is measured on the confounders. The goal of matching is to create a synthetic control group that is comparable to the treatment group.

In this tutorial, we will talk about how to do one-to-one matching on confounders using the Python CausalInference package. You will learn:

  • How to do one-to-one matching on confounders?
  • How to adjust the bias in one-to-one confounder matching?
  • How to expand one-to-one matching to one-to-many matching?

Let’s get started!

Step 1: Install and Import Libraries

In step 1, we will install and import libraries.

First, let’s install dowhy for dataset creation and causalinference for confounder matching.

# Install dowhy
!pip install dowhy

# Install causal inference
!pip install causalinference

After the installation is completed, we can import the libraries.

  • datasets is imported from dowhy for dataset creation.
  • pandas and numpy are imported for data processing.
  • CausalModel is imported from the causalinference package for confounder matching.

# Package to create synthetic data for causal inference
from dowhy import datasets

# Data processing
import pandas as pd
import numpy as np

# Causal inference
from causalinference import CausalModel

Step 2: Create Dataset

In step 2, we will create a synthetic dataset for the causal inference.

  • First, we set a random seed using np.random.seed to make the dataset reproducible.
  • Then we create a dataset with a true causal impact of 10, four confounders, 10,000 samples, a binary treatment variable, and a continuous outcome variable.
  • After that, we create a dataframe for the data. In the dataframe, the columns W0, W1, W2, and W3 are the four confounders, v0 is the treatment indicator, and y is the outcome.

# Set random seed
np.random.seed(42)

# Create a synthetic dataset
data = datasets.linear_dataset(
    beta=10,
    num_common_causes=4,
    num_samples=10000,
    treatment_is_binary=True,
    outcome_is_binary=False)

# Create Dataframe
df = data['df']

# Take a look at the data
df.head()

Next, let’s rename v0 to treatment, rename y to outcome, and convert the boolean values to 0 and 1.

# Rename columns
df = df.rename({'v0': 'treatment', 'y': 'outcome'}, axis=1)

# Convert the boolean treatment indicator to 1 and 0
df['treatment'] = df['treatment'].astype(int)

# Take a look at the data
df.head()

Step 3: Raw Difference

In step 3, we will initialize CausalModel and print the summary statistics of the raw data. CausalModel takes three arguments:

  • Y is the observed outcome.
  • D is the treatment indicator.
  • X is the covariates matrix.

CausalModel takes arrays as inputs, so .values is used to convert the dataframe columns to numpy arrays.

# Run causal model
causal = CausalModel(
    Y=df['outcome'].values,
    D=df['treatment'].values,
    X=df[['W0', 'W1', 'W2', 'W3']].values)

# Print summary statistics
print(causal.summary_stats)

causal.summary_stats prints out the raw summary statistics. The output shows that:

  • There are 2,269 units in the control group and 7,731 units in the treatment group.
  • The average outcome for the treatment group is 13.94, and the average outcome for the control group is -2.191. So the raw difference between the treatment and the control group is 16.132, which is much higher than the actual treatment effect of 10.
  • Nor-diff is the standardized mean difference (SMD) of each covariate between the treatment group and the control group. An SMD with an absolute value greater than 0.1 indicates that the covariate is imbalanced between the two groups. We can see that most of the covariates have an SMD greater than 0.1.
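
To make the Nor-diff column more concrete, here is a minimal sketch of how a standardized mean difference can be computed by hand with numpy. It assumes the common definition (the difference in group means divided by the square root of the average of the two group variances). The helper name normalized_difference and the toy covariate are made up for illustration and are not part of CausalInference, whose implementation may differ in details.

```python
import numpy as np

def normalized_difference(x_treated, x_control):
    """Difference in means scaled by the pooled standard deviation."""
    mean_diff = x_treated.mean() - x_control.mean()
    pooled_sd = np.sqrt((x_treated.var(ddof=1) + x_control.var(ddof=1)) / 2)
    return mean_diff / pooled_sd

# Hypothetical toy covariate: treated units are shifted up by one unit
rng = np.random.default_rng(0)
x_control = rng.normal(loc=0.0, scale=1.0, size=2000)
x_treated = rng.normal(loc=1.0, scale=1.0, size=2000)

smd = normalized_difference(x_treated, x_control)
print(f"Standardized mean difference: {smd:.3f}")
print("Imbalanced" if abs(smd) > 0.1 else "Balanced")
```

Because the toy groups differ by about one standard deviation, the covariate is flagged as imbalanced under the 0.1 rule of thumb.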

Step 4: Basic One-to-one Matching on Confounders

In step 4, we will implement the basic matching estimator on confounders.

Confounder matching usually involves the following steps:

  • Step 1: Compute the distance between the units in the treatment group and the control group using the confounders. The units with similar covariates produce smaller distances.
  • Step 2: Match subjects in the treatment and control groups using the shortest distance. There are different measures of distance. By default, the Python package CausalInference uses the inverse variance matrix as the weighting matrix to standardize the covariates, but this weighting matrix can be changed through the weights hyperparameter of est_via_matching.
  • Step 3: Estimate the unit-level treatment effect for each matched pair.
  • Step 4: Calculate the overall treatment effect by averaging the unit-level treatment effects.
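
The four steps above can be sketched by hand in a few lines of numpy. This is only an illustration on a small made-up dataset with two confounders and a true effect of 10, not the package's actual implementation. The helper nearest_match, the toy data, and the choice to match with replacement are all assumptions made for the example.

```python
import numpy as np

def nearest_match(X_from, X_to, weights):
    """For each row of X_from, return the index of the closest row of X_to
    under the weighted squared distance sum_k weights[k] * diff_k ** 2."""
    diffs = X_from[:, None, :] - X_to[None, :, :]  # (n_from, n_to, n_covariates)
    dist = (diffs ** 2 * weights).sum(axis=2)
    return dist.argmin(axis=1)

# Hypothetical toy data: two confounders drive both treatment and outcome
rng = np.random.default_rng(0)
n = 1000
X = rng.normal(size=(n, 2))
D = rng.binomial(1, 1 / (1 + np.exp(-(X[:, 0] + X[:, 1]))))
Y = 10 * D + 2 * X[:, 0] + 3 * X[:, 1] + rng.normal(scale=0.5, size=n)

X_t, Y_t = X[D == 1], Y[D == 1]
X_c, Y_c = X[D == 0], Y[D == 0]

# Step 1: inverse-variance weights put the covariates on a common scale
weights = 1 / X.var(axis=0)

# Step 2: match each unit to its nearest neighbor in the other group
match_for_t = nearest_match(X_t, X_c, weights)
match_for_c = nearest_match(X_c, X_t, weights)

# Step 3: unit-level effects (treated outcome minus control outcome)
effects_t = Y_t - Y_c[match_for_t]
effects_c = Y_t[match_for_c] - Y_c

# Step 4: average the unit-level effects
att = effects_t.mean()
atc = effects_c.mean()
ate = (effects_t.sum() + effects_c.sum()) / n
raw_diff = Y_t.mean() - Y_c.mean()

print(f"Raw difference: {raw_diff:.2f}")
print(f"ATE: {ate:.2f}, ATT: {att:.2f}, ATC: {atc:.2f}")
```

In this sketch the ATE is simply the size-weighted average of the ATT and the ATC, because every unit contributes exactly one unit-level effect.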

We do not need to manually implement the steps. In the Python CausalInference package, causal.est_via_matching does the one-to-one matching on confounders automatically, and causal.estimates gives us the estimation results.

# Matching on confounders
causal.est_via_matching()

# Print out the treatment effect estimation results
print(causal.estimates)

From the treatment effect estimation results, we can see that the average treatment effect (ATE), the average treatment effect on the control (ATC), and the average treatment effect on the treated (ATT) are all around 12, which is much closer to the true causal impact of 10 than the raw difference of 16.

To learn more about the average treatment effect (ATE) and how to calculate it, please check out my previous tutorial ATE vs CATE vs ATT vs ATC for Causal Inference.

Step 5: Bias-adjusted One-to-one Matching on Confounders

Is it possible to get an estimation that is even closer to the true causal impact of 10? In step 5, we will adjust the bias in the confounder matching for a more accurate treatment effect estimation.

Bias exists because the treatment unit and the control unit in a matched pair do not have exactly the same covariate values. This matching discrepancy causes bias in the unit-level treatment effect estimation.

To adjust the bias, an ordinary least squares (OLS) estimation is used to account for the differences in the confounders locally on the matched units. To learn more about the ordinary least squares (OLS) treatment effect estimation, check out my previous tutorial OLS Treatment Effects Estimation Using Python Package Causal Inference.
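
The idea behind the adjustment can be sketched with numpy for the ATT only. This is a simplified illustration under stated assumptions, not the package's exact procedure: the toy data are made up, and the regression is fitted on all control units rather than only on the matched ones. The fitted slopes are used to correct each matched pair for the remaining gap in confounder values.

```python
import numpy as np

# Hypothetical toy data with a true effect of 10
rng = np.random.default_rng(1)
n = 1000
X = rng.normal(size=(n, 2))
D = rng.binomial(1, 1 / (1 + np.exp(-(X[:, 0] + X[:, 1]))))
Y = 10 * D + 2 * X[:, 0] + 3 * X[:, 1] + rng.normal(scale=0.5, size=n)

X_t, Y_t = X[D == 1], Y[D == 1]
X_c, Y_c = X[D == 0], Y[D == 0]

# Nearest control for each treated unit (inverse-variance weighted distance)
weights = 1 / X.var(axis=0)
dist = (((X_t[:, None, :] - X_c[None, :, :]) ** 2) * weights).sum(axis=2)
match = dist.argmin(axis=1)

# Plain matching estimate of the ATT
att_matched = (Y_t - Y_c[match]).mean()

# OLS of the outcome on the confounders among the control units
design = np.column_stack([np.ones(len(X_c)), X_c])
coef, *_ = np.linalg.lstsq(design, Y_c, rcond=None)
slopes = coef[1:]

# Predicted outcome gap caused by the confounder gap within each pair
adjustment = (X_t - X_c[match]) @ slopes

# Bias-adjusted estimate: remove that gap from each unit-level effect
att_adjusted = (Y_t - Y_c[match] - adjustment).mean()

print(f"ATT without adjustment: {att_matched:.2f}")
print(f"ATT with adjustment:    {att_adjusted:.2f}")
```

Since the toy outcome is linear in the confounders, the regression can absorb most of the matching discrepancy here; with a nonlinear outcome the correction would only be approximate.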

To use the bias-adjusted one-to-one matching on confounders, we can change the hyperparameter value bias_adj to True.

# Matching on confounders
causal.est_via_matching(bias_adj=True)

# Print out the treatment effect estimation results
print('bias-adjusted:', causal.estimates)

From the treatment effect estimation results, we can see that the average treatment effect (ATE), the average treatment effect on the control (ATC), and the average treatment effect on the treated (ATT) are all around the true causal impact of 10, which is more accurate than the basic version of the confounder matching.

Step 6: One-to-many Matching on Confounders

In step 6, we will talk about how to use one-to-many matching on confounders.

To invoke multiple matching in the python package CausalInference, we can set the hyperparameter value for matches to an integer that is greater than 1.

# Matching on confounders
causal.est_via_matching(bias_adj=True, matches=2)

# Print out the treatment effect estimation results
print('bias-adjusted one-to-many:', causal.estimates)

From the treatment effect estimation results, we can see that after applying the one-to-two matching, the average treatment effect (ATE), the average treatment effect on the control (ATC), and the average treatment effect on the treated (ATT) are all around the true causal impact of 10, which is similar to the results of the one-to-one matching. The documentation of the Python package CausalInference suggests keeping the matches value below 4, although this is not a hard rule.
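
To see what one-to-many matching does to a single estimate, here is a minimal numpy sketch for the ATT without bias adjustment. It assumes that the counterfactual outcome of a treated unit is the plain average of the outcomes of its k nearest controls. The helper att_with_k_matches and the toy data are made up for illustration and are not part of CausalInference.

```python
import numpy as np

def att_with_k_matches(dist, Y_t, Y_c, k):
    """ATT where each treated unit is compared with the mean outcome
    of its k nearest control units."""
    nearest = np.argsort(dist, axis=1)[:, :k]  # (n_treated, k) control indices
    counterfactual = Y_c[nearest].mean(axis=1)
    return (Y_t - counterfactual).mean()

# Hypothetical toy data with a true effect of 10
rng = np.random.default_rng(2)
n = 1000
X = rng.normal(size=(n, 2))
D = rng.binomial(1, 1 / (1 + np.exp(-(X[:, 0] + X[:, 1]))))
Y = 10 * D + 2 * X[:, 0] + 3 * X[:, 1] + rng.normal(scale=0.5, size=n)

X_t, Y_t = X[D == 1], Y[D == 1]
X_c, Y_c = X[D == 0], Y[D == 0]

# Inverse-variance weighted squared distances, treated rows by control columns
weights = 1 / X.var(axis=0)
dist = (((X_t[:, None, :] - X_c[None, :, :]) ** 2) * weights).sum(axis=2)

for k in (1, 2, 4):
    print(f"matches={k}: ATT = {att_with_k_matches(dist, Y_t, Y_c, k):.2f}")
```

Using more matches averages over more control outcomes, which tends to reduce variance, but the extra matches are farther away, so the matching discrepancy tends to grow. This trade-off is one reason to keep the number of matches small.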
