
Multiple Treatments Uplift Models for Binary Outcome Using Python CausalML

Experiments sometimes compare multiple treatments with a control group and with each other. The experiment outcome can be continuous, such as sales, or binary, such as a response to a promotion.

In this tutorial, we will talk about how to use the Python package CausalML to build meta-learner uplift models for an experiment with multiple treatments and a binary outcome. We will cover:

  • How to implement the S-learner, T-learner, and X-learner for multiple treatments
  • How to estimate the average treatment effect (ATE) for multiple treatments
  • How to estimate the individual treatment effect (ITE) for multiple treatments
  • How to get confidence intervals for the average treatment effect (ATE) and individual treatment effect (ITE) estimates

To learn how to build multiple treatments uplift models for continuous outcomes, please check out my previous tutorial Multiple Treatments Uplift Model Using Python Package CausalML.

Resources for this post:

Multiple Treatments Uplift Models for Binary Outcome – GrabNGoInfo.com

Let’s get started!

Step 1: Install and Import Libraries

In step 1, we will install and import the Python libraries.

Firstly, let’s install causalml and matplotlib version 3.4.2. We pin matplotlib to version 3.4.2 because version 3.4.0 and later include the bar_label function used to add labels to the count plot visualizations.

# Install package
!pip install causalml

# Change matplotlib to a version later than 3.4.0 for the countplot visualization
!pip install matplotlib==3.4.2

After the installation and restarting the runtime, we can import the libraries.

  • pandas and numpy are imported for data processing.
  • synthetic_data is imported for synthetic data creation.
  • seaborn is imported for visualization.
  • BaseSClassifier, BaseTClassifier, and BaseXClassifier are the meta-learner uplift models.
  • XGBClassifier, XGBRegressor, LGBMClassifier, and LGBMRegressor are the machine learning algorithms.
# Data processing
import pandas as pd
import numpy as np

# Create synthetic data
from causalml.dataset import synthetic_data

# Visualization
import seaborn as sns
import matplotlib

# Machine learning model
from causalml.inference.meta import BaseSClassifier, BaseTClassifier, BaseXClassifier
from xgboost import XGBClassifier, XGBRegressor
from lightgbm import LGBMClassifier, LGBMRegressor

Step 2: Create a Dataset

In step 2, we will create a synthetic dataset for the multiple treatments uplift model.

Firstly, a random seed is set to make the synthetic dataset reproducible.

Then we create a synthetic dataset using the synthetic_data method from the causalml Python package.

  • mode is for the type of simulation for synthetic dataset creation. It is based on the paper by Nie X. and Wager S. (2018) titled “Quasi-Oracle Estimation of Heterogeneous Treatment Effects”.
    • 1 is for difficult nuisance components and an easy treatment effect.
    • 2 is for a randomized trial.
    • 3 is for an easy propensity and a difficult baseline.
    • 4 is for unrelated treatment and control groups.
    • 5 is for a hidden confounder biasing treatment.
  • n takes in the number of observations. 5000 observations are created in this example.
  • p takes in the number of covariates. We created 5 covariates for this dataset.
  • sigma=1 means the standard deviation of the error term is 1.
  • adj is the adjustment term for the distribution of the propensity scores. Higher values shift the distribution toward 0.

The synthetic_data method produces six outputs.

  • y is the outcome.
  • X is a matrix with all the covariates.
  • w is the treatment flag with two values. 0 represents the control group and 1 represents the treatment group. In this tutorial, w is renamed to treatment.
  • tau is the individual treatment effect (ITE). In this tutorial, tau is renamed to ite.
  • b is the expected outcome.
  • e is the propensity score for receiving treatment.
# Set a seed for reproducibility
np.random.seed(1)

# Generate synthetic data using mode 1
y, X, treatment, ite, b, e = synthetic_data(mode=1, n=5000, p=5, sigma=1, adj=0)

# Visualization of outcome
sns.displot(y)
Distribution plot of the continuous outcome — GrabNGoInfo.com

From the distribution plot of the outcome y, we can see that the outcome is a continuous variable ranging from around -2 to around 6.

A new binary outcome variable y_binary is created from the continuous outcome. Using a list comprehension, we set y_binary to 1 if y is greater than or equal to 2, and to 0 otherwise.

The list comprehension returns a Python list, so we convert it to a numpy array with np.array to make the format compatible with the model requirements.

We can see that y_binary has 3327 zeros and 1673 ones in the binary outcome.

# Create a binary dependent variable
y_binary = [1 if i >=2 else 0 for i in y]

# Convert the binary outcome to numpy array format
y_binary = np.array(y_binary)

# Distribution of the binary outcome
ax1 = sns.countplot(y_binary)
# Add labels
ax1.bar_label(ax1.containers[0]) 
Count plot of the binary outcome — GrabNGoInfo.com

The Python causalml package creates one control group and one treatment group by default. We used the random function from numpy to split the treatment group into two treatment groups, treatment_1 and treatment_2. We can see that out of 5000 samples, 2421 units did not receive any treatment, 1235 received treatment 1, and 1344 received treatment 2.

# Create multiple treatments
treatment = np.array([('treatment_1' if np.random.random() > 0.5 else 'treatment_2') 
                      if t==1 else 'control' for t in treatment])

# Distribution of multiple treatments
ax2 = sns.countplot(treatment)
# Add labels
ax2.bar_label(ax2.containers[0]) 
Count plot of the treatments — GrabNGoInfo.com

Step 3: Multiple Treatments Uplift Model Using S Learner

In step 3, we will estimate the average treatment effect (ATE) and the individual treatment effect (ITE) for multiple treatments using an S-learner.

To learn more about the average treatment effect (ATE) and the individual treatment effect (ITE), please check out my previous tutorial ATE vs CATE vs ATT vs ATC for Causal Inference

S-learner is a meta-learner that uses a single machine learning model to estimate the individual-level causal treatment effect.
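Conceptually, the S-learner appends the treatment indicator to the covariates, fits a single model, and then scores each unit twice, once with the indicator switched on and once with it switched off. Below is a minimal sketch of this idea for a single treatment on hypothetical toy data, using a plain logistic regression rather than CausalML's internals:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Toy data (hypothetical, for illustration only)
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))                 # covariates
w = rng.integers(0, 2, size=1000)              # 0 = control, 1 = treated
y = (X[:, 0] + 1.5 * w + rng.normal(size=1000) > 1).astype(int)

# Fit ONE model on the covariates plus the treatment flag
model = LogisticRegression().fit(np.column_stack([X, w]), y)

# Score every unit twice: treatment flag forced to 1, then to 0
p1 = model.predict_proba(np.column_stack([X, np.ones(len(X))]))[:, 1]
p0 = model.predict_proba(np.column_stack([X, np.zeros(len(X))]))[:, 1]

ite = p1 - p0        # individual treatment effect
ate = ite.mean()     # average treatment effect
```

BaseSClassifier generalizes this pattern to multiple treatments by scoring each unit once per treatment name.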

To estimate the average treatment effect (ATE) for multiple treatments, we first initialize BaseSClassifier because we would like to estimate the impact on the binary outcome y_binary. BaseSClassifier is a generalized class that can take in existing machine learning models from packages such as sklearn and xgboost, and run S-learners with those models.

Here we are using XGBClassifier as the modeling algorithm, but it can be replaced by any other binary classification algorithm. random_state makes the model results reproducible, and control_name tells the meta-learner the name of the control group.

After initializing the model, we can get the average treatment effect (ATE) with its upper and lower bounds using the estimate_ate method.

Besides passing in the covariates, the treatment variable, and the outcome variable, we need to specify return_ci=True to get the confidence interval for the estimated average treatment effect (ATE).

# Use XGBClassifier with BaseSClassifier
s_learner = BaseSClassifier(XGBClassifier(random_state=42), control_name='control')

# Estimated ATE, upper bound, and lower bound
ate, lb, ub = s_learner.estimate_ate(X=X, treatment=treatment, y=y_binary, return_ci=True)

# Print out results
print('Average Treatment Effect for treatment 1: {:.2f} ({:.2f}, {:.2f})'.format(ate[0], lb[0], ub[0]))
print('Average Treatment Effect for treatment 2: {:.2f} ({:.2f}, {:.2f})'.format(ate[1], lb[1], ub[1]))

We can see from the outputs that

  • The estimated average treatment effect (ATE) for treatment 1 is 0.19. The lower bound is 0.16 and the upper bound is 0.22.
  • The estimated average treatment effect (ATE) for treatment 2 is 0.18. The lower bound is 0.15 and the upper bound is 0.21.
Average Treatment Effect for treatment 1: 0.19 (0.16, 0.22)
Average Treatment Effect for treatment 2: 0.18 (0.15, 0.21)

Next, let’s estimate the individual treatment effect (ITE) for multiple treatments. Individual treatment effect (ITE) is the difference between the predicted outcomes with and without treatment.

The method fit_predict produces the estimated individual treatment effect (ITE).

From the first five results, we can see that the individual treatment effect (ITE) is estimated for both treatments.

# ITE
s_learner_ite = s_learner.fit_predict(X, treatment, y_binary)

# Take a look at the data
np.matrix(s_learner_ite[:5])

The first column corresponds to the individual treatment effect (ITE) for treatment 1 and the second column corresponds to the individual treatment effect (ITE) for treatment 2.

matrix([[0.17931405, 0.17507999],
        [0.06662243, 0.06373518],
        [0.28360182, 0.22966126],
        [0.14473481, 0.17125717],
        [0.32753605, 0.33586949]])

We can print out the individual treatment effect (ITE) estimation for each record. For example, the second record has the individual treatment effect (ITE) of 0.07 for treatment 1 and 0.06 for treatment 2.

# Print out estimation for one record
print(f'The estimated ITE for treatment 1 for the second record : {s_learner_ite[1][0]:.2f}')
print(f'The estimated ITE for treatment 2 for the second record : {s_learner_ite[1][1]:.2f}')

Output:

The estimated ITE for treatment 1 for the second record : 0.07
The estimated ITE for treatment 2 for the second record : 0.06

If the confidence interval for the individual treatment effect (ITE) is needed, we can use bootstrap by specifying the bootstrap number, bootstrap size, and setting return_ci=True.

The output gives us both the estimated individual treatment effect (ITE) and the estimated upper and lower bound for each treatment.

# ITE with confidence interval
s_learner_ite, s_learner_ite_lb, s_learner_ite_ub = s_learner.fit_predict(X=X, treatment=treatment, y=y_binary, return_ci=True,
                               n_bootstraps=100, bootstrap_size=500)

# Take a look at the data
print('\nThe first five estimated ITEs are:\n', np.matrix(s_learner_ite[:5]))
print('\nThe first five estimated ITE lower bound are:\n', np.matrix(s_learner_ite_lb[:5]))
print('\nThe first five estimated ITE upper bound are:\n', np.matrix(s_learner_ite_ub[:5]))

The first columns for the individual treatment effect (ITE) and its upper bound and lower bound correspond to treatment 1, and the second columns correspond to treatment 2.

The first five estimated ITEs are:
 [[0.17931405 0.17507999]
 [0.06662243 0.06373518]
 [0.28360182 0.22966126]
 [0.14473481 0.17125717]
 [0.32753605 0.33586949]]

The first five estimated ITE lower bound are:
 [[-0.02619194 -0.03959816]
 [-0.02358359 -0.03909558]
 [ 0.02823787 -0.0228849 ]
 [-0.00868713 -0.00587439]
 [ 0.04735262  0.02495569]]

The first five estimated ITE upper bound are:
 [[0.4428641  0.52476914]
 [0.24881755 0.21498576]
 [0.55632797 0.4637997 ]
 [0.3584205  0.43201357]
 [0.55633879 0.48513989]]

We can print out the individual treatment effect (ITE) estimation and its upper bound and lower bound for each record.

  • The second record has the individual treatment effect (ITE) of 0.07 for treatment 1. The lower bound is -0.02, and the upper bound is 0.25.
  • The second record has the individual treatment effect (ITE) of 0.06 for treatment 2. The lower bound is -0.04, and the upper bound is 0.21.
# Print out estimation for one record
print(f'The estimated ITE for treatment 1 for the second record : {s_learner_ite[1][0]:.2f}, the lower bound is {s_learner_ite_lb[1][0]:.2f}, and the upper bound is {s_learner_ite_ub[1][0]:.2f}')
print(f'The estimated ITE for treatment 2 for the second record : {s_learner_ite[1][1]:.2f}, the lower bound is {s_learner_ite_lb[1][1]:.2f}, and the upper bound is {s_learner_ite_ub[1][1]:.2f}')

Output:

The estimated ITE for treatment 1 for the second record : 0.07, the lower bound is -0.02, and the upper bound is 0.25
The estimated ITE for treatment 2 for the second record : 0.06, the lower bound is -0.04, and the upper bound is 0.21
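The bootstrap interval above is conceptually a percentile interval: the treatment effect is re-estimated on many resamples, and the 2.5th and 97.5th percentiles of those estimates become the bounds. Here is a stripped-down sketch of the idea on hypothetical toy data, using a simple difference-in-means estimator rather than CausalML's exact procedure:

```python
import numpy as np

# Toy data with a known treatment effect (hypothetical, for illustration only)
rng = np.random.default_rng(42)
n = 2000
w = rng.integers(0, 2, size=n)                     # 0 = control, 1 = treated
y = rng.binomial(1, np.where(w == 1, 0.5, 0.3))    # true effect is about 0.2

# Re-estimate the effect on 100 bootstrap resamples of size 500
ates = []
for _ in range(100):
    idx = rng.choice(n, size=500, replace=True)
    yb, wb = y[idx], w[idx]
    ates.append(yb[wb == 1].mean() - yb[wb == 0].mean())

# The 2.5th and 97.5th percentiles give a 95% confidence interval
lb, ub = np.percentile(ates, [2.5, 97.5])
```

CausalML applies the same resampling idea at the individual level, which is why n_bootstraps and bootstrap_size control the stability and width of the ITE bounds.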

To learn more about S-learner, please check out my previous tutorials S Learner Uplift Model for Individual Treatment Effect and Customer Segmentation in Python and Explainable S-Learner Uplift Model Using Python Package CausalML.

Step 4: Multiple Treatments Uplift Model Using T Learner

In step 4, we will estimate the average treatment effect (ATE) and the individual treatment effect (ITE) for multiple treatments using a T-learner.

T-learner follows three steps to estimate the individual treatment effect (ITE) and the average treatment effect (ATE):

  • Firstly, two machine learning models are built, one using the treated units and the other using the untreated units.
  • Next, the predictions will be made using the two models separately for all the units, both treated and control.
  • Finally, the individual treatment effect (ITE) will be estimated by getting the difference between the predicted outcome from the treated model and the predicted outcome from the control model. The average treatment effect (ATE) is the average of the individual treatment effect (ITE).
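The three steps above can be sketched manually for a single treatment; the toy example below uses hypothetical logistic regressions in place of CausalML's implementation:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Toy data (hypothetical, for illustration only)
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))                 # covariates
w = rng.integers(0, 2, size=1000)              # 0 = control, 1 = treated
y = (X[:, 0] + 1.5 * w + rng.normal(size=1000) > 1).astype(int)

# Step 1: one model per group
model_t = LogisticRegression().fit(X[w == 1], y[w == 1])
model_c = LogisticRegression().fit(X[w == 0], y[w == 0])

# Step 2: both models predict for ALL units
p_t = model_t.predict_proba(X)[:, 1]
p_c = model_c.predict_proba(X)[:, 1]

# Step 3: ITE is the difference of predictions; ATE is its average
ite = p_t - p_c
ate = ite.mean()
```

With multiple treatments, BaseTClassifier repeats this pairing (treatment group vs. control group) once per treatment name.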

The Python CausalML package implements those steps for the T-learner, and we can initialize the model using BaseTClassifier. The binary classification algorithm used here is LGBMClassifier, but any other binary classification algorithm works as well.

Similar to the S-learner, the T-learner gets the average treatment effect (ATE) estimation using estimate_ate and the individual treatment effect (ITE) using fit_predict.

# Use LGBMClassifier with BaseTClassifier
t_learner = BaseTClassifier(LGBMClassifier(random_state=42), control_name='control')

# Estimated ATE, upper bound, and lower bound
ate, lb, ub = t_learner.estimate_ate(X=X, treatment=treatment, y=y_binary)

# Print out results
print('Average Treatment Effect for treatment 1: {:.2f} ({:.2f}, {:.2f})'.format(ate[0], lb[0], ub[0]))
print('Average Treatment Effect for treatment 2: {:.2f} ({:.2f}, {:.2f})\n'.format(ate[1], lb[1], ub[1]))

# ITE
t_learner_ite = t_learner.fit_predict(X, treatment, y_binary)

# Print out estimation for one record
print(f'The estimated ITE for treatment 1 for the first record : {t_learner_ite[0][0]:.2f}')
print(f'The estimated ITE for treatment 2 for the first record : {t_learner_ite[0][1]:.2f}\n')

# Take a look at the data
np.matrix(t_learner_ite[:5])

Output:

Average Treatment Effect for treatment 1: 0.19 (0.17, 0.20)
Average Treatment Effect for treatment 2: 0.18 (0.16, 0.20)

The estimated ITE for treatment 1 for the first record : 0.60
The estimated ITE for treatment 2 for the first record : 0.67

matrix([[ 0.60190229,  0.66783576],
        [ 0.02335003, -0.03188483],
        [ 0.45772256,  0.74478591],
        [ 0.12407308,  0.08293488],
        [ 0.70603586,  0.57463337]])

We can use bootstrap to get the individual treatment effect (ITE) with a confidence interval.

# ITE with confidence interval
t_learner_ite, t_learner_ite_lb, t_learner_ite_ub = t_learner.fit_predict(X=X, treatment=treatment, y=y_binary, return_ci=True,
                               n_bootstraps=100, bootstrap_size=500)

# Take a look at the data
print('\nThe first five estimated ITEs are:\n', np.matrix(t_learner_ite[:5]))
print('\nThe first five estimated ITE lower bound are:\n', np.matrix(t_learner_ite_lb[:5]))
print('\nThe first five estimated ITE upper bound are:\n', np.matrix(t_learner_ite_ub[:5]))

# Print out estimation for one record
print(f'The estimated ITE for treatment 1 for the second record : {t_learner_ite[1][0]:.2f}, the lower bound is {t_learner_ite_lb[1][0]:.2f}, and the upper bound is {t_learner_ite_ub[1][0]:.2f}')
print(f'The estimated ITE for treatment 2 for the second record : {t_learner_ite[1][1]:.2f}, the lower bound is {t_learner_ite_lb[1][1]:.2f}, and the upper bound is {t_learner_ite_ub[1][1]:.2f}')

Output:

The first five estimated ITEs are:
 [[ 0.60190229  0.66783576]
 [ 0.02335003 -0.03188483]
 [ 0.45772256  0.74478591]
 [ 0.12407308  0.08293488]
 [ 0.70603586  0.57463337]]

The first five estimated ITE lower bound are:
 [[-0.11527054 -0.19702837]
 [-0.45672542 -0.40597087]
 [-0.30931199 -0.21621977]
 [-0.26300386 -0.37326323]
 [-0.31019533 -0.27425344]]

The first five estimated ITE upper bound are:
 [[0.82908162 0.85780676]
 [0.50654459 0.41690369]
 [0.91921774 0.85778004]
 [0.51517946 0.64176043]
 [0.91136078 0.88225468]]

The estimated ITE for treatment 1 for the second record : 0.02, the lower bound is -0.46, and the upper bound is 0.51
The estimated ITE for treatment 2 for the second record : -0.03, the lower bound is -0.41, and the upper bound is 0.42

To learn more about T-learner, please check out my previous tutorials T Learner Uplift Model for Individual Treatment Effect (ITE) in Python and Explainable T-learner Deep Learning Uplift Model Using Python Package CausalML.

Step 5: Multiple Treatments Uplift Model Using X Learner

In step 5, we will estimate the average treatment effect (ATE) and the individual treatment effect (ITE) for multiple treatments using an X-learner.

X-learner consists of three stages, and each stage fits one or more models with different dependent variables.

  • Stage one is the same as the T-learner. It builds two machine learning models to predict the outcomes, one using the treated units and the other using the control units.
  • After the stage one models are built, the imputed individual treatment effect (ITE) will be calculated.
    • For the samples that received treatment, the imputed individual treatment effect (ITE) is the actual outcome minus the counterfactual outcome predicted by the control model from stage one.
    • For the samples that did not receive treatment, the imputed individual treatment effect (ITE) is the counterfactual outcome predicted by the treated model from stage one minus the actual outcome.
  • Stage two builds two machine learning models to predict the imputed individual treatment effect (ITE), one model is trained using the treated units, and the other model is trained using the untreated units.
  • After the stage two models are completed, the imputed individual treatment effect (ITE) predictions will be calculated separately using the treated and the control model for all the samples.
  • Stage three builds a propensity model to predict the propensity of getting treatment. The predicted propensity score is used as the weight for the individual treatment effect (ITE) calculation. Finally, the individual treatment effect (ITE) will be estimated by getting the weighted average of the stage two model predictions, using the propensity scores as the weights. The average treatment effect (ATE) is the average of the individual treatment effect (ITE).
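These stages can be sketched manually for a single treatment; the toy example below uses hypothetical scikit-learn models in place of CausalML's implementation:

```python
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

# Toy data (hypothetical, for illustration only)
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))                 # covariates
w = rng.integers(0, 2, size=1000)              # 0 = control, 1 = treated
y = (X[:, 0] + 1.5 * w + rng.normal(size=1000) > 1).astype(int)

# Stage 1: outcome models (classifiers) on treated and control units
mu1 = LogisticRegression().fit(X[w == 1], y[w == 1])
mu0 = LogisticRegression().fit(X[w == 0], y[w == 0])

# Imputed ITE: actual minus counterfactual for treated units,
# counterfactual minus actual for control units
d_t = y[w == 1] - mu0.predict_proba(X[w == 1])[:, 1]
d_c = mu1.predict_proba(X[w == 0])[:, 1] - y[w == 0]

# Stage 2: effect models (regressors, since imputed ITEs are continuous)
tau_t = LinearRegression().fit(X[w == 1], d_t)
tau_c = LinearRegression().fit(X[w == 0], d_c)

# Stage 3: propensity-weighted combination of the two effect predictions
g = LogisticRegression().fit(X, w).predict_proba(X)[:, 1]
ite = g * tau_c.predict(X) + (1 - g) * tau_t.predict(X)
ate = ite.mean()
```

This sketch also shows why the stage-two learners must be regressors even when the outcome is binary: their targets are the continuous imputed effects.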

The Python CausalML package implements those steps for the X-learner, and we can initialize the model using BaseXClassifier. In this example, we set treatment_outcome_learner=XGBClassifier, treatment_effect_learner=XGBRegressor, control_outcome_learner=LGBMClassifier, and control_effect_learner=LGBMRegressor, but we can use other machine learning algorithms. One thing to pay attention to is that the outcome learners are all classifiers and the effect learners are all regressors. This is because the outcome learners predict the binary outcome, while the effect learners predict the continuous individual treatment effect (ITE).

Similar to the S-learner and the T-learner, X-learner gets the average treatment effect (ATE) estimation using estimate_ate and the individual treatment effect (ITE) using fit_predict.

# BaseXClassifier
x_learner = BaseXClassifier(treatment_outcome_learner=XGBClassifier(random_state=42), 
                            treatment_effect_learner=XGBRegressor(random_state=42),
                            control_outcome_learner=LGBMClassifier(random_state=42),
                            control_effect_learner=LGBMRegressor(random_state=42),
                            control_name='control')

# Estimated ATE, upper bound, and lower bound
ate, lb, ub = x_learner.estimate_ate(X=X, treatment=treatment, y=y_binary)

# Print out results
print('Average Treatment Effect for treatment 1: {:.2f} ({:.2f}, {:.2f})'.format(ate[0], lb[0], ub[0]))
print('Average Treatment Effect for treatment 2: {:.2f} ({:.2f}, {:.2f})\n'.format(ate[1], lb[1], ub[1]))

# ITE
x_learner_ite = x_learner.fit_predict(X, treatment, y_binary)

# Print out estimation for one record
print(f'The estimated ITE for treatment 1 for the first record : {x_learner_ite[0][0]:.2f}')
print(f'The estimated ITE for treatment 2 for the first record : {x_learner_ite[0][1]:.2f}\n')

# Take a look at the data
np.matrix(x_learner_ite[:5])

Output:

Average Treatment Effect for treatment 1: 0.20 (0.17, 0.22)
Average Treatment Effect for treatment 2: 0.19 (0.17, 0.22)

The estimated ITE for treatment 1 for the first record : 0.29
The estimated ITE for treatment 2 for the first record : 0.23

matrix([[ 0.29147581,  0.22698645],
        [ 0.08403824, -0.01377584],
        [ 0.29201931,  0.35909415],
        [ 0.19126503,  0.19721132],
        [ 0.33535843,  0.29034276]])

We can use bootstrap to get the individual treatment effect (ITE) with a confidence interval for X-learners too.

# ITE with confidence interval
x_learner_ite, x_learner_ite_lb, x_learner_ite_ub = x_learner.fit_predict(X=X, treatment=treatment, y=y_binary, return_ci=True,
                               n_bootstraps=100, bootstrap_size=500)

# Take a look at the data
print('\nThe first five estimated ITEs are:\n', np.matrix(x_learner_ite[:5]))
print('\nThe first five estimated ITE lower bound are:\n', np.matrix(x_learner_ite_lb[:5]))
print('\nThe first five estimated ITE upper bound are:\n', np.matrix(x_learner_ite_ub[:5]))

# Print out estimation for one record
print(f'The estimated ITE for treatment 1 for the second record : {x_learner_ite[1][0]:.2f}, the lower bound is {x_learner_ite_lb[1][0]:.2f}, and the upper bound is {x_learner_ite_ub[1][0]:.2f}')
print(f'The estimated ITE for treatment 2 for the second record : {x_learner_ite[1][1]:.2f}, the lower bound is {x_learner_ite_lb[1][1]:.2f}, and the upper bound is {x_learner_ite_ub[1][1]:.2f}')

Output:

The first five estimated ITEs are:
 [[ 0.29147581  0.22698645]
 [ 0.08403824 -0.01377584]
 [ 0.29201931  0.35909415]
 [ 0.19126503  0.19721132]
 [ 0.33535843  0.29034276]]

The first five estimated ITE lower bound are:
 [[-0.38115086 -0.21521366]
 [-0.21789358 -0.40537888]
 [-0.3249582  -0.24306238]
 [-0.27162852 -0.20118905]
 [-0.1990451  -0.20208934]]

The first five estimated ITE upper bound are:
 [[0.68786803 0.64827807]
 [0.5483517  0.52525036]
 [0.76913861 0.76159033]
 [0.48357337 0.55489094]
 [0.82066404 0.77705988]]

The estimated ITE for treatment 1 for the second record : 0.08, the lower bound is -0.22, and the upper bound is 0.55
The estimated ITE for treatment 2 for the second record : -0.01, the lower bound is -0.41, and the upper bound is 0.53

To learn more about X-learner, please check out my previous tutorial X-Learner Uplift Model in Python.

For more information about data science and machine learning, please check out my YouTube channel and Medium Page or follow me on LinkedIn.
