T-learner is a meta-learner that uses two machine learning models to estimate the individual-level heterogeneous causal treatment effect. In this tutorial, we will talk about how to use the Python package `causalml` to build a T-learner. We will cover:

- How to implement T-learner using the XGBoost model, the LightGBM model, and the neural network model separately?
- How to make individual treatment effect (ITE) and average treatment effect (ATE) estimations using a T-learner?
- How to check the T-learner feature importance?
- How to interpret a T-learner uplift model using SHAP?

**Resources for this post:**

- Click here for the Colab notebook.
- More video tutorials on Uplift Modeling
- More blog posts on Uplift Modeling
- Video tutorial for this post on YouTube

If you are interested in building a T-learner manually, please check out my previous tutorial T Learner Uplift Model for Individual Treatment Effect (ITE) in Python.

Let’s get started!

### Step 1: Install and Import Libraries

In step 1, we will install and import the python libraries.

First, let’s install `causalml`.

    # Install package
    !pip install causalml

After the installation is completed, we can import the libraries.

- `pandas` and `numpy` are imported for data processing.
- `synthetic_data` is imported for synthetic data creation.
- `XGBTRegressor`, `MLPTRegressor`, `BaseTRegressor`, `LGBMRegressor`, and `XGBRegressor` are for the machine learning model training.

The import cell looks like this:

    # Data processing
    import pandas as pd
    import numpy as np

    # Create synthetic data
    from causalml.dataset import synthetic_data

    # Machine learning model
    from causalml.inference.meta import XGBTRegressor, MLPTRegressor, BaseTRegressor
    from xgboost import XGBRegressor
    from lightgbm import LGBMRegressor

### Step 2: Create Dataset

In step 2, we will create a synthetic dataset for the T-learner uplift model.

Firstly, a random seed is set to make the synthetic dataset reproducible.

Then, using the `synthetic_data` method from the `causalml` Python package, we create a dataset with five features, one treatment variable, and one continuous outcome variable.

    # Set a seed for reproducibility
    np.random.seed(42)

    # Create a synthetic dataset
    y, X, treatment, ite, _, _ = synthetic_data(mode=1, n=5000, p=5, sigma=1.0)
    feature_names = ['X1', 'X2', 'X3', 'X4', 'X5']

After that, using `value_counts` on the `treatment` variable, we can see that out of 5000 samples, 2582 units received treatment and 2418 units did not receive treatment.

    # Check treatment vs. control counts
    pd.Series(treatment).value_counts()

Output:

    1    2582
    0    2418
    dtype: int64

Finally, we get the true average treatment effect (ATE) by taking the mean of the true individual treatment effect (ITE). The true average treatment effect (ATE) is about 0.5.

    # True ATE
    ite.mean()

Output:

0.4988477022092744
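As a side note on why the true ATE is taken from the true ITE rather than from raw group averages, here is a minimal sketch on made-up data. It assumes a single hypothetical feature `x` that drives both the chance of treatment and the outcome; none of these names or numbers come from the tutorial's dataset.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 20000

# Hypothetical setup: units with a high x are more likely to be treated
x = rng.uniform(size=n)
propensity = 0.2 + 0.6 * x
w = rng.binomial(1, propensity)

# Constant true effect of 0.5 for every unit; the outcome also depends on x
true_ite = np.full(n, 0.5)
y = 2.0 * x + w * true_ite + rng.normal(scale=0.5, size=n)

# The ATE is the mean of the individual effects
true_ate = true_ite.mean()

# The raw difference in group means mixes the effect with the influence of x
naive_diff = y[w == 1].mean() - y[w == 0].mean()

print(f"true ATE: {true_ate:.2f}, naive difference in means: {naive_diff:.2f}")
```

In this toy setup the naive difference overstates the effect because treated units tend to have a higher `x`. A synthetic dataset that ships with the true ITE avoids the problem when benchmarking estimators.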

### Step 3: T-learner Using XGBoost Model

In step 3, we will use the XGBoost model with T-learner to estimate the average treatment effect (ATE) and the individual treatment effect (ITE).

`XGBTRegressor` is a built-in XGBoost T-learner model that comes with the `causalml` package.

To estimate the average treatment effect (ATE) using `XGBTRegressor`, we first instantiate the `XGBTRegressor`, then get the average treatment effect (ATE) and its lower and upper bounds using the `estimate_ate` method.

    # Use XGBTRegressor
    xgb = XGBTRegressor(random_state=42)

    # Estimated ATE, lower bound, and upper bound
    te, lb, ub = xgb.estimate_ate(X, treatment, y)

    # Print out results
    print('Average Treatment Effect: {:.2f} ({:.2f}, {:.2f})'.format(te[0], lb[0], ub[0]))

We can see that the estimated average treatment effect (ATE) is 0.61, which is 0.11 higher than the true average treatment effect (ATE). The lower bound for the average treatment effect (ATE) is 0.56 and the upper bound for the average treatment effect (ATE) is 0.67.

Average Treatment Effect: 0.61 (0.56, 0.67)

The method `fit_predict` produces the estimated individual treatment effect (ITE).

If the confidence interval for the individual treatment effect (ITE) is needed, we can use bootstrap by specifying the bootstrap number, bootstrap size, and setting `return_ci=True`.

The output gives us both the estimated individual treatment effect (ITE) and the estimated upper and lower bound for each individual.

    # ITE
    xgb_ite = xgb.fit_predict(X, treatment, y)

    # ITE with confidence interval
    xgb_ite, xgb_ite_lb, xgb_ite_ub = xgb.fit_predict(X=X, treatment=treatment, y=y, return_ci=True,
                                                      n_bootstraps=100, bootstrap_size=500)
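To give a feel for what the bootstrap number and bootstrap size control, here is a minimal sketch of a generic percentile bootstrap around a two-model learner. It is an illustration under stated assumptions, not the internals of `causalml`: the least-squares learner, the toy data, and the helper names `fit_predict_linear`, `t_learner_ite`, and `bootstrap_ite_ci` are all hypothetical stand-ins.

```python
import numpy as np

def fit_predict_linear(X_train, y_train, X_new):
    """Least-squares fit with an intercept on the training rows, prediction on X_new."""
    A = np.column_stack([np.ones(len(X_train)), X_train])
    coef, *_ = np.linalg.lstsq(A, y_train, rcond=None)
    return np.column_stack([np.ones(len(X_new)), X_new]) @ coef

def t_learner_ite(X_train, w_train, y_train, X_new):
    """One outcome model per arm; the ITE estimate is the difference of predictions."""
    mu0 = fit_predict_linear(X_train[w_train == 0], y_train[w_train == 0], X_new)
    mu1 = fit_predict_linear(X_train[w_train == 1], y_train[w_train == 1], X_new)
    return mu1 - mu0

def bootstrap_ite_ci(X, w, y, n_bootstraps=100, bootstrap_size=500, alpha=0.05, seed=0):
    """Refit on resampled rows, predict for every original row, take percentiles."""
    rng = np.random.default_rng(seed)
    draws = np.empty((n_bootstraps, len(X)))
    for b in range(n_bootstraps):
        idx = rng.choice(len(X), size=bootstrap_size, replace=True)
        draws[b] = t_learner_ite(X[idx], w[idx], y[idx], X)
    lower = np.percentile(draws, 100 * alpha / 2, axis=0)
    upper = np.percentile(draws, 100 * (1 - alpha / 2), axis=0)
    return lower, upper

# Toy data (made up): the true effect is 0.5 + x0
rng = np.random.default_rng(0)
n = 1000
X_toy = rng.normal(size=(n, 2))
w_toy = rng.integers(0, 2, size=n)
true_ite_toy = 0.5 + X_toy[:, 0]
y_toy = X_toy[:, 1] + w_toy * true_ite_toy + rng.normal(scale=0.5, size=n)

ite_lb_toy, ite_ub_toy = bootstrap_ite_ci(X_toy, w_toy, y_toy)
print(ite_lb_toy[:3], ite_ub_toy[:3])
```

More bootstrap rounds give smoother percentile estimates at the cost of more model fits, and a smaller bootstrap size makes each refit cheaper but noisier, which tends to widen the intervals.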

### Step 4: T-learner Using LightGBM Model

In step 4, we will talk about how to use `BaseTRegressor` with a LightGBM model for the T-learner.

`BaseTRegressor` is a generalized method that can take in existing machine learning models from packages such as `sklearn` and `xgboost`, and run T-learners with those models. In this step, we will run the `BaseTRegressor` with `LGBMRegressor`.

If we run `BaseTRegressor` with `xgboost`, the result is the same as the `XGBTRegressor` that comes with the `causalml` Python package.
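The idea of wrapping an arbitrary regressor can be sketched in a few lines. This is a minimal illustration, not the `BaseTRegressor` implementation: `TwoModelLearner` and `LeastSquares` are hypothetical classes, and the only assumption about the base learner is that it exposes `fit` and `predict`.

```python
import copy
import numpy as np

class LeastSquares:
    """Tiny stand-in learner exposing the usual fit/predict interface."""

    def fit(self, X, y):
        A = np.column_stack([np.ones(len(X)), X])
        self.coef_, *_ = np.linalg.lstsq(A, y, rcond=None)
        return self

    def predict(self, X):
        return np.column_stack([np.ones(len(X)), X]) @ self.coef_

class TwoModelLearner:
    """Hypothetical wrapper: one copy of the base learner per arm."""

    def __init__(self, learner):
        self.model_c = copy.deepcopy(learner)  # fitted on control rows
        self.model_t = copy.deepcopy(learner)  # fitted on treated rows

    def fit(self, X, treatment, y):
        self.model_c.fit(X[treatment == 0], y[treatment == 0])
        self.model_t.fit(X[treatment == 1], y[treatment == 1])
        return self

    def predict(self, X):
        # Estimated ITE: predicted outcome if treated minus predicted outcome if not
        return self.model_t.predict(X) - self.model_c.predict(X)

# Toy data (made up): the true effect is 0.5 + x0
rng = np.random.default_rng(0)
X_toy = rng.normal(size=(2000, 2))
w_toy = rng.integers(0, 2, size=2000)
true_ite_toy = 0.5 + X_toy[:, 0]
y_toy = X_toy[:, 1] + w_toy * true_ite_toy + rng.normal(scale=0.1, size=2000)

ite_hat_toy = TwoModelLearner(LeastSquares()).fit(X_toy, w_toy, y_toy).predict(X_toy)
print(f"estimated ATE on toy data: {ite_hat_toy.mean():.2f}")
```

Because the wrapper only relies on `fit` and `predict`, swapping the stand-in for a gradient boosting or neural network regressor changes nothing else in the code, which is the design that makes a generic T-learner class convenient.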

To estimate the average treatment effect (ATE) using `BaseTRegressor`, we first instantiate the `BaseTRegressor` with the `LGBMRegressor`, then get the average treatment effect (ATE) and its lower and upper bounds using the `estimate_ate` method.

    # Use LGBMRegressor with BaseTRegressor
    lgbm = BaseTRegressor(LGBMRegressor(random_state=42))

    # Estimated ATE, lower bound, and upper bound
    te, lb, ub = lgbm.estimate_ate(X, treatment, y)

    # Print out results
    print('Average Treatment Effect: {:.2f} ({:.2f}, {:.2f})'.format(te[0], lb[0], ub[0]))

We can see that the estimated average treatment effect (ATE) is 0.58, which is 0.08 higher than the true average treatment effect (ATE). The lower bound for the average treatment effect (ATE) is 0.54 and the upper bound is 0.62. The confidence interval is narrower than the one from the built-in `XGBTRegressor`.

Average Treatment Effect: 0.58 (0.54, 0.62)

The results show that, on this dataset, `BaseTRegressor` in combination with `LGBMRegressor` produced an ATE estimate closer to the true value than the built-in `XGBTRegressor`.

To estimate the individual treatment effect (ITE), we use the method `fit_predict` on the LightGBM model.

We can also use bootstrap by specifying the bootstrap number, bootstrap size, and setting `return_ci=True` to get the estimated individual treatment effect (ITE) and the estimated upper and lower bound for each individual.

    # ITE
    lgbm_ite = lgbm.fit_predict(X, treatment, y)

    # ITE with confidence interval
    lgbm_ite, lgbm_ite_lb, lgbm_ite_ub = lgbm.fit_predict(X=X, treatment=treatment, y=y, return_ci=True,
                                                         n_bootstraps=100, bootstrap_size=500)

### Step 5: T-learner Using Neural Network Model

In step 5, we will talk about how to use a neural network model with T-learner.

The Python package `causalml` has a built-in class `MLPTRegressor` that runs multilayer perceptron neural network models with T-learner.

- `hidden_layer_sizes` specifies the number of hidden layers and the number of neurons in each layer. `hidden_layer_sizes=(35, 25, 10, 5)` means that there are four hidden layers for the neural network model. The first hidden layer has 35 neurons, the second hidden layer has 25 neurons, the third hidden layer has 10 neurons, and the fourth hidden layer has 5 neurons.
- `learning_rate_init` specifies the initial learning rate of the neural network model. We set the initial value to 0.01.
- `early_stopping=True` means the neural network model stops training when its score on a held-out validation split stops improving.
- `random_state` gives us reproducible results.
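To make the `hidden_layer_sizes` setting concrete, the short sketch below lists the weight-matrix shapes and counts the trainable parameters of one such network. It assumes five input features (as in this dataset), a single output neuron, and one bias per neuron; the helper names are hypothetical.

```python
def mlp_layer_shapes(n_features, hidden_layer_sizes, n_outputs=1):
    """Return (inputs, outputs) for every weight matrix in a fully connected network."""
    sizes = [n_features, *hidden_layer_sizes, n_outputs]
    return list(zip(sizes[:-1], sizes[1:]))

def count_parameters(shapes):
    """Weights plus one bias per output neuron of each layer."""
    return sum(n_in * n_out + n_out for n_in, n_out in shapes)

shapes = mlp_layer_shapes(n_features=5, hidden_layer_sizes=(35, 25, 10, 5))
n_params = count_parameters(shapes)

print(shapes)    # [(5, 35), (35, 25), (25, 10), (10, 5), (5, 1)]
print(n_params)  # 1431
```

Since a T-learner fits one model for the control group and one for the treatment group, two networks of this size are trained.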

After instantiating the neural network model using `MLPTRegressor`, we name it `nn` and run `estimate_ate` on it to get the average treatment effect (ATE) and its lower and upper bounds.

    # Use MLPTRegressor
    nn = MLPTRegressor(hidden_layer_sizes=(35, 25, 10, 5),
                       learning_rate_init=.01,
                       early_stopping=True,
                       random_state=1)

    # Estimated ATE, lower bound, and upper bound
    te, lb, ub = nn.estimate_ate(X, treatment, y)

    # Print out results
    print('Average Treatment Effect: {:.2f} ({:.2f}, {:.2f})'.format(te[0], lb[0], ub[0]))

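We can see that the estimated average treatment effect (ATE) is 0.58, which is 0.08 higher than the true average treatment effect (ATE). The point estimate matches the LightGBM result, while the interval of 0.53 to 0.64 is slightly wider than the LightGBM interval of 0.54 to 0.62.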

Average Treatment Effect: 0.58 (0.53, 0.64)
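The three ATE estimates reported in steps 3 to 5 can be put side by side with a few lines of plain Python. The numbers below are copied from the outputs above; the dictionary layout and the `summary` name are just one convenient way to organize them.

```python
true_ate = 0.4988477022092744  # printed in Step 2

# (estimate, lower bound, upper bound) as printed in Steps 3 to 5
reported = {
    "XGBTRegressor": (0.61, 0.56, 0.67),
    "BaseTRegressor + LGBMRegressor": (0.58, 0.54, 0.62),
    "MLPTRegressor": (0.58, 0.53, 0.64),
}

summary = {}
for name, (estimate, lb, ub) in reported.items():
    summary[name] = {
        "error": round(estimate - true_ate, 2),      # signed gap to the true ATE
        "width": round(ub - lb, 2),                  # width of the reported interval
        "covers_true_ate": lb <= true_ate <= ub,     # does the interval contain the truth?
    }
    print(name, summary[name])
```

In this run, none of the three reported intervals contains the true ATE of about 0.5, so the interval width alone should not be read as a measure of accuracy.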

Tuning the hyperparameters such as the number of layers, the number of neurons in each layer, and the initial learning rate can potentially improve the model performance.

Calculating the individual treatment effect (ITE) for the neural network model is the same as other T-learner models. We use the method `fit_predict` on the neural network model to get the individual treatment effect (ITE).

We can also use bootstrap by specifying the bootstrap number, bootstrap size, and setting `return_ci=True` to get the estimated individual treatment effect (ITE) and the estimated upper and lower bound for each individual.

    # ITE
    nn_ite = nn.fit_predict(X, treatment, y)

    # ITE with confidence interval
    nn_ite, nn_ite_lb, nn_ite_ub = nn.fit_predict(X=X, treatment=treatment, y=y, return_ci=True,
                                                  n_bootstraps=100, bootstrap_size=500)
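Because the synthetic dataset comes with the true ITE, the estimated ITEs can also be scored directly. The helper below is a minimal sketch with a hypothetical name, `ite_metrics`. It flattens its inputs on the assumption that the estimates may arrive as a column vector of shape `(n, 1)`, and the toy numbers are made up.

```python
import numpy as np

def ite_metrics(true_ite, estimated_ite):
    """Mean absolute error, root mean squared error, and correlation of ITE estimates."""
    true_ite = np.asarray(true_ite, dtype=float).ravel()
    estimated_ite = np.asarray(estimated_ite, dtype=float).ravel()
    error = estimated_ite - true_ite
    return {
        "mae": float(np.abs(error).mean()),
        "rmse": float(np.sqrt((error ** 2).mean())),
        "corr": float(np.corrcoef(true_ite, estimated_ite)[0, 1]),
    }

# Toy check with made-up numbers; the estimates are passed as a column vector
toy_true = np.array([0.2, 0.4, 0.6, 0.8])
toy_est = np.array([[0.3], [0.3], [0.7], [0.9]])
toy_metrics = ite_metrics(toy_true, toy_est)
print(toy_metrics)
```

With the objects from this tutorial, a call such as `ite_metrics(ite, nn_ite)` would compare the neural network estimates with the true effects, and the same call works for `xgb_ite` and `lgbm_ite`.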

### Step 6: T-learner Neural Network Model Feature Importance

In step 6, we will talk about how to get feature importance for T-learner.

The syntax for getting the feature importance is the same for any modeling algorithm. We will use the neural network model as an example to illustrate the process.

The feature importance is calculated by building a new machine learning model on the backend, where the dependent variable is the individual treatment effect (ITE) and the independent variables are the features of the model.

- `get_importance` is the function to get the feature importance values.
- `X` takes in the feature matrix.
- `tau` takes in the individual treatment effect (ITE).
- `features=feature_names` prints out the feature names in the outputs.
- `random_state` makes the results reproducible.
- `method` specifies whether to use `auto` or `permutation` for the feature importance calculation.
- `auto` works on a tree-based estimator. It uses the estimator’s default feature importance. If no tree-based estimator is provided, it falls back to the `LGBMRegressor` and `gain` as the importance type.
- `permutation` works on any estimator. It permutes a feature column and calculates the decrease in accuracy. The feature importance is ordered based on the magnitude of the decrease in accuracy. When the sample size is large, downsampling is suggested.

Here we use the `permutation` method:

    # Feature importance using permutation
    nn.get_importance(X=X, tau=nn_ite, method='permutation', features=feature_names, random_state=42)

From the output, we can see that `X2` is the most important feature and `X5` is the least important feature.

    {1: X2    0.975168
     X1    0.812584
     X3    0.215354
     X4    0.097804
     X5    0.055254
     dtype: float64}
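To make the permutation idea concrete, here is a minimal sketch written from scratch with `numpy`. It is not the `causalml` implementation: it measures importance as the average increase in mean squared error after shuffling a column, and it uses a least-squares surrogate fitted to a made-up effect vector. The function name and the toy data are hypothetical.

```python
import numpy as np

def permutation_importance(predict, X, target, n_repeats=5, seed=42):
    """Average increase in mean squared error after shuffling each column."""
    rng = np.random.default_rng(seed)
    baseline = np.mean((predict(X) - target) ** 2)
    importances = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        for _ in range(n_repeats):
            X_perm = X.copy()
            X_perm[:, j] = rng.permutation(X_perm[:, j])  # break the link to the target
            importances[j] += np.mean((predict(X_perm) - target) ** 2) - baseline
    return importances / n_repeats

# Toy effect vector (made up): driven mostly by column 0, a little by column 1
rng = np.random.default_rng(0)
X_toy = rng.normal(size=(1000, 3))
tau_toy = 2.0 * X_toy[:, 0] + 0.5 * X_toy[:, 1] + rng.normal(scale=0.1, size=1000)

# Surrogate model: least squares with an intercept, fitted on the effect vector
A = np.column_stack([np.ones(len(X_toy)), X_toy])
coef, *_ = np.linalg.lstsq(A, tau_toy, rcond=None)

def predict(X):
    return np.column_stack([np.ones(len(X)), X]) @ coef

scores = permutation_importance(predict, X_toy, tau_toy)
ranking = np.argsort(scores)[::-1]  # most important column first
print(scores, ranking)
```

A column whose shuffling barely changes the error carries little information about the effect, which is why the unused third column ends up last in the toy ranking.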

We can also visualize the feature importance using the `plot_importance` function.

    # Visualization
    nn.plot_importance(X=X, tau=nn_ite, method='permutation', features=feature_names, random_state=42)

### Step 7: T-learner Neural Network Model Interpretation

In step 7, we will interpret the T-learner model using SHAP (SHapley Additive exPlanations).

The syntax for SHAP interpretation is the same for any modeling algorithm. We will use the neural network model as an example to illustrate the process.

The Shapley values are calculated based on a machine learning model, where the dependent variable is the individual treatment effect (ITE) and the independent variables are the features of the model.

- `plot_shap_values` is the function to visualize SHAP values.
- `X` takes in the feature matrix.
- `tau` takes in the individual treatment effect (ITE).
- `features=feature_names` prints out the feature names in the outputs.

The call looks like this:

    # Plot shap values
    nn.plot_shap_values(X=X, tau=nn_ite, features=feature_names)

The SHAP plot includes both the feature importance and the feature impacts.

- The y-axis is the list of features ordered from the most important to the least important.
- The x-axis is the SHAP value, representing how each feature impacts the model output.
- The color of the dots represents the feature values. Blue indicates low values, and red indicates high values.
- The overlapping dots are jittered, which helps us to see the distribution of each feature.

For example, from the SHAP plot we can see that `X2` is the most important feature. High `X2` values affect the predictions in a positive direction and low `X2` values affect the predictions in a negative direction. Most samples have high `X2` values.
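For readers who want to see what a Shapley value is numerically, the sketch below computes exact Shapley values for a tiny made-up model by enumerating every feature coalition. It only illustrates the definition: SHAP tooling typically relies on faster model-specific or approximate algorithms for real models, and the function name, the toy model, and the all-zero baseline are assumptions.

```python
from itertools import combinations
from math import factorial

def shapley_values(model, x, baseline):
    """Exact Shapley values of model(x) relative to model(baseline).

    Features outside a coalition are set to their baseline value.
    """
    n = len(x)

    def value(coalition):
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return model(z)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            # Weight of a coalition of this size in the Shapley average
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                with_i = value(set(subset) | {i})
                without_i = value(set(subset))
                total += weight * (with_i - without_i)
        phi.append(total)
    return phi

def toy_model(z):
    # Two additive terms plus an interaction between features 0 and 2
    return 2.0 * z[0] + 1.0 * z[1] + 0.5 * z[0] * z[2]

x = [1.0, 2.0, 4.0]
baseline = [0.0, 0.0, 0.0]
phi = shapley_values(toy_model, x, baseline)
print(phi)  # approximately [3.0, 2.0, 1.0]
```

The contributions add up to the gap between the prediction for `x` and the prediction for the baseline (6.0 here), and the interaction term is split evenly between the two features involved. This additivity is what lets a SHAP plot attribute each prediction to individual features.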


### References

- Künzel, Sören R., et al. “Metalearners for estimating heterogeneous treatment effects using machine learning.” Proceedings of the National Academy of Sciences 116.10 (2019): 4156-4165.
- CausalML documentation