Causal inference questions come up frequently in data science and machine learning interviews. This tutorial covers the top 10 causal inference interview questions and how to answer them.

**Resources for this post:**

- Video tutorial for this post on YouTube
- Click here for the Colab notebook
- More video tutorials on Data Science Interview Questions and Causal Inference
- More blog posts on Data Science Interview Questions and Causal Inference

Let’s get started!

### Question 1: What is a Directed Acyclic Graph (DAG)?

- A directed acyclic graph (DAG) is a graph commonly used for modeling connectivity and causality. It is a directed graph of nodes without directed cycles.
- In a Directed Acyclic Graph (DAG), nodes represent variables, and edges represent causal relationships between those variables. The direction of the edges indicates the direction of causality, and the absence of cycles indicates that there are no feedback loops in the causal structure.
- In causal inference, a Directed Acyclic Graph (DAG) represents a set of variables and their causal relationships. It visualizes causal structures and helps to identify how variables are causally related to each other.
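As a minimal sketch of the definition above, a small causal graph can be stored as a mapping from each variable to its direct causes and then checked for cycles with the standard-library `graphlib` module. The variable names (`age`, `medication`, `outcome`) and the helper `is_acyclic` are hypothetical and only serve as an illustration.

```python
from graphlib import TopologicalSorter, CycleError

# Hypothetical causal graph: each key maps to the set of its direct causes (parents).
dag = {
    "age": set(),
    "medication": {"age"},
    "outcome": {"age", "medication"},
}

def is_acyclic(graph):
    """Return True if the directed graph has no directed cycles."""
    try:
        # static_order() raises CycleError when the graph contains a cycle.
        list(TopologicalSorter(graph).static_order())
        return True
    except CycleError:
        return False

# A valid ordering lists every cause before its effects.
causal_order = list(TopologicalSorter(dag).static_order())

# Adding a feedback loop (outcome -> age) breaks the DAG property.
with_cycle = {**dag, "age": {"outcome"}}

print(causal_order)
print(is_acyclic(dag), is_acyclic(with_cycle))
```

The topological order is one way to read the direction of causality off the graph: every variable appears after all of its causes.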

### Question 2: What are confounders?

Confounders, also called confounding variables, are variables related to both the treatment and the outcome in a causal relationship. These variables can distort the estimated causal relationship between the treatment and the outcome.

For example, suppose we want to study the effect of a new medication on patient outcomes. Older patients are more likely to receive the medication and also more likely to have negative outcomes, so age is a confounding variable. Without controlling for the confounder, we may falsely conclude that the medication is causing the negative outcomes, when in fact the age of the patients is the true cause.
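The medication example can be illustrated with a small simulation. This is only a sketch under assumed numbers: the data-generating process, the coefficients, and the variable names (`risk`, `p_treat`, `true_effect`) are made up for illustration and do not come from a real study.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20_000

# Hypothetical data-generating process: age drives both treatment and outcome.
age = rng.uniform(20, 80, n)
p_treat = 1 / (1 + np.exp(-(age - 50) / 10))   # older patients are treated more often
treated = rng.binomial(1, p_treat)
true_effect = -2.0                             # the medication lowers the risk score
risk = 0.2 * age + true_effect * treated + rng.normal(0, 1, n)

# Naive comparison of group means ignores age.
naive = risk[treated == 1].mean() - risk[treated == 0].mean()

# Adjusted estimate: linear regression of risk on treatment and age.
X = np.column_stack([np.ones(n), treated, age])
coef, *_ = np.linalg.lstsq(X, risk, rcond=None)
adjusted = coef[1]

print(f"naive: {naive:.2f}, adjusted: {adjusted:.2f}, truth: {true_effect}")
```

In this setup the naive difference has the wrong sign because the treated group is older, while the estimate that controls for age lands close to the assumed true effect.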

### Question 3: What is a counterfactual?

A counterfactual is something that did not happen but could have happened.

For example, in the flu treatment dataset, Joe received treatment from a doctor and recovered in 10 days. We do not know the counterfactual outcome of Joe not getting the treatment because it did not happen.

### Question 4: What is Average Treatment Effect (ATE)?

- The average treatment effect (ATE) is the expected treatment impact across everyone in the population.
- We can get the Individual Treatment Effect (ITE) for everyone in the population first, then calculate the Average Treatment Effect (ATE) by taking the average of all the individual treatment effects.
- The Individual Treatment Effect (ITE) is the difference between an individual’s outcome with treatment and the same individual’s outcome without treatment.
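The calculation above can be sketched on a toy table of potential outcomes. The numbers are hypothetical recovery days, and in real data only one of the two outcomes is ever observed for each person, which is why the ATE has to be estimated rather than computed directly like this.

```python
import numpy as np

# Hypothetical potential outcomes (recovery days) for five people.
# In real data only one of the two arrays is observed per person.
y_treated = np.array([10, 8, 12, 9, 11])   # outcome with treatment
y_control = np.array([14, 9, 15, 9, 13])   # outcome without treatment

ite = y_treated - y_control   # Individual Treatment Effect for each person
ate = ite.mean()              # Average Treatment Effect across the population

print(ite, ate)
```

Here the treatment shortens recovery by two days on average, even though the effect differs from person to person.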

Please check out my previous tutorial ATE vs CATE vs ATT vs ATC for Causal Inference for detailed calculation.

### Question 5: What is one-to-one confounder matching?

One-to-one confounder matching is a method for matching participants based on their similarity using a set of confounding variables. The goal of one-to-one confounder matching is to match each participant in the treatment group with a participant in the control group with a similar confounder level.

Mahalanobis Distance Matching (MDM) is commonly used for confounder matching. The Mahalanobis distance is similar to the Euclidean distance. The difference is that the Mahalanobis distance rescales the data by the inverse covariance matrix of the confounders, so it accounts for their different scales and the correlations between them, while the Euclidean distance uses the original data.

Here’s a general process for implementing one-to-one confounder matching:

1. Identify the confounding variables for the study.
2. Calculate the Mahalanobis distance between the samples in the treatment group and in the control group.
3. Match subjects in the treatment and control groups using the shortest Mahalanobis distance. Define a caliper as the maximum distance threshold we are willing to accept, so that quite different samples are not paired together.

👉 A small caliper means a small distance threshold, a better balance between the treatment and control groups, and a smaller number of matched pairs. The results are likely to have less bias and more variance.

👉 A large caliper means a large distance threshold, a worse balance between the treatment and control groups, and a larger number of matched pairs. The results are likely to have more bias and less variance.

4. Check the balance for the matched dataset and validate the similarity between the treatment and the control group.

5. Analyze the causal impact using standard statistical methods such as t-tests.

6. Conduct sensitivity analyses to examine the robustness of the results to the matching procedure. We can try removing outliers, varying the specification of the matching process, and comparing the results to other causal inference methods to make sure the analysis is robust.

One-to-one confounder matching accounts for correlations between the confounding variables, which can lead to more accurate matches. However, it can result in small sample sizes, which can reduce statistical power. It is best suited for continuous variables, and may not work well for categorical variables.
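The matching steps above can be sketched in Python with NumPy. This is a simplified greedy matcher without replacement, not the algorithm used by any particular R package; the function name `mahalanobis_match`, the simulated data, and the caliper values are assumptions for illustration, and the sketch assumes at least two confounders.

```python
import numpy as np

def mahalanobis_match(X_treat, X_ctrl, caliper):
    """Greedy 1:1 matching without replacement on Mahalanobis distance.

    Returns a list of (treated_index, control_index) pairs.
    """
    # The inverse covariance of the pooled confounders rescales each variable
    # and accounts for correlations between them.
    cov_inv = np.linalg.inv(np.cov(np.vstack([X_treat, X_ctrl]), rowvar=False))
    diff = X_treat[:, None, :] - X_ctrl[None, :, :]
    dist = np.sqrt(np.einsum("ijk,kl,ijl->ij", diff, cov_inv, diff))

    pairs, used = [], set()
    for i in range(len(X_treat)):
        # Walk through controls from nearest to farthest.
        for j in np.argsort(dist[i]).tolist():
            if dist[i, j] > caliper:
                break              # no remaining control is close enough
            if j not in used:
                pairs.append((i, j))
                used.add(j)
                break
    return pairs

# Hypothetical data: two confounders, treated units shifted away from controls.
rng = np.random.default_rng(0)
X_treat = rng.normal([1.0, 1.0], 1.0, size=(50, 2))
X_ctrl = rng.normal([0.0, 0.0], 1.0, size=(200, 2))

pairs_small = mahalanobis_match(X_treat, X_ctrl, caliper=0.1)
pairs_large = mahalanobis_match(X_treat, X_ctrl, caliper=1.0)
print(len(pairs_small), len(pairs_large))
```

Comparing the two calls shows the caliper trade-off: the tighter caliper keeps only very close pairs and so returns fewer of them.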

Please check out my previous tutorial to learn about how to implement one-to-one confounder matching using the R Matching package and the R MatchIt package.

### Question 6: What are the differences between one-to-one confounder matching and propensity score matching (PSM) for causal inference?

Both one-to-one confounder matching and propensity score matching (PSM) are methods used for reducing confounding bias in observational studies. However, there are some differences between these two methods:

- The matching approach is different. One-to-one confounder matching involves selecting one control group member for each treated individual based on similarity in a set of observed confounders. Propensity score matching (PSM), on the other hand, involves estimating the propensity score, which is the probability of receiving the treatment given the observed covariates, and then matching treated and control individuals based on their propensity score.
- The number of covariates that can be handled is different. In one-to-one confounder matching, only a limited set of observed confounders can be used for matching, because close matches become harder to find as the number of matching variables grows. In contrast, propensity score matching (PSM) can use a larger set of observed confounders, since they are all summarized into a single propensity score.
- The matching precision is different. One-to-one confounder matching is a more precise matching method because it matches each treated individual to a control individual directly on similarity in the observed covariates. Propensity score matching (PSM), on the other hand, may result in less precise matching because individuals are matched on their propensity score only, and the same propensity score may be generated by quite different covariates.
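The last point can be made concrete with a tiny sketch. The logistic model below is hypothetical: the covariates (`age`, `prior_visits`) and the coefficients are made up to show how two very different individuals can end up with the same propensity score.

```python
import math

def propensity(age, prior_visits):
    """Hypothetical fitted logistic model; the coefficients are made up."""
    logit = -4.0 + 0.05 * age + 0.5 * prior_visits
    return 1 / (1 + math.exp(-logit))

# Two individuals with very different covariates.
young_frequent = propensity(age=30, prior_visits=5)
old_rare = propensity(age=80, prior_visits=0)

print(young_frequent, old_rare)
```

Propensity score matching would treat these two individuals as a good match, while matching directly on the confounders would not.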

### Question 7: What is Inverse Probability Treatment Weighting (IPTW) in causal inference?

Inverse Probability Treatment Weighting (IPTW) is a method used in causal inference to estimate the causal effect of a treatment on an outcome in observational studies. Here is a general outline of the process:

- Define the research question and identify the treatment and outcome variables of interest.
- Identify potential confounding variables, which are variables that may be associated with both the treatment and outcome, and may distort the estimate of the treatment effect.
- Estimate the propensity score, which is the conditional probability of receiving the treatment given the observed covariates. This can be done using a logistic regression model where the treatment status is the dependent variable and the confounding variables are the independent variables.
- Calculate the Inverse Probability Treatment Weight (IPTW) for each observation, which is the inverse of the propensity score for the treated group and the inverse of one minus the propensity score for the control group.
- Apply the calculated Inverse Probability Treatment Weight (IPTW) to the modeling dataset. In the weighted sample, the distribution of covariates is approximately balanced between the treatment and control groups, provided the propensity score model is correctly specified. Therefore, the confounding effect of the observed covariates is removed.
- Calculate the treatment effect. Outcomes obtained with the Inverse Probability Treatment Weight (IPTW) can be compared directly between the treatment and the control group.
- Conduct sensitivity analyses. We can try removing outliers, varying the specification of the propensity score model, and comparing the results to other causal inference methods to make sure the analysis is robust.
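The core steps above (propensity score, weights, weighted comparison) can be sketched with NumPy on simulated data. Everything here is an assumption for illustration: the data-generating process, the single confounder `x`, and the small `fit_logistic` helper, which fits a logistic regression by Newton-Raphson so the example stays self-contained.

```python
import numpy as np

def fit_logistic(X, t, n_iter=25):
    """Logistic regression via Newton-Raphson; X must include an intercept column."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1 / (1 + np.exp(-X @ beta))
        grad = X.T @ (t - p)
        hess = (X * (p * (1 - p))[:, None]).T @ X
        beta += np.linalg.solve(hess, grad)
    return beta

# Hypothetical data: one confounder x affects both treatment and outcome.
rng = np.random.default_rng(42)
n = 20_000
x = rng.normal(0, 1, n)
t = rng.binomial(1, 1 / (1 + np.exp(-x)))
true_ate = 1.0
y = true_ate * t + x + rng.normal(0, 1, n)

# Estimate the propensity score P(treatment | x).
X = np.column_stack([np.ones(n), x])
ps = 1 / (1 + np.exp(-X @ fit_logistic(X, t)))

# IPTW: 1/ps for the treated group, 1/(1 - ps) for the control group.
w = np.where(t == 1, 1 / ps, 1 / (1 - ps))

# Compare unweighted and weighted group means.
naive = y[t == 1].mean() - y[t == 0].mean()
iptw = (np.average(y[t == 1], weights=w[t == 1])
        - np.average(y[t == 0], weights=w[t == 0]))

print(f"naive: {naive:.2f}, IPTW: {iptw:.2f}, truth: {true_ate}")
```

In this simulation the unweighted difference overstates the effect, and the weighted difference moves close to the assumed true value. Very small or very large propensity scores produce extreme weights, which is one reason the sensitivity analyses matter.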

Please check out my previous tutorial Inverse Probability Treatment Weighting (IPTW) Using Python Package Causal Inference for implementation code in Python.

### Question 8: What is difference-in-difference for causal inference?

Difference-in-differences (DiD) is a causal inference method that compares changes in outcomes over time between a treatment group and a control group.

The difference-in-difference method is based on a few assumptions:

- Parallel trends: in the absence of the treatment, the outcome variable for the treatment group would have followed the same trend as that of the control group.
- Common shocks: there are no other factors that affect the outcome variable differently for the treatment and control groups.
- Stable treatment effects: the treatment effect does not change over time.
- Intervention independence: allocation of the treatment intervention is not determined by the outcome.

Mathematically, the causal impact can be expressed as:

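$$DiD = (Y_{\text{treated},\,\text{after}} - Y_{\text{treated},\,\text{before}}) - (Y_{\text{control},\,\text{after}} - Y_{\text{control},\,\text{before}})$$

or

$$DiD = (Y_{\text{treated},\,\text{after}} - Y_{\text{control},\,\text{after}}) - (Y_{\text{treated},\,\text{before}} - Y_{\text{control},\,\text{before}})$$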

where Y is the outcome variable, treated and control are the treatment and control groups, and before and after refer to the time periods before and after the treatment.
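A quick numeric sketch shows that the two expressions above give the same answer. The four group means are hypothetical values chosen for illustration.

```python
# Hypothetical group means of the outcome (for example, average weekly sales).
y_treated_before, y_treated_after = 100.0, 130.0
y_control_before, y_control_after = 90.0, 105.0

# Form 1: change in the treated group minus change in the control group.
did_changes = (y_treated_after - y_treated_before) - (y_control_after - y_control_before)

# Form 2: gap between groups after treatment minus gap before treatment.
did_gaps = (y_treated_after - y_control_after) - (y_treated_before - y_control_before)

print(did_changes, did_gaps)
```

The treated group improved by 30 and the control group by 15, so the estimated causal impact is 15 under the parallel trends assumption.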

### Question 9: What are the assumptions of causal inference?

Causal inference is the process of drawing causal conclusions from observational data. To make accurate causal inferences, certain assumptions need to be made. These assumptions include:

- **Exchangeability**: The treatment and control groups are exchangeable, meaning that the distribution of confounders is balanced between the two groups.
- **Positivity**: The treatment is feasible for all units in the population, meaning that there are no factors that prevent any unit from being assigned to either the treatment or control group. This assumption requires that the probability of receiving the treatment is greater than zero and less than one for all units in the population.
- **Consistency**: The outcome observed for a unit under the treatment it actually received equals its potential outcome under that treatment. This assumption requires that the potential outcomes are well-defined.
- **Ignorability**: Also called unconfoundedness, meaning that all confounders are identified and controlled for in the analysis, so the treatment assignment is independent of the potential outcomes given the covariates. If there are unmeasured confounders, the estimated causal effect can be biased.
- **Stable Unit Treatment Value Assumption (SUTVA)**: There is no interference or variation in the treatment.

👉 No interference means that the treatment effect of any sample is not influenced by other samples. This assumption can be violated when there is a network effect.

👉 No variation means that the treatment for all samples is comparable. For example, if a patient took a higher dose of medicine than the suggested dose, it is a violation of the no variation assumption.

### Question 10: How to use an instrumental variable for causal inference?

Instrumental variable (IV) analysis is a method for causal inference that uses an instrumental variable to estimate the causal effect of a treatment on an outcome variable. The IV analysis assumes that the instrumental variable satisfies three conditions:

- Relevance: The instrumental variable is correlated with the treatment assignment.
- Exclusion: The instrumental variable has no direct effect on the outcome variable, except through the treatment.
- Independence: The instrumental variable is independent of any unobserved confounding factors that affect the outcome variable.

Two-Stage Least Squares (2SLS) is commonly used for instrumental variable causal inference. The steps for conducting Two-Stage Least Squares (2SLS) are:

- Choose an instrumental variable that satisfies the three conditions mentioned above.
- The first stage regresses the treatment on the instrumental variable and keeps the predicted treatment values.
- The second stage regresses the outcome on the predicted treatment values from the first stage. The coefficient on the predicted treatment is the estimated treatment effect.
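The two stages can be sketched with plain least squares in NumPy. The simulation is hypothetical: `u` plays the role of an unobserved confounder, `z` is an instrument that satisfies the three conditions by construction, and the treatment `d` is continuous for simplicity. The helper `ols` is an illustrative name, not a library function.

```python
import numpy as np

def ols(X, y):
    """Least-squares coefficients for y ~ X (X includes an intercept column)."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

rng = np.random.default_rng(7)
n = 20_000
u = rng.normal(0, 1, n)                   # unobserved confounder
z = rng.normal(0, 1, n)                   # instrument: affects y only through d
d = 0.8 * z + u + rng.normal(0, 1, n)     # treatment
true_effect = 2.0
y = true_effect * d + 3.0 * u + rng.normal(0, 1, n)

ones = np.ones(n)

# Naive OLS of y on d is biased because u is omitted.
naive = ols(np.column_stack([ones, d]), y)[1]

# Stage 1: regress the treatment on the instrument and keep the fitted values.
Z = np.column_stack([ones, z])
d_hat = Z @ ols(Z, d)

# Stage 2: regress the outcome on the fitted treatment values.
iv_estimate = ols(np.column_stack([ones, d_hat]), y)[1]

print(f"naive OLS: {naive:.2f}, 2SLS: {iv_estimate:.2f}, truth: {true_effect}")
```

Running the two regressions by hand like this recovers the point estimate, but the standard errors from the second-stage regression are not valid, so a dedicated IV routine is the safer choice for inference.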

It is important to note that it is very hard to find a good instrumental variable that satisfies all three conditions, so instrumental variable analysis is often less practical than other causal inference methods.

For more information about data science and machine learning, please check out my YouTube channel and Medium Page or follow me on LinkedIn.

### Recommended Tutorials

- GrabNGoInfo Machine Learning Tutorials Inventory
- ATE vs CATE vs ATT vs ATC for Causal Inference
- Causal Inference One-to-one Propensity Score Matching Using R MatchIt Package
- Causal Inference One-to-one Matching on Confounders Using R
- Inverse Probability Treatment Weighting (IPTW) Using Python Package Causal Inference
- Top 7 Support Vector Machine (SVM) Interview Questions for Data Science and Machine Learning
- Top 5 Decision Tree Interview Questions for Data Science and Machine Learning
- Bagging vs Boosting vs Stacking in Machine Learning
- Top 10 NLP Concepts Interview Questions and Answers
- Top 10 Deep Learning Concept Interview Questions and Answers