Top 5 Decision Tree Interview Questions for Data Science and Machine Learning

Entropy, Information Gain (IG), Information Gain Ratio (IGR), Gini Impurity, pros and cons of a decision tree, and overfitting correction

The decision tree model is a frequently asked interview topic for data scientists and machine learning engineers. In this tutorial, we will talk about the top 5 decision tree interview questions and how to answer them. The 5 questions are:

  1. What are entropy, Information Gain (IG), and Information Gain Ratio (IGR) in a decision tree model?
  2. What is Gini Impurity in a decision tree model?
  3. What are the advantages of a decision tree model?
  4. What are the disadvantages of a decision tree model?
  5. How to prevent a decision tree from overfitting?

Resources for this post:

Top 5 Decision Tree Interview Questions – GrabNGoInfo.com

Let’s get started!


Question 1: What are entropy, Information Gain (IG), and Information Gain Ratio (IGR) in a decision tree model?

In a decision tree, all the possible splits by each feature are evaluated to decide the best split value. Entropy, Information Gain (IG), and Information Gain Ratio (IGR) are all related to the split evaluation.

Entropy measures the impurity of the data in a decision tree node. It ranges from 0 to 1 for a binary classification tree and can exceed 1 when there are more than two classes. An entropy of 0 indicates that all the data points in a branch are from the same class, while a higher value indicates more uncertainty or disorder in the branch.
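As a quick illustration (a sketch, not from the original post), entropy can be computed from the class proportions as -sum(p_i * log2(p_i)):

```python
import numpy as np

def entropy(labels):
    """Entropy of a set of class labels: -sum(p_i * log2(p_i))."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(np.sum(-p * np.log2(p)))

print(entropy([0, 0, 1, 1]))  # 1.0   -- a 50/50 binary node, maximum uncertainty
print(entropy([0, 1, 1, 1]))  # ~0.811 -- a purer node, lower entropy
```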

Information Gain (IG) measures the amount of information a feature's split provides. It is calculated as the difference between the parent node entropy and the weighted average entropy of the children nodes, where the weights are the proportions of data points that fall into each child node. A higher information gain indicates a larger entropy reduction and a better split.
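Continuing the sketch above (again illustrative, not from the original post), information gain compares the parent entropy with the weighted entropy of the children:

```python
def information_gain(parent_labels, children_labels):
    """IG = parent entropy - weighted average entropy of the children nodes."""
    n = len(parent_labels)
    weighted_children = sum(len(c) / n * entropy(c) for c in children_labels)
    return entropy(parent_labels) - weighted_children

parent = [0, 0, 0, 0, 1, 1, 1, 1]          # parent entropy = 1.0
children = [[0, 0, 0, 1], [0, 1, 1, 1]]    # each child entropy ~0.811
print(information_gain(parent, children))  # ~0.189
```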

One downside of Information Gain (IG) is that it tends to favor features with a large number of distinct values, because splitting the data into many small subsets often yields low entropy in each subset. Information Gain Ratio (IGR) is the remedy for this undesired property.

Information Gain Ratio (IGR) is the ratio between the information gain and the intrinsic information of a split, where the intrinsic information is the entropy of the child-node proportions (how the data points are distributed across the branches). By taking the number and size of branches into account, IGR reduces the bias toward multi-valued features during feature evaluation.
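A minimal sketch of the gain ratio, reusing the functions and example split above; the many-branch example shows how the intrinsic information term penalizes splitting into lots of tiny subsets:

```python
def information_gain_ratio(parent_labels, children_labels):
    """IGR = information gain / intrinsic information of the split."""
    n = len(parent_labels)
    proportions = np.array([len(c) / n for c in children_labels])
    intrinsic_information = float(np.sum(-proportions * np.log2(proportions)))
    return information_gain(parent_labels, children_labels) / intrinsic_information

print(information_gain_ratio(parent, children))    # ~0.189 (2 equal branches, intrinsic info = 1)
singletons = [[label] for label in parent]         # split into 8 branches of size 1
print(information_gain_ratio(parent, singletons))  # ~0.333 (IG = 1, but intrinsic info = 3)
```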

Question 2: What is Gini Impurity in a decision tree model?

Gini impurity, also called the Gini index, measures how often a randomly chosen data point from a node would be misclassified if it were labeled according to the class distribution of that node. It ranges from 0 to 0.5 for a binary classification model, where 0 means the node is pure and 0.5 means the two classes are evenly mixed.

The Gini impurity of a split is the weighted average Gini impurity of the children nodes. The feature and split with the lowest Gini impurity are chosen.

Gini impurity is computationally more efficient than entropy because it does not need to calculate the logarithm of the probability for each class.
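A self-contained sketch (not from the original post) that computes the Gini impurity of a node and the weighted Gini impurity of a split:

```python
import numpy as np

def gini_impurity(labels):
    """Gini impurity of a node: 1 - sum(p_i^2)."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(1.0 - np.sum(p ** 2))

def gini_of_split(parent_labels, children_labels):
    """Weighted average Gini impurity of the children nodes."""
    n = len(parent_labels)
    return sum(len(c) / n * gini_impurity(c) for c in children_labels)

print(gini_impurity([0, 0, 1, 1]))                   # 0.5, the binary maximum
print(gini_of_split([0, 0, 0, 0, 1, 1, 1, 1],
                    [[0, 0, 0, 1], [0, 1, 1, 1]]))   # 0.375, lower is better
```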

Question 3: What are the advantages of a decision tree model?

  • Easy interpretation. We can tell from the tree structure what features are used for the prediction and how the prediction is calculated.
  • No assumptions about the dataset. Works with any data distribution.
  • Minimal data preprocessing. No need to standardize or scale the features, because each split depends only on the ordering of a single feature's values, and no heavy feature engineering is required because splits are made on the raw feature values.
  • Robust to outliers, again because splits depend on the rank order of values rather than their magnitudes.
  • Missing values can be handled automatically by many decision tree implementations (for example, through surrogate splits).
  • Applies to both regression and classification problems. In a regression decision tree, each leaf holds a numeric value instead of a class label: the average target value of all the data points in that leaf node (see the short sketch after this list).
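As a small illustration of the last point (a sketch, not from the original post, assuming scikit-learn is installed), the same API covers both tasks, and no feature scaling is needed:

```python
from sklearn.datasets import load_iris, load_diabetes
from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor

# Classification tree: each leaf predicts a class label
X_cls, y_cls = load_iris(return_X_y=True)
clf = DecisionTreeClassifier(random_state=42).fit(X_cls, y_cls)

# Regression tree: each leaf predicts the average target value of its data points
X_reg, y_reg = load_diabetes(return_X_y=True)
reg = DecisionTreeRegressor(random_state=42).fit(X_reg, y_reg)

print(clf.predict(X_cls[:3]))
print(reg.predict(X_reg[:3]))
```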

Question 4: What are the disadvantages of a decision tree model?

  • Tends to overfit and has high variance, which makes the model very sensitive to small changes in the training data.
  • Unlike a parametric model, it provides no coefficients that quantify how much each feature changes the prediction, so the per-feature impact on the final prediction is not directly available.
  • Training can be slow on large datasets with many numerical features, because the entropy or Gini impurity must be evaluated for every candidate split.

Question 5: How to prevent a decision tree from overfitting?

  • Set the minimum number of observations for a split.
  • Ensemble methods such as a random forest model.
  • Require a higher value for information gain in order to split.
  • Decrease the depth of the tree.
  • Increase the minimum number of data points in the leaf node.
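In scikit-learn, these controls correspond to a few hyperparameters. A minimal sketch (the specific values are illustrative, not recommendations):

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier

pruned_tree = DecisionTreeClassifier(
    max_depth=4,                 # decrease the depth of the tree
    min_samples_split=20,        # minimum number of observations for a split
    min_samples_leaf=10,         # minimum number of data points in a leaf node
    min_impurity_decrease=0.01,  # require a larger impurity reduction to split
    random_state=42,
)

# Ensemble alternative: averaging many trees (a random forest) reduces variance
forest = RandomForestClassifier(n_estimators=200, random_state=42)
```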

For more information about data science and machine learning, please check out my YouTube channel and Medium Page or follow me on LinkedIn.
