The decision tree model is a frequently asked interview topic for data scientists and machine learning engineers. In this tutorial, we will talk about the top 5 decision tree interview questions and how to answer them. The 5 questions are:

- What are entropy, Information Gain (IG), and Information Gain Ratio (IGR) in a decision tree model?
- What is Gini Impurity in a decision tree model?
- What are the advantages of a decision tree model?
- What are the disadvantages of a decision tree model?
- How to prevent a decision tree from overfitting?

**Resources for this post:**

- Video tutorial for this post on YouTube
- More video tutorials on Data Science Interview Questions
- More blog posts on Data Science Interview Questions

Let’s get started!

### Question 1: What are entropy, Information Gain (IG), and Information Gain Ratio (IGR) in a decision tree model?

In a decision tree, all the possible splits by each feature are evaluated to decide the best split value. Entropy, Information Gain (IG), and Information Gain Ratio (IGR) are all related to the split evaluation.

**Entropy** measures the impurity of the data in a node of a decision tree. It ranges from 0 to 1 for a binary classification tree and can exceed 1 when there are more than two classes. An entropy of 0 indicates that all the data points in a branch belong to the same class, and a higher value indicates more uncertainty or disorder in the branch.
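As a minimal sketch (assuming NumPy is available), entropy can be computed directly from a node's class labels:

```python
import numpy as np

def entropy(labels):
    """Shannon entropy (base 2) of a list of class labels."""
    _, counts = np.unique(labels, return_counts=True)
    probs = counts / counts.sum()
    return -np.sum(probs * np.log2(probs))

print(entropy([0, 0, 1, 1]))  # 1.0 -- maximum disorder for two classes
print(entropy([0, 1, 2]))     # ~1.585 (log2 of 3) -- exceeds 1 with three classes
# A pure node such as [1, 1, 1, 1] has entropy 0.
```

Note how the three-class example confirms that entropy is only bounded by 1 in the binary case.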

**Information Gain (IG)** is the amount of information a feature provides about the class label. It is calculated as the difference between the entropy of the parent node and the weighted average entropy of the child nodes, where the weights are the proportions of data points in each child node. A higher information gain indicates a larger entropy reduction and a better split.
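The definition above translates directly into code. A minimal sketch (the `entropy` helper is repeated here so the snippet is self-contained):

```python
import numpy as np

def entropy(labels):
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def information_gain(parent, children):
    """IG = parent entropy minus the weighted average entropy of the children."""
    n = sum(len(c) for c in children)
    weighted = sum(len(c) / n * entropy(c) for c in children)
    return entropy(parent) - weighted

# A perfect split of a 50/50 parent removes all uncertainty: IG = 1 bit.
print(information_gain([0, 0, 1, 1], [[0, 0], [1, 1]]))  # 1.0
```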

One downside of Information Gain (IG) is that it tends to favor predictors with many distinct values, splitting the data into lots of small subsets with low entropy. Information Gain Ratio (IGR) is the standard fix for this undesired property.

**Information Gain Ratio (IGR)** is the ratio between information gain and intrinsic information, where intrinsic information is the entropy of the proportions of data points across the child nodes. By taking the number and size of the branches into account during feature evaluation, IGR reduces the bias toward multi-valued attributes.
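A small sketch shows the penalty at work: two splits with the same information gain, but the many-way split is divided by a larger intrinsic information and so scores lower (the IG value of 0.5 is an illustrative assumption, not computed from real data):

```python
import numpy as np

def entropy_from_counts(counts):
    """Entropy of the proportions implied by a list of counts."""
    p = np.asarray(counts) / np.sum(counts)
    return -np.sum(p * np.log2(p))

def gain_ratio(info_gain, child_sizes):
    """IGR = information gain / intrinsic information,
    where intrinsic information is the entropy of the child-size proportions."""
    return info_gain / entropy_from_counts(child_sizes)

# Same IG, but a 10-way split is penalized far more than a 2-way split:
print(gain_ratio(0.5, [50, 50]))   # 0.5 / 1.0 = 0.5
print(gain_ratio(0.5, [10] * 10))  # 0.5 / log2(10), roughly 0.15
```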

### Question 2: What is Gini Impurity in a decision tree model?

Gini impurity, also called the Gini index, measures how often a randomly chosen data point in a node would be misclassified if it were labeled according to the node's class distribution. It ranges from 0 to 0.5 for a binary classification model.

The Gini impurity of a split is the weighted average Gini impurity of the child nodes. The feature and split value with the lowest Gini impurity are chosen.

Gini impurity is computationally more efficient than entropy because it does not need to calculate the logarithm of the probability for each class.
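A minimal sketch of both calculations (assuming NumPy) — note that, unlike entropy, no logarithm appears anywhere:

```python
import numpy as np

def gini(labels):
    """Gini impurity: 1 minus the sum of squared class probabilities."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def gini_split(children):
    """Weighted average Gini impurity of the child nodes of a split."""
    n = sum(len(c) for c in children)
    return sum(len(c) / n * gini(c) for c in children)

print(gini([0, 0, 1, 1]))                  # 0.5 -- the worst case for two classes
print(gini([0, 0, 0, 0]))                  # 0.0 -- a pure node
print(gini_split([[0, 0, 1], [1, 1, 1]]))  # roughly 0.222
```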

### Question 3: What are the advantages of a decision tree model?

- **Easy interpretation.** We can tell from the tree structure which features are used for the prediction and how the prediction is calculated.
- **No assumptions about the dataset.** Works with any data distribution.
- **Minimal data preprocessing.** No need to standardize the data because the decision tree is a non-parametric model, and no need for feature engineering because splits are based on the raw feature values.
- **Robust to outliers**, because splits depend on the ordering of feature values rather than their magnitudes.
- **Missing values** are automatically handled.
- **Applies to both regression and classification.** In a regression decision tree, each leaf represents a numeric value instead of a label: the average of all the data points in that leaf node.
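To make the regression case concrete, here is a sketch (with made-up one-feature data) of how a regression tree picks a split: it scans candidate thresholds and keeps the one that minimizes the weighted variance of the two child nodes, and each resulting leaf would predict the mean of its targets.

```python
import numpy as np

def best_regression_split(x, y):
    """Scan candidate thresholds on one feature and return the threshold
    that minimizes the weighted variance (MSE) of the two child nodes."""
    best_threshold, best_score = None, np.inf
    for t in np.unique(x)[:-1]:
        left, right = y[x <= t], y[x > t]
        score = (len(left) * left.var() + len(right) * right.var()) / len(y)
        if score < best_score:
            best_threshold, best_score = t, score
    return best_threshold, best_score

x = np.array([1.0, 2.0, 3.0, 10.0, 11.0, 12.0])
y = np.array([1.0, 1.2, 0.9, 5.0, 5.1, 4.9])
threshold, score = best_regression_split(x, y)
print(threshold)  # 3.0 -- separates the low-target and high-target points
```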

### Question 4: What are the disadvantages of a decision tree model?

- **Tends to overfit** and has high variance, which makes the model very sensitive to new data.
- **No quantified impact of features.** Because it is a non-parametric model, we do not know how much a feature contributes to the final prediction.
- **Long computation time.** Calculating entropy or Gini impurity for all possible splits is slow when the dataset is large and contains numerical features.

### Question 5: How to prevent a decision tree from overfitting?

- Set a minimum number of observations required to split a node.
- Use ensemble methods such as a random forest model.
- Require a minimum information gain for a split.
- Limit the maximum depth of the tree.
- Increase the minimum number of data points required in a leaf node.
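Assuming scikit-learn is available, most of the pre-pruning rules above map directly onto `DecisionTreeClassifier` hyperparameters (the specific values below are illustrative, not tuned):

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Each argument corresponds to one pre-pruning rule from the list above:
tree = DecisionTreeClassifier(
    max_depth=3,                # limit the depth of the tree
    min_samples_split=10,       # minimum observations required to split a node
    min_samples_leaf=5,         # minimum data points required in a leaf node
    min_impurity_decrease=0.01, # require a minimum gain to make a split
)
tree.fit(X, y)
print(tree.get_depth())  # at most 3
```

Ensembling, the remaining item on the list, is handled by a separate estimator such as `RandomForestClassifier` rather than a hyperparameter.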

For more information about data science and machine learning, please check out my YouTube channel and Medium Page or follow me on LinkedIn.

### Recommended Tutorials

- GrabNGoInfo Machine Learning Tutorials Inventory
- What is a p-value? | Data Science Interview Questions and Answers
- What is a t-test? Data Science Interview Questions and Answers
- How to detect outliers | Data Science Interview Questions and Answers
- Correlation vs Causation | Data Science Interview Questions and Answers
- Power Analysis For Sample Size Using Python
- How to evaluate the performance of a binary classification model? | Data Science Interview Questions and Answers
- Bagging vs Boosting vs Stacking in Machine Learning
- How to decide the number of clusters | Data Science Interview Questions and Answers
- Gradient Descent vs. Stochastic Gradient Descent vs. Batch Gradient Descent vs. Mini-batch Gradient Descent