Bagging vs Boosting vs Stacking in Machine Learning: Data Science Interview Questions and Answers


In data science interviews, questions about ensemble machine learning methods such as bagging, boosting, and stacking come up frequently. An ensemble method combines the predictions from many machine learning models to achieve better results. In this tutorial, we will cover:

  • How do bagging, boosting, and stacking work in a machine learning model?
  • What are the examples of bagging, boosting, and stacking models?
  • What are the pros and cons of bagging, boosting, and stacking models?
  • What are the similarities and differences between bagging, boosting, and stacking models?


Let’s get started!


Bagging

Bagging stands for Bootstrap Aggregating. It randomly samples the training dataset using bootstrap, builds one model on each random sample, and aggregates the predictions from the individual models to produce the final prediction. Each individual model is called a weak learner or a base model; in this tutorial, we use the two terms interchangeably.


Bootstrap

Bootstrap randomly draws samples with replacement from the training dataset. It assumes that the samples are independent and identically distributed (i.i.d.).

  • Independence means that each draw of the sample is an independent event. Drawing one sample does not affect the probability of drawing another sample.
  • Identically distributed means that all the samples come from the same distribution.
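The bootstrap draw itself is just sampling with replacement; a minimal sketch in plain Python (the function name and toy data are illustrative):

```python
import random

def bootstrap_sample(data, rng):
    """Draw len(data) points with replacement; each draw is an independent,
    identically distributed pick from the same dataset."""
    return [rng.choice(data) for _ in range(len(data))]

rng = random.Random(42)
data = list(range(10))
sample = bootstrap_sample(data, rng)

# The sample has the same size as the original dataset, but because we draw
# with replacement, some points typically appear more than once and others
# are left out entirely.
print(sample)
```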


Random forest is an example of a bagging model. It builds many decision trees, each on a random subset of samples and features from the training dataset, and aggregates the results from the trees as the final prediction.

  • For classification models, the prediction with the majority vote is selected.
  • For regression models, the average of the predictions from all the models is used as the final prediction value.
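The two aggregation rules can be sketched in a few lines of plain Python (the function names are illustrative):

```python
from collections import Counter

def aggregate_classification(predictions):
    """Majority vote: the label predicted by the most base models wins."""
    return Counter(predictions).most_common(1)[0][0]

def aggregate_regression(predictions):
    """Average the numeric predictions from all base models."""
    return sum(predictions) / len(predictions)

# Three base models vote on a class label and predict a numeric value.
label = aggregate_classification(["cat", "dog", "cat"])  # -> "cat"
value = aggregate_regression([2.0, 4.0, 6.0])            # -> 4.0
```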


Pros of Bagging

  • A bagging model reduces variance and can be used to correct overfitting.
  • Parallel training is time-efficient.
  • Bagging is robust to missing data: each decision tree uses a subset of records and features, so only a portion of the base models is affected by the missing data.


Cons of Bagging

  • Training in parallel requires a large amount of memory and CPU.


Boosting

Boosting trains multiple models using the same algorithm sequentially. Each model focuses on the errors made by the previous model by putting more weight on the wrong predictions.


Gradient boosted trees are an example of a boosting model. The method builds many shallow decision trees; each tree fits the errors from the previous step, and the tree's results are used to update the current model.
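The sequential error-correcting loop can be illustrated with a toy gradient-boosting sketch that uses depth-1 stumps as base models. This is a simplified illustration of the idea, not a production GBT implementation; all names and data are made up:

```python
def fit_stump(xs, residuals):
    """A depth-1 'tree': split at the median x and predict the mean
    residual on each side of the split."""
    split = sorted(xs)[len(xs) // 2]
    left = [r for x, r in zip(xs, residuals) if x < split]
    right = [r for x, r in zip(xs, residuals) if x >= split]
    left_mean = sum(left) / len(left) if left else 0.0
    right_mean = sum(right) / len(right) if right else 0.0
    return lambda x: left_mean if x < split else right_mean

def gradient_boost(xs, ys, n_rounds=20, learning_rate=0.5):
    """Each round fits a stump to the current residuals (the errors of
    the model so far) and adds a scaled copy of it to the ensemble."""
    predictions = [0.0] * len(xs)
    stumps = []
    for _ in range(n_rounds):
        residuals = [y - p for y, p in zip(ys, predictions)]
        stump = fit_stump(xs, residuals)
        predictions = [p + learning_rate * stump(x)
                       for p, x in zip(predictions, xs)]
        stumps.append(stump)
    return lambda x: sum(learning_rate * s(x) for s in stumps)

# Learn a step function: y jumps from 0 to 1 at x = 3.
model = gradient_boost([0, 1, 2, 3, 4, 5], [0, 0, 0, 1, 1, 1])
```

Each round shrinks the remaining error, so after 20 rounds the ensemble's predictions are very close to the training targets.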


Pros of Boosting

  • A boosting model usually reduces prediction bias because it focuses on correcting the wrong predictions.
  • Only one base model is built at a time, so less memory and CPU are needed than for bagging.


Cons of Boosting

  • Sequential training takes a long time.


Stacking

Stacking trains multiple models using different algorithms independently. The predictions from the individual models are used as the features (predictors) of a meta-model, and the meta-model's prediction is the final prediction.


Stacking works for both classification and regression models. We use a regression model as an example; a classification model follows the same process.

  • First, we generate the features for the meta-model using the base models. The features are the predictions produced by different algorithms; for example, the base models can be a random forest model, an XGBoost model, and a neural network model.
  • Then the predicted values from these base models are used as the inputs of the meta-learner, for example a Ridge model. The meta-learner's prediction is the final prediction.
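The two steps above can be sketched end to end in plain Python, with two simple base models (a mean predictor and an ordinary least-squares line) standing in for random forest/XGBoost, and a tiny least-squares solver standing in for the Ridge meta-learner. All names and data are illustrative:

```python
def fit_mean_model(xs, ys):
    """Base model 1: always predict the training mean."""
    mean = sum(ys) / len(ys)
    return lambda x: mean

def fit_line_model(xs, ys):
    """Base model 2: ordinary least-squares line y = a*x + b."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    a = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    b = my - a * mx
    return lambda x: a * x + b

def fit_meta(features, ys):
    """Meta-learner: least-squares weights for two feature columns,
    solved directly from the 2x2 normal equations."""
    a = sum(f[0] * f[0] for f in features)
    b = sum(f[0] * f[1] for f in features)
    d = sum(f[1] * f[1] for f in features)
    p = sum(f[0] * y for f, y in zip(features, ys))
    q = sum(f[1] * y for f, y in zip(features, ys))
    det = a * d - b * b
    w0 = (d * p - b * q) / det
    w1 = (a * q - b * p) / det
    return lambda f: w0 * f[0] + w1 * f[1]

xs, ys = [0, 1, 2, 3], [1, 3, 5, 7]  # training data: y = 2x + 1
base_models = [fit_mean_model(xs, ys), fit_line_model(xs, ys)]

# Step 1: base-model predictions become the meta-model's features.
meta_features = [[m(x) for m in base_models] for x in xs]
# Step 2: the meta-learner combines the base predictions into the final one.
meta_model = fit_meta(meta_features, ys)

final_prediction = meta_model([m(10) for m in base_models])
```

Because the line model fits this data exactly, the meta-learner learns to put all its weight on that base model, and the final prediction at x = 10 is 21.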


Pros of Stacking

  • Diverse algorithms with different assumptions can be used as base models, so the strengths of different algorithms are combined.
  • Stacking predictions usually have higher accuracy, so stacking is commonly used in competitions.


Cons of Stacking

  • It takes longer to train the models and make predictions, so stacking may not be a good choice for a production model that needs fast predictions.

Comparison: Bagging vs. Boosting vs. Stacking

In this section, we compare the bagging, boosting, and stacking methods.

  • During base-model (weak learner) training, a bagging method can train all the base models in parallel. A boosting method trains the base models sequentially because each model depends on the results of the previous ones. The base models in a stacking method are independent of one another, so they can also be trained in parallel.
  • The base-model type is homogeneous for the bagging and boosting methods, and heterogeneous for the stacking method. This is because bagging and boosting use the same algorithm for all the base models, but stacking usually uses different algorithms as base models.
  • The number of base models is high for the bagging and boosting methods, and low for stacking. For example, a random forest or gradient boosted trees model usually has hundreds or thousands of trees, while a stacking method typically has fewer than 20 base models.
  • The tree depth for a bagging method is usually deep and the tree depth for a boosting method is usually shallow. Deep trees have low bias and high variance, so a bagging method uses deep trees and focuses on reducing variance. Shallow trees have low variance and high bias, so a boosting method uses shallow trees and focuses on reducing bias. The stacking method can use any algorithm, so the depth of trees does not apply to the stacking method.
  • In terms of how the base-model results are aggregated, the bagging method assigns equal weight to each base-model, the boosting method assigns a higher weight to the base models with better performance, and the stacking method gives higher weight to base-model predictions that contribute more to the meta-learner.
  • The main goal for the bagging method is to reduce variance, the main goal for the boosting method is to reduce bias, and the stacking method does not have a constant goal because it depends on what machine learning algorithms are used as the base models.
  • The memory and CPU usage for a bagging model can be quite high because a large number of weak learners are trained in parallel. The memory and CPU usage for a boosting model is usually lower because it trains one model at a time. The memory and CPU usage of a stacking method depends on the base-model algorithms.

For more information about data science and machine learning, please check out my YouTube channel and Medium page, or follow me on LinkedIn.

Recommended Tutorials

