Ensemble Method — Bagging Trees

Ali Mahzoon
3 min read · Jul 6, 2021

Any method that combines more than one model is known as an Ensemble Method. In this post, we are going to talk about Bagged trees, which are a kind of ensemble method.

For any bagged-trees model, these are the steps we follow to get it running (a minimal code sketch of the whole procedure follows the list):

1- Take a bootstrap sample of the entire dataset

2- Build a tree using the bootstrapped sample

3- Repeat steps 1 and 2 many times and aggregate all of the trees

4- Output a prediction from each tree

5- For regression, take the average of the predictions. For classification, take the majority-predicted value
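Here is a minimal sketch of these five steps as a hand-rolled bagging regressor. It assumes scikit-learn and NumPy are available; the synthetic dataset, tree count, and variable names are illustrative choices, not something from this post.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.tree import DecisionTreeRegressor

# Illustrative synthetic data; any regression dataset would do.
X, y = make_regression(n_samples=500, n_features=10, noise=10.0, random_state=0)

rng = np.random.default_rng(0)
n_trees = 100
trees = []

# Steps 1-3: draw a bootstrap sample (same size, with replacement) and fit one tree per sample.
for _ in range(n_trees):
    idx = rng.choice(len(X), size=len(X), replace=True)
    trees.append(DecisionTreeRegressor().fit(X[idx], y[idx]))

# Steps 4-5: run the data through every tree and average the predictions (regression).
per_tree_preds = np.stack([tree.predict(X) for tree in trees])
bagged_prediction = per_tree_preds.mean(axis=0)
```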

What is Bootstrapping?

We know that trees are prone to overfitting because they have high variance (I highly recommend reading my blog about Decision Trees). This is especially true if the trees are built out to full “Purity” in each of the leaves. Bootstrapping helps here: it is the process of resampling our dataset so that each tree sees a slightly different version of the data and accounts for different patterns within it.

To help prevent overfitting, we take bootstrap samples from our training data: samples drawn with replacement that are the same size as the training data.

Bootstrapping changes the distribution of data a little bit, but all of the data does come from our original dataset.
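As a quick illustration (a toy NumPy example I am adding here, not from the original post), this is what a single bootstrap sample looks like: same size as the original data, drawn with replacement, so some points repeat and others are left out.

```python
import numpy as np

rng = np.random.default_rng(1)
data = np.arange(10)  # stand-in for our original training data

# Same size as the original data, sampled with replacement.
sample = rng.choice(data, size=len(data), replace=True)
print(sample)                      # some values appear more than once
print(np.setdiff1d(data, sample))  # the points that were never drawn ("out-of-bag")
```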

What is Aggregating?

Bootstrapping is followed by aggregating: once we have taken all the bootstrapped samples, we fit a decision tree to each sample. Each of these trees has low bias and high variance, so on its own it is prone to overfitting.

We repeat this process for however many trees we want in our model.

Now we can feed data through all of the bootstrapped trees and take (see the small sketch after this list):

- Classification: whichever class is predicted most often by the bootstrapped decision trees.

- Regression: the average of the values predicted by each decision tree.
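Here is a tiny sketch of the two aggregation rules, assuming NumPy and a hypothetical matrix of per-tree predictions (rows are trees, columns are observations):

```python
import numpy as np

# Hypothetical predictions from 3 trees on 3 observations (classification labels).
class_preds = np.array([[0, 1, 1],
                        [0, 1, 0],
                        [1, 1, 0]])

# Hypothetical predictions from 3 trees on 2 observations (regression values).
reg_preds = np.array([[2.0, 5.0],
                      [2.5, 4.0],
                      [3.0, 4.5]])

# Classification: majority vote across trees for each observation (each column).
majority_vote = np.apply_along_axis(lambda votes: np.bincount(votes).argmax(), 0, class_preds)

# Regression: average across trees for each observation.
average_prediction = reg_preds.mean(axis=0)

print(majority_vote)        # [0 1 0]
print(average_prediction)   # [2.5 4.5]
```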

Bootstrapping + aggregating: that’s where we get the term “Bagging” from.

As we can see in the following picture, we get a much smoother, less overfit decision boundary when we use bagging (weak learners working together to produce an averaged prediction; in other words, the wisdom of the crowd).

Imagine you have built 100 trees: running cross-validation or a bunch of train-test splits on top of that can be pretty inefficient. Since performing cross-validation on bagged classifiers/regressors is challenging and computationally expensive, we prefer a built-in alternative called the Out-of-Bag Error.

Out-Of-Bag Error (OOBE)

You can sort of imagine OOBE as a train-test split within your train-test split. Because each bootstrap sample is drawn with replacement, each tree is trained on only about 2/3 of the distinct observations, and the remaining 1/3 is left out. For every observation, we can make a prediction using only the trees that were not built with that observation, and compare it to the true value. (It’s essentially just a form of cross-validation, but you are not retraining your model as many times.)
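As a practical sketch (assuming scikit-learn; the dataset and settings here are just illustrative), scikit-learn’s BaggingClassifier can compute this out-of-bag score for you:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

# oob_score=True scores each observation only with the trees that never saw it.
bag = BaggingClassifier(n_estimators=100, oob_score=True, random_state=0)
bag.fit(X, y)

print(bag.oob_score_)  # accuracy estimated from the out-of-bag predictions
```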

Bagging Problem

The issue with bagging is that the trees can end up correlated with one another. If one feature is very powerful at separating the categories, most trees will split on it in much the same way, so they look alike despite being built from different bootstrapped samples.
