Ensemble Method — Random Forest

Ali Mahzoon
3 min read · Jul 6, 2021

Random Forest is very similar to bagged trees (I recommend reading my blog post about bagging trees first), but it adds an extra layer of randomness that sets it apart from plain bagging.

Random Forest: Recipe

To construct a random forest estimator, what we need is:

1- Draw a bootstrap sample from the entire dataset.

2- Build a tree on the bootstrapped sample, using only a random subset of the features at each node. (New step)

3- Repeat steps 1 and 2 many, many times to grow a collection of trees.

4- Run each observation through every tree to get a prediction from each one.

5- Aggregate the predictions: for regression, take the average of the predictions; for classification, take the majority predicted value.
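
To make the recipe concrete, here is a minimal sketch of those five steps using scikit-learn's DecisionTreeClassifier on the iris dataset (the dataset choice, the variable names, and the number of trees are illustrative assumptions, not details from this post):

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

# Illustrative sketch of the recipe above, not the post's original code.
X, y = load_iris(return_X_y=True)
rng = np.random.default_rng(42)

n_trees = 100
trees = []
for _ in range(n_trees):
    # Step 1: draw a bootstrap sample (rows sampled with replacement)
    idx = rng.integers(0, len(X), size=len(X))
    X_boot, y_boot = X[idx], y[idx]

    # Step 2: grow a tree that considers only sqrt(p) random features at each split
    tree = DecisionTreeClassifier(max_features="sqrt")
    tree.fit(X_boot, y_boot)
    trees.append(tree)

# Steps 3-5: every tree predicts, then we aggregate by majority vote
all_preds = np.stack([t.predict(X) for t in trees])  # shape: (n_trees, n_samples)
majority_vote = np.array([np.bincount(col).argmax() for col in all_preds.T])
print("Training accuracy of the hand-rolled forest:", (majority_vote == y).mean())
```

In practice you would simply use RandomForestClassifier or RandomForestRegressor from scikit-learn, which wrap exactly these steps.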

Random Forests de-correlate the decision trees created in bagging by ensuring that only m randomly chosen features are considered at each split. (Typically, m = sqrt(p), where p is the total number of features.)

This means that on average (p - m)/p of the splits will not even consider a given strong predictor. Increasing the number of trees does not lead to overfitting, so we can keep growing trees until we reach an acceptable error rate.
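
For example, with p = 16 features and m = sqrt(16) = 4, on average (16 - 4)/16 = 75% of the splits will not even look at any particular strong predictor, which is exactly what keeps the trees from all leaning on the same variable.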

Random Forest Classifier with different options of max features

Random Subspace Sampling Method

Important note: the features are randomly chosen at each node, not once for the entire decision tree. Because a different subset of features can be considered at each node, the tree as a whole is not restricted to m features; it is still built on all p features.

This is done to ensure that our trees are de-correlated from one another and have diverse opinions.
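
Here is a tiny sketch of what "a different random subset at every node" means in practice (the value of p and the number of nodes are made-up numbers for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
p = 9                # total number of features the tree can see
m = int(np.sqrt(p))  # features considered at any single split (here m = 3)

# Every node draws its own candidate feature subset, so across many nodes
# the tree can still end up using all p features.
for node in range(4):
    candidates = rng.choice(p, size=m, replace=False)
    print(f"node {node}: best split searched only over features {sorted(candidates)}")
```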

In the following image we can see a comparison of the results of Random Forest, bagging, and Decision Trees:

comparing results of random forest, bagging, and Decision Trees
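
If you want to reproduce that kind of comparison yourself, a rough sketch could look like the following (it assumes the iris dataset and 5-fold cross-validation, which are my choices and not necessarily what produced the figure):

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import BaggingClassifier, RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

models = {
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Bagging":       BaggingClassifier(n_estimators=100, random_state=0),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
}

# Compare the three approaches with the same cross-validation scheme
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=5)
    print(f"{name:>13}: mean accuracy = {scores.mean():.3f}")
```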

Random Forest Advantages/Disadvantages

Advantages

- A very powerful model. Will nearly always outperform Decision Trees.

- Able to detect non-linear relationships well

- Harder to overfit than many other models

Disadvantages

- Not as interpretable as Decision Trees

- Many hyperparameters to tune (GridSearch is your friend in this regard!)

Random Forest Hyperparameters

n_estimators: the number of trees in the forest. You can plot the error against the number of trees and choose a value right around where the error starts to plateau; there is no point in adding more trees once they no longer produce a meaningful change in the error.

criterion: the split-quality measure, either "gini" or "entropy".

max_features: the number of random features to be considered when looking for the best split (by default the square root of the total number of features for classification).

max_depth: the maximum depth of each tree.

bootstrap: whether or not bootstrap samples are used to build the trees.

oob_score: whether or not to use out-of-bag samples to estimate the generalization accuracy.

n_jobs: how many cores you want to use when training your trees.

There are other hyperparameters that you can explore in the scikit-learn documentation.
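
As a sketch of how that tuning might look with GridSearchCV (the grid values below are arbitrary examples, not recommendations from this post):

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

X, y = load_iris(return_X_y=True)

param_grid = {
    "n_estimators": [100, 300],
    "criterion": ["gini", "entropy"],
    "max_features": ["sqrt", "log2"],
    "max_depth": [None, 5, 10],
}

# n_jobs=-1 uses all available cores for the search
search = GridSearchCV(RandomForestClassifier(random_state=0), param_grid, cv=5, n_jobs=-1)
search.fit(X, y)
print("Best parameters:", search.best_params_)
print("Best CV accuracy:", round(search.best_score_, 3))
```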

An interesting point about tree-based algorithms is that you can extract feature importances from them.

Feature Importances

The more the accuracy of the random forest decreases due to the exclusion (or permutation) of a single variable, the more important that variable is, and therefore variables with a large mean decrease in accuracy are more important for the classification of the data.

Feature Importances for Iris Dataset
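
A minimal sketch of how values like those in the figure can be computed, using both scikit-learn's built-in impurity-based importances and the permutation importance described above (the train/test split and random seeds are arbitrary choices):

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

data = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    data.data, data.target, random_state=0
)

rf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)

# Impurity-based importances (computed during training, averaged over all trees)
for name, imp in zip(data.feature_names, rf.feature_importances_):
    print(f"{name:>17}: {imp:.3f}")

# Permutation importance: how much test accuracy drops when a feature is shuffled
perm = permutation_importance(rf, X_test, y_test, n_repeats=10, random_state=0)
for name, imp in zip(data.feature_names, perm.importances_mean):
    print(f"{name:>17} (permutation): {imp:.3f}")
```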

There are many other ways to determine the feature importances of a Random Forest.
