Ensemble algorithms

  • A random forest is an ensemble algorithm.
  • Ensemble algorithms combine multiple machine learning models to create a more powerful one.
  • In machine learning competitions, ensemble methods are usually among the winners.
  • The two most common ensemble algorithms are random forests and gradient boosted decision trees (compared in the short example below).
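
Both families are available in scikit-learn's ensemble module. A quick, illustrative comparison might look like the following (the toy dataset and hyperparameters here are arbitrary choices for demonstration, not part of the notes above):

```python
from sklearn.datasets import make_moons
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.model_selection import train_test_split

# Small synthetic two-class dataset, split into train and test parts.
X, y = make_moons(n_samples=200, noise=0.25, random_state=3)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

for model in (RandomForestClassifier(n_estimators=100, random_state=0),
              GradientBoostingClassifier(random_state=0)):
    model.fit(X_train, y_train)
    print(type(model).__name__, model.score(X_test, y_test))
```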

Random forests

  • A random forest is a collection of decision trees that differ slightly from one another.
  • The idea is that each tree overfits part of the data in its own way; averaging many such trees yields a powerful model that overfits much less.
  • The trees are made to differ from one another by injecting randomness into their construction.
  • The randomness comes from two sources: a) each tree is trained on a bootstrap sample (data points drawn randomly with replacement), and b) at each split, only a random subset of the features is considered. Both mechanisms are illustrated in the sketch after this list.
  • If the number of randomly chosen features equals the total number of features, there is no randomness in the feature selection.
  • If the number of random features is very small, the trees have to be very deep to fit the data well.
  • For prediction, every tree makes a prediction. For regression, the results are averaged; for classification, a majority vote over the trees determines the class.
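
Here is a minimal sketch of this procedure built from scikit-learn decision trees. The function names `build_forest` and `predict_forest` are illustrative, not library API:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def build_forest(X, y, n_trees=10, max_features="sqrt", random_state=0):
    """Train n_trees decision trees, each on a bootstrap sample."""
    rng = np.random.RandomState(random_state)
    n_samples = X.shape[0]
    trees = []
    for _ in range(n_trees):
        # a) bootstrapping: draw n_samples indices with replacement
        idx = rng.randint(0, n_samples, n_samples)
        # b) feature randomness: max_features limits how many features
        #    each split is allowed to consider
        tree = DecisionTreeClassifier(max_features=max_features,
                                      random_state=rng.randint(2**31))
        tree.fit(X[idx], y[idx])
        trees.append(tree)
    return trees

def predict_forest(trees, X):
    """Majority vote over the trees (assumes non-negative integer labels)."""
    votes = np.stack([tree.predict(X) for tree in trees])  # (n_trees, n_samples)
    return np.apply_along_axis(
        lambda col: np.bincount(col.astype(int)).argmax(), axis=0, arr=votes)
```

In practice you would simply use sklearn.ensemble.RandomForestClassifier (or RandomForestRegressor for regression), which implements this scheme, with bootstrapping and per-split feature subsampling built in.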

Pros

  • Very powerful.
  • Usually works well without parameter tuning.
  • Does not require feature scaling.
  • Works well on large datasets.
  • Can be parallelized (see the snippet after this list).
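
Because the trees are independent of one another, they can be built on multiple CPU cores. In scikit-learn, for example, this is controlled by the n_jobs parameter:

```python
from sklearn.ensemble import RandomForestClassifier

# n_jobs=-1 uses all available CPU cores to build the trees in parallel.
forest = RandomForestClassifier(n_estimators=100, n_jobs=-1, random_state=0)
```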

Cons

  • Does not perform well on very high-dimensional, sparse data (such as text).
  • Requires more memory than a single model and is slower to train and to predict.