What is random forest? Why is it good?
Random forest (intuition):
- Underlying principle: several weak learners combined provide a strong learner
- Builds several decision trees on bootstrapped training samples of data
- On each tree, each time a split is considered, a random sample of $m$ predictors is chosen as split candidates, out of all $p$ predictors
- Rule of thumb: at each split, $m = \sqrt{p}$
- Predictions: by majority vote over the trees
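
The steps above can be sketched in a few lines of plain Python. This is only an illustration under simplifying assumptions, not a reference implementation: each "tree" is a one-split stump (so sampling the $m$ candidate predictors once per tree is the same as sampling them at every split), and the helper names `fit_stump`, `fit_forest`, `predict_forest` and the toy data are all hypothetical.

```python
import math
import random
from collections import Counter

def majority(labels):
    """Most common label in a list."""
    return Counter(labels).most_common(1)[0][0]

def fit_stump(X, y, features):
    """One-split tree: best midpoint threshold among the candidate features."""
    best = None
    for j in features:
        values = sorted({row[j] for row in X})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [lab for row, lab in zip(X, y) if row[j] <= t]
            right = [lab for row, lab in zip(X, y) if row[j] > t]
            l_lab, r_lab = majority(left), majority(right)
            errors = (sum(lab != l_lab for lab in left)
                      + sum(lab != r_lab for lab in right))
            if best is None or errors < best[0]:
                best = (errors, j, t, l_lab, r_lab)
    if best is None:  # no split possible: always predict the majority class
        maj = majority(y)
        return (0, float("inf"), maj, maj)
    return best[1:]

def predict_stump(stump, row):
    j, t, l_lab, r_lab = stump
    return l_lab if row[j] <= t else r_lab

def fit_forest(X, y, n_trees=25, seed=0):
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    m = max(1, round(math.sqrt(p)))  # rule of thumb: m = sqrt(p)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # bootstrap sample
        features = rng.sample(range(p), m)          # random m of p predictors
        forest.append(fit_stump([X[i] for i in idx], [y[i] for i in idx], features))
    return forest

def predict_forest(forest, row):
    """Majority vote over the individual trees."""
    return majority([predict_stump(s, row) for s in forest])

# Toy data (made up for illustration): class 0 has small values, class 1 large ones.
X = [[i / 10, (9 - i) / 10, i / 10, (9 - i) / 10] for i in range(10)]
X += [[5 + v for v in row] for row in X[:10]]
y = [0] * 10 + [1] * 10

forest = fit_forest(X, y)
print(predict_forest(forest, [0.0, 0.0, 0.0, 0.0]))
print(predict_forest(forest, [6.0, 6.0, 6.0, 6.0]))
```

A real random forest grows deep trees and resamples the candidate predictors at every split; the stump keeps the sketch short while still showing bootstrapping, random predictor selection, and voting.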
Why is it good?
- Very good performance: choosing random predictor subsets decorrelates the trees, which lowers the variance of the combined prediction
- Can model non-linear class boundaries
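- Generalization error for free: each tree can be evaluated on the samples left out of its bootstrap sample (out-of-bag, OOB), which gives a nearly unbiased estimate of the generalization error as the trees are built, with no separate cross-validation needed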
- Generates variable importance
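
The "generalization error for free" point can be illustrated with a rough out-of-bag sketch in plain Python. The assumptions are the same as for any toy example: one-split stumps stand in for full trees, and the names `fit_stump` and `oob_error` plus the data are hypothetical. Each training row is scored only by the trees whose bootstrap sample did not contain it.

```python
import math
import random
from collections import Counter

def majority(labels):
    return Counter(labels).most_common(1)[0][0]

def fit_stump(X, y, features):
    """Hypothetical base learner: best midpoint split on one candidate feature."""
    best = None
    for j in features:
        values = sorted({row[j] for row in X})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [lab for row, lab in zip(X, y) if row[j] <= t]
            right = [lab for row, lab in zip(X, y) if row[j] > t]
            l_lab, r_lab = majority(left), majority(right)
            errors = (sum(lab != l_lab for lab in left)
                      + sum(lab != r_lab for lab in right))
            if best is None or errors < best[0]:
                best = (errors, j, t, l_lab, r_lab)
    if best is None:  # no split possible: always predict the majority class
        maj = majority(y)
        return (0, float("inf"), maj, maj)
    return best[1:]

def oob_error(X, y, n_trees=50, seed=0):
    """Fraction of training rows misclassified by the trees that never saw them."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    m = max(1, round(math.sqrt(p)))
    votes = [Counter() for _ in range(n)]  # out-of-bag votes per training row
    for _ in range(n_trees):
        in_bag = [rng.randrange(n) for _ in range(n)]  # bootstrap sample
        j, t, l_lab, r_lab = fit_stump(
            [X[i] for i in in_bag], [y[i] for i in in_bag], rng.sample(range(p), m)
        )
        for i in set(range(n)) - set(in_bag):  # rows this tree did not train on
            votes[i][l_lab if X[i][j] <= t else r_lab] += 1
    scored = [(v.most_common(1)[0][0], y[i]) for i, v in enumerate(votes) if v]
    if not scored:
        raise ValueError("no row was out-of-bag for any tree")
    return sum(pred != truth for pred, truth in scored) / len(scored)

# Toy data (made up for illustration): two well-separated classes.
X = [[i / 10, (9 - i) / 10, i / 10, (9 - i) / 10] for i in range(10)]
X += [[5 + v for v in row] for row in X[:10]]
y = [0] * 10 + [1] * 10

print("OOB error estimate:", oob_error(X, y))
```

Because the toy classes are cleanly separable, the estimate here should be at or near zero; on real data it is typically compared against a held-out test error as a sanity check.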