Bagging vs Boosting, Bias vs Variance, Depth of trees



I understand the main principles of bagging and boosting for classification and regression trees. My doubts concern the optimization of the hyperparameters, especially the depth of the trees.

First question: why are we supposed to use weak learners for boosting (high bias), whereas we have to use deep trees for bagging (high variance)? Honestly, I'm not sure about the second part; I've just heard it once and never seen any documentation about it.

Second question: why and how can it happen that a grid search finds better results for gradient boosting with deeper trees than with weak learners (and, similarly, with weak learners rather than deeper trees for random forests)?


Posted 2019-10-15T13:19:59.797

Reputation: 153



why are we supposed to use weak learners for boosting (high bias), whereas we have to use deep trees for bagging (very high variance)?

Clearly, it wouldn't make sense to bag a bunch of shallow trees/weak learners: the average of many bad predictions will still be pretty bad. For many problems, decision stumps (trees with a single split node) produce results close to random, and combining many near-random predictions will generally not produce good results.

On the other hand, the depth of the trees in boosting limits the interaction effects between features that the model can capture: e.g. with 3 levels, you can only approximate second-order effects. For many ("most") applications, low-order interaction effects are the most important ones. Hastie et al. in ESL (pdf) suggest that trees with more than 6 levels rarely show improvements over shallower trees. Selecting trees deeper than necessary only introduces unnecessary variance into the model!

That should also partly explain the second question: if there are strong, higher-order interaction effects in the data, deeper trees can perform better. However, trees that are deeper than necessary will underperform, increasing variance without any additional benefit.
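You can see the depth/interaction link directly. Below is a minimal sketch (assuming scikit-learn and NumPy are available; the synthetic target is my own choice for illustration): the response is a pure two-way interaction, y = x1·x2, with no additive component. Boosted stumps (depth 1) build an additive model, so no number of boosting rounds lets them capture it; depth-2 trees can split on x1 and then x2 within the same tree, so they fit it easily.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import r2_score

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(2000, 2))
y = X[:, 0] * X[:, 1]  # pure second-order interaction, no additive part

X_train, X_test = X[:1500], X[1500:]
y_train, y_test = y[:1500], y[1500:]

# Depth-1 trees (stumps) produce a purely additive model: they cannot
# represent the x1*x2 interaction no matter how many rounds we boost.
stumps = GradientBoostingRegressor(max_depth=1, n_estimators=300,
                                   random_state=0).fit(X_train, y_train)

# Depth-2 trees can split on x1 and then x2, capturing the interaction.
pairs = GradientBoostingRegressor(max_depth=2, n_estimators=300,
                                  random_state=0).fit(X_train, y_train)

r2_stumps = r2_score(y_test, stumps.predict(X_test))
r2_pairs = r2_score(y_test, pairs.predict(X_test))
print(f"test R^2, boosted stumps:       {r2_stumps:.3f}")
print(f"test R^2, boosted depth-2 trees: {r2_pairs:.3f}")
```

On this data the stumps' test R² stays near zero while the depth-2 ensemble fits well, which is exactly the "depth limits interaction order" point above.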


Posted 2019-10-15T13:19:59.797

Reputation: 5 477


Question 1:

Bagging (as in a random forest) is essentially an improvement on the decision tree. Decision trees have a lot of nice properties, but they suffer from overfitting (high variance); by taking bootstrap samples and constructing many trees, we reduce variance with minimal effect on bias.

Boosting is a different approach: we start with a simple model that has low variance and high bias, and add new models sequentially to reduce the bias. If we used deep trees, we would run a high risk of overfitting.
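The variance-reduction side of this is easy to demonstrate. A quick sketch (assuming scikit-learn; the noisy synthetic dataset is just for illustration): a single unpruned tree fits the label noise, while bagging many such deep trees averages that noise away.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Noisy labels (flip_y): a single deep tree will overfit the noise.
X, y = make_classification(n_samples=2000, n_features=20, n_informative=8,
                           flip_y=0.15, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# One fully grown tree: low bias, high variance.
tree = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr)
# Bagging 200 fully grown trees: similar bias, much lower variance.
forest = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)

acc_tree = accuracy_score(y_te, tree.predict(X_te))
acc_forest = accuracy_score(y_te, forest.predict(X_te))
print(f"single deep tree:   {acc_tree:.3f}")
print(f"bagged deep trees:  {acc_forest:.3f}")
```

The individual trees are deliberately left deep here; the ensemble, not pruning, is what controls the variance.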

Question 2:

Gradient boosting with deeper trees allows you to fit a very complex relationship (higher variance, lower bias), which reduces the error due to bias.

A random forest with shallow trees has lower variance and higher bias, which reduces the error due to variance (overfitting). It is possible that a random forest with default parameters is overfitting, so reducing the depth of the trees improves performance.
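In practice you would let a grid search settle the question per dataset, which is how you end up observing the pattern you describe. A minimal sketch (assuming scikit-learn; the dataset and depth grid are arbitrary choices for illustration), searching `max_depth` for both model families:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.model_selection import GridSearchCV

X, y = make_classification(n_samples=1000, n_features=10, n_informative=5,
                           flip_y=0.1, random_state=0)

# None lets random-forest trees grow until pure (the default behaviour).
rf_search = GridSearchCV(
    RandomForestClassifier(n_estimators=100, random_state=0),
    {"max_depth": [1, 2, 4, 8, None]}, cv=5).fit(X, y)

# For boosting, keep the grid to small integer depths (weak learners first).
gb_search = GridSearchCV(
    GradientBoostingClassifier(random_state=0),
    {"max_depth": [1, 2, 4, 8]}, cv=5).fit(X, y)

print("best RF depth:", rf_search.best_params_["max_depth"],
      f"(CV accuracy {rf_search.best_score_:.3f})")
print("best GB depth:", gb_search.best_params_["max_depth"],
      f"(CV accuracy {gb_search.best_score_:.3f})")
```

Which depth wins depends on how strong the interaction effects and the noise are in your particular data, so the "weak learners for boosting, deep trees for bagging" rule is a sensible starting grid, not a law.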


Posted 2019-10-15T13:19:59.797

Reputation: 721