Gradient boosting
At this point, we can introduce a more general method of creating boosted ensembles. Let's choose a generic algorithm family, represented as follows:
Each model is parametrized using the vector θi and there are no restrictions on the kind of method that is employed. In this case, we are going to consider decision trees (which is one of the most diffused algorithms when this boosting strategy is employed—in this case, the algorithm is known as gradient tree boosting), but the theory is generic and can be easily applied to more complex models, such as neural networks. In a decision tree, the parameter vector θi is made up of selection tuples, so the reader can think of this method as a pseudo-random forest where, instead of randomness, we look for extra optimality exploiting the previous experience. In fact, as with AdaBoost, a gradient boosting ensemble is built sequentially, using a technique that is formally defined as Forward Stage-wise Additive Modeling. The resulting...