When to use ensemble models versus single CART models
XGBoost is an ensemble model: it builds multiple CART decision trees and aggregates their results. You learned about CART models in Chapter 3. In this chapter, you will make predictions using the housing dataset, the same data that you used in Chapter 4. You may want to start from your Chapter 4 code and modify it as you experiment with the various types of models.
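To make the idea of aggregating trees concrete, here is a minimal sketch in plain Python. It is not XGBoost and does not use the housing dataset: it assumes a synthetic one-feature dataset (a noisy sine curve), uses one-split trees ("stumps") as the base learners, and combines them with a simplified form of gradient boosting that leaves out the regularization and other refinements XGBoost adds. The names `fit_stump`, `fit_boosted`, `rmse`, and `make_data`, as well as the values for `n_trees` and `learning_rate`, are chosen for illustration only.

```python
import math
import random


def fit_stump(xs, ys):
    """Fit a one-split regression tree by minimising squared error."""
    best = None
    values = sorted(set(xs))
    # Try a threshold halfway between each pair of neighbouring x values.
    for lo, hi in zip(values, values[1:]):
        threshold = (lo + hi) / 2
        left = [y for x, y in zip(xs, ys) if x <= threshold]
        right = [y for x, y in zip(xs, ys) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((y - left_mean) ** 2 for y in left)
               + sum((y - right_mean) ** 2 for y in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    _, threshold, left_mean, right_mean = best
    return lambda x: left_mean if x <= threshold else right_mean


def fit_boosted(xs, ys, n_trees=50, learning_rate=0.3):
    """Build an ensemble in which each stump is fit to the current residuals."""
    base = sum(ys) / len(ys)  # start from the mean prediction
    residuals = [y - base for y in ys]
    stumps = []
    for _ in range(n_trees):
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        # Remove the (shrunken) contribution of the new stump from the residuals.
        residuals = [r - learning_rate * stump(x) for x, r in zip(xs, residuals)]
    # The ensemble prediction aggregates the outputs of all stumps.
    return lambda x: base + learning_rate * sum(s(x) for s in stumps)


def rmse(model, xs, ys):
    """Root mean squared error of a model on (xs, ys)."""
    return math.sqrt(sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs))


def make_data(n):
    """Synthetic data (an assumption for this sketch): noisy sine curve."""
    xs = [random.uniform(0, 10) for _ in range(n)]
    ys = [math.sin(x) + random.gauss(0, 0.2) for x in xs]
    return xs, ys


random.seed(0)
train_x, train_y = make_data(200)
test_x, test_y = make_data(100)

single = fit_stump(train_x, train_y)
ensemble = fit_boosted(train_x, train_y)

for name, model in [("single stump", single), ("boosted stumps", ensemble)]:
    print(name,
          "train RMSE:", round(rmse(model, train_x, train_y), 3),
          "held-out RMSE:", round(rmse(model, test_x, test_y), 3))
```

Printing the RMSE on both the training data and a separate held-out sample is a deliberate choice: comparing the two numbers for each model is a simple way to see how well a fit carries over to data the model has not seen.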
Because they combine the results from multiple trees to generate the output, ensemble models typically have smaller residuals, and therefore higher R2 and lower RMSE values, than single tree models. The risk of using ensemble models is overfitting. Overfit models work well on the training dataset but do not fit other data as well because they are too specialized to the training data. This is a problem because your model will be inflexible, and predictions made on data that isn’t exactly like your training data will be less accurate. What this means in practice is...