Comparing XGBoost to linear regression
When building models, start with the simplest option and increase the complexity only if needed. This approach keeps your models fast, small, and easy to explain, and you spend more compute resources and training time only if the simple options don’t work. Taking the same approach for these examples, start by comparing XGBoost to a plain linear fit. Recall from Chapter 4 that your XGBoost model had an RMSE of 0.487 and an R² of 0.819. You’re looking to match or beat these values. RMSE measures the amount of error between the predicted values and the known true values, so a lower RMSE is better: it means there is less error. R² measures how well the model fits the data, so a higher R² is better. Let’s get started:
- Perform a linear fit on the data: The easy way to perform linear regression is to use scikit-learn, which contains a linear regression model. Note that you don’t need to train...
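As a rough illustration of the linear fit and the two metrics discussed above, here is a minimal sketch using scikit-learn's `LinearRegression`, `mean_squared_error`, and `r2_score`. The data is synthetic and the names `X`, `y`, and `linear_model` are placeholders chosen for this example, not the chapter's dataset or variables. The hold-out split is also an assumption of the sketch, so adapt the data preparation to match your own.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (an assumption; replace with your own features and target).
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))
y = X @ np.array([1.5, -2.0, 0.5]) + rng.normal(scale=0.5, size=500)

# Hold out 20% of the rows for evaluation (an assumption of this sketch).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Fit an ordinary least-squares linear model and predict on the held-out rows.
linear_model = LinearRegression()
linear_model.fit(X_train, y_train)
y_pred = linear_model.predict(X_test)

# RMSE is the square root of the mean squared error: lower is better.
rmse = np.sqrt(mean_squared_error(y_test, y_pred))

# R² is the fraction of variance in y explained by the model: higher is better.
r2 = r2_score(y_test, y_pred)

print(f"RMSE: {rmse:.3f}, R2: {r2:.3f}")
```

Computing both metrics the same way for the linear model and the XGBoost model, on the same held-out data, is what makes the comparison against 0.487 and 0.819 meaningful.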