Ensemble models: bagging versus boosting
Ensemble modeling is a machine learning technique that combines multiple models to produce a more accurate and robust model. The individual models in an ensemble are called base models, and the ensemble makes its predictions by combining theirs.
Bagging and boosting are two popular ensemble learning methods. Both build a stronger model by combining individual models, but they differ in how the base models are trained and how their predictions are combined.
Bagging (bootstrap aggregation) creates multiple models by repeatedly sampling the original dataset with replacement, so some data points may appear in several of the resulting subsets while others may not appear in any. Each model is trained on its own subset, and the final prediction is obtained by averaging the individual predictions in the case of regression, or by majority vote in the case of classification. Because it uses this resampling technique, bagging reduces variance, that is, the impact that training on a different dataset would have on the model.
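As a minimal sketch of this idea, the following example uses scikit-learn's BaggingClassifier with decision trees as the base models; the synthetic dataset and the parameter values are purely illustrative assumptions, not a prescription:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Illustrative synthetic dataset; in practice you would use your own data
X, y = make_classification(n_samples=1000, n_features=20, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Each of the 100 trees is trained on a bootstrap sample of the training set
# (note: scikit-learn versions before 1.2 call this parameter base_estimator)
bagging = BaggingClassifier(
    estimator=DecisionTreeClassifier(),
    n_estimators=100,
    bootstrap=True,  # sample with replacement
    random_state=42,
)
bagging.fit(X_train, y_train)

# The final prediction is a majority vote across the individual trees
print("Bagging accuracy:", bagging.score(X_test, y_test))
```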
Boosting is an iterative technique that improves the models sequentially, with each model trained to correct the mistakes of the previous ones. To begin with, a base model is trained on the entire training dataset. Subsequent models are then trained with adjusted weights that give more importance to the instances misclassified by the previous models. The final prediction is obtained by combining the predictions of all the individual models in a weighted sum, where each weight is assigned based on the performance of that model. Boosting reduces the bias in the model. In this context, bias means the assumptions made about the form of the model function. For example, if you use a linear model, you are assuming that the equation that predicts the data is linear; the model is biased towards linearity. As you might expect, decision tree models tend to be less biased than linear regression or logistic regression models. Boosting iterates on the model and further reduces the bias.
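A comparable sketch for boosting uses scikit-learn's AdaBoostClassifier, which re-weights misclassified instances at each iteration and combines the base models with a performance-based weighted vote; again, the data and hyperparameters here are only illustrative assumptions:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split

# Illustrative synthetic dataset
X, y = make_classification(n_samples=1000, n_features=20, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Each new weak learner focuses on the instances the previous ones misclassified;
# the final prediction is a weighted vote based on each learner's performance
boosting = AdaBoostClassifier(n_estimators=100, learning_rate=0.5, random_state=42)
boosting.fit(X_train, y_train)

print("Boosting accuracy:", boosting.score(X_test, y_test))
```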
The following table summarizes the key differences between bagging and boosting:
| Bagging | Boosting |
| --- | --- |
| Models are trained independently and in parallel | Models are trained sequentially, with each model trying to correct the mistakes of the previous models |
| Each model has equal weight in the final prediction | Each model's weight in the final prediction depends on its performance |
| Variance is reduced and overfitting is mitigated | Bias is reduced, but overfitting may occur |
| Example algorithms: Random Forest | Example algorithms: AdaBoost, Gradient Boosting, and XGBoost |
Table 1.2 – Table summarizing the differences between bagging and boosting
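To see the two approaches side by side, the following sketch trains the bagging-based Random Forest and the boosting-based Gradient Boosting models mentioned in the table on the same synthetic data; the dataset and hyperparameters are assumptions for illustration only:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.model_selection import train_test_split

# Illustrative synthetic dataset shared by both models
X, y = make_classification(n_samples=1000, n_features=20, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

models = {
    "Random Forest (bagging)": RandomForestClassifier(n_estimators=100, random_state=42),
    "Gradient Boosting (boosting)": GradientBoostingClassifier(n_estimators=100, random_state=42),
}

# Train each ensemble and report its accuracy on the held-out test set
for name, model in models.items():
    model.fit(X_train, y_train)
    print(f"{name}: {model.score(X_test, y_test):.3f}")
```

Which approach performs better depends on the dataset and the tuning of each model, so results from a sketch like this should be read as illustrative rather than as a general ranking.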
The following diagram depicts the conceptual difference between bagging and boosting:
Figure 1.2 – Bagging versus boosting
Next, let’s explore the two key steps in any machine learning process: data preparation and data engineering.