Ensemble models: bagging versus boosting

Ensemble modeling is a machine learning technique that combines multiple models to create a more accurate and robust model. The individual models in an ensemble are called base models, and the ensemble makes its predictions by combining theirs.

Bagging and boosting are two popular ensemble learning methods used in machine learning to create more accurate models by combining individual models. However, they differ in their approach and the way they combine models.

Bagging (bootstrap aggregation) creates multiple models by repeatedly sampling the original dataset with replacement, which means some data points may appear in several of the resulting samples while others may not appear in any. Each model is trained on its own subset, and the final prediction is obtained by averaging the individual models' predictions in the case of regression, or by majority vote in the case of classification. Because it uses a resampling technique, bagging reduces variance, that is, the impact that training on a different dataset would have on the model.
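As a minimal sketch of the idea (the synthetic dataset and the decision tree base learner here are illustrative assumptions, not examples from this book), bagging can be written out by hand in a few lines:

```python
# Hand-rolled bagging sketch: bootstrap-sample the training data, fit one
# tree per sample, and average the trees' predictions for regression.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.tree import DecisionTreeRegressor

X, y = make_regression(n_samples=500, n_features=10, random_state=42)
rng = np.random.default_rng(42)

models = []
for _ in range(25):
    # Sampling with replacement: some rows appear several times, others never
    idx = rng.integers(0, len(X), size=len(X))
    models.append(DecisionTreeRegressor(random_state=0).fit(X[idx], y[idx]))

# The ensemble prediction is the average of the base models' predictions
y_pred = np.mean([m.predict(X) for m in models], axis=0)
```

A random forest follows this same recipe, additionally randomizing which features each tree may split on.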

Boosting is an iterative technique that improves the model sequentially, with each new model trained to correct the mistakes of the previous ones. To begin with, a base model is trained on the entire training dataset. Subsequent models are then trained with adjusted weights that give more importance to the instances the previous models misclassified. The final prediction is obtained by combining the predictions of all the individual models in a weighted sum, where each model's weight is assigned based on its performance. Boosting reduces the bias in the model. In this context, bias means the assumptions being made about the form of the model function. For example, if you use a linear model, you are assuming that the equation that predicts the data is linear, so the model is biased towards linear forms. As you might expect, decision tree models tend to be less biased than linear regression or logistic regression models. Boosting iterates on the model and further reduces the bias.
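The reweighting scheme described above is what scikit-learn's AdaBoostClassifier implements, so it serves as a compact illustration; the synthetic dataset and settings below are assumptions for demonstration, not code from this book:

```python
# Boosting sketch: AdaBoost reweights the training points after each round
# so the next weak learner focuses on past mistakes, then combines all the
# learners in a performance-weighted vote.
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier

X, y = make_classification(n_samples=500, n_features=10, random_state=42)

# The default base model is a depth-1 decision tree (a "stump")
boosted = AdaBoostClassifier(n_estimators=50, random_state=42).fit(X, y)
print(boosted.score(X, y))
```

Gradient boosting methods such as XGBoost follow the same sequential idea, but each new tree is fit to the errors (gradients) of the ensemble so far rather than to reweighted samples.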

The following table summarizes the key differences between bagging and boosting:

| Bagging | Boosting |
| --- | --- |
| Models are trained individually, independently, and in parallel | Models are trained sequentially, with each model trying to correct the mistakes of the previous model |
| Each model has equal weight in the final prediction | Each model's weight in the final prediction depends on its performance |
| Variance is reduced and overfitting is mitigated | Bias is reduced, but overfitting may occur |
| Example ensemble models: Random Forest | Example ensemble models: AdaBoost, Gradient Boosting, and XGBoost |

Table 1.2 – Table summarizing the differences between bagging and boosting
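To ground the table's last row, the sketch below fits one ensemble from each family on the same synthetic data; the dataset, hyperparameters, and the use of the xgboost package's scikit-learn wrapper are assumptions for illustration:

```python
# Illustrative comparison of a bagging ensemble (random forest) and a
# boosting ensemble (XGBoost) on the same synthetic regression task.
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from xgboost import XGBRegressor

X, y = make_regression(n_samples=1000, n_features=10, noise=10.0, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Bagging family: trees grown independently on bootstrap samples
rf = RandomForestRegressor(n_estimators=100, random_state=42).fit(X_train, y_train)

# Boosting family: trees grown sequentially, each correcting the last
xgb = XGBRegressor(n_estimators=100, random_state=42).fit(X_train, y_train)

for name, model in [("random forest", rf), ("xgboost", xgb)]:
    print(f"{name}: test MSE = {mean_squared_error(y_test, model.predict(X_test)):.1f}")
```

Which family wins depends on the data; the point of the comparison is only that both APIs expose the same fit/predict workflow despite the very different training procedures underneath.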

The following diagram depicts the conceptual difference between bagging and boosting:

Figure 1.2 – Bagging versus boosting

Next, let’s explore the two key steps in any machine learning process: data preparation and data engineering.
