What this book covers
Chapter 1, An Overview of Machine Learning, Classification, and Regression, provides a foundation for the rest of the book by introducing machine learning concepts. It covers how ensemble classification and regression tree models can be enhanced through bagging and boosting, and it introduces data preparation and data engineering.
Chapter 2, XGBoost Quick Start Guide with an Iris Data Case Study, walks through a practical example of building a classification model with XGBoost in Python, using the classic Iris dataset. At the end of the chapter, you will have code you can repurpose for similar classification problems.
Chapter 3, Demystifying the XGBoost Paper, provides a general overview of the XGBoost algorithm and how it works. Through examples with small datasets, you’ll learn about the algorithm’s features and benefits.
Chapter 4, Adding on to the Quick Start – Switching out the Dataset with a Housing Data Case Study, builds on the example in Chapter 2 and provides hands-on experience with XGBoost by building a prediction model. Together with Chapter 2, this chapter shows you which parts of XGBoost code are dataset-specific and which are independent of the dataset.
Chapter 5, Classification and Regression Trees, Ensembles, and Deep Learning Models – What’s Best for Your Data?, compares XGBoost with linear regression, scikit-learn gradient boosting, and random forest models, using performance and accuracy measurements to test each one. It gives a detailed explanation of the XGBoost hyperparameters and how you can tune them to meet the needs of the data you are modeling.
Chapter 6, Data Cleaning, Imbalanced Data, and Other Data Problems, addresses common problems with real-life datasets. It covers data exploration and cleaning in depth and provides practical code examples for multiple use cases.
Chapter 7, Feature Engineering, explores feature engineering using a Kaggle Housing Prices dataset. You will learn common feature engineering techniques for numerical, temporal, and categorical data and apply them to the dataset.
Chapter 8, Encoding Techniques for Categorical Features, addresses the challenge of converting categorical text data into the numerical formats that machine learning models require. This chapter provides practical experience with various encoding techniques.
Chapter 9, Using XGBoost for Time Series Forecasting, provides an opportunity to apply the data cleaning methods from Chapter 6 and the feature engineering techniques from Chapter 7 to time-series data. You’ll gain practical experience in building an XGBoost model to forecast data and in evaluating its predictions.
Chapter 10, Model Interpretability, Explainability, and Feature Importance with XGBoost, explores model interpretability and explainability and gives hands-on experience with extracting feature importance. Being able to explain how XGBoost determines its results is necessary for transparency and trust. This chapter demonstrates five methods for model interpretation.
Chapter 11, Metrics for Model Evaluations and Comparisons, provides hands-on experience with measuring model performance and adjusting hyperparameters, building on the discussion in Chapter 5.
Chapter 12, Managing a Feature Engineering Pipeline in Training and Inference, expands on the concepts and code examples in Chapters 7 and 9 to perform feature engineering for time-series data and use a pipeline to combine feature generation with model training.
Chapter 13, Deploying Your XGBoost Model, covers how to deploy your XGBoost model into a production environment. It discusses how to leverage the multithreaded and distributed compute features of XGBoost, as well as how to package your model into a container for cloud deployment. It also covers model maintenance through REST API calls, with examples in Python.