Preface
Machine learning is an artificial intelligence (AI) technique that uses historical data to train a model to perform either classification (assigning items to groups) or regression (predicting numeric values, such as future values). XGBoost is a popular library for machine learning with gradient-boosting algorithms. It is fast and accurate, and it offers features that enable it to handle big data.
This book will give you a solid foundation for understanding machine learning and the XGBoost algorithm, along with practical techniques you can apply when solving data science problems. We include examples that cover both categorical and numeric data, and both classification and regression tasks. The last third of the book focuses on time-series data.
Time-series data, used in forecasting for finance, supply chain management, and other industries, can pose unique challenges when training a model. With temporal data, the order of the observations affects the model results. Inputs to the model must be encoded carefully to capture effects such as seasonality or end-of-period (month, quarter, year) impacts. Although XGBoost is not designed specifically for sequential data, it can be adapted to forecasting problems.
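As a small illustration of the kind of input encoding described above, the following sketch derives seasonal and end-of-period features from a single date. It is one common approach, not necessarily the one used later in the book; the function name `calendar_features` and the feature names are hypothetical. The sine/cosine pair is a widely used cyclical encoding that places December and January close together, and the end-of-period flags mark the last day of a month, quarter, or year.

```python
import calendar
import math
from datetime import date


def calendar_features(d: date) -> dict:
    """Derive simple seasonal and end-of-period features from a date.

    Illustrative sketch only; the feature names are hypothetical.
    """
    # monthrange returns (weekday of first day, number of days in month).
    last_day = calendar.monthrange(d.year, d.month)[1]
    is_month_end = d.day == last_day
    return {
        "month": d.month,
        "quarter": (d.month - 1) // 3 + 1,
        # Cyclical encoding: month 12 and month 1 end up close together.
        "month_sin": math.sin(2 * math.pi * d.month / 12),
        "month_cos": math.cos(2 * math.pi * d.month / 12),
        "is_month_end": is_month_end,
        # Quarters end in March, June, September, and December.
        "is_quarter_end": is_month_end and d.month % 3 == 0,
        "is_year_end": is_month_end and d.month == 12,
    }


if __name__ == "__main__":
    # 2024 is a leap year, so February 29 (not 28) is the month end.
    for d in [date(2023, 3, 31), date(2023, 12, 31), date(2024, 2, 28), date(2024, 2, 29)]:
        print(d, calendar_features(d))
```

In practice, features like these would be computed for every row of a time-indexed dataset and passed to the model alongside lagged values of the target.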
Books and online resources often cover only proof-of-concept applications. This book goes further and discusses full production deployment. We also address practical considerations such as how to monitor model performance, when to retrain a deployed model, and how to use pipelines to simplify model maintenance.