Working with the XGBoost API
There are two ways to use XGBoost in Python: the native API and the scikit-learn API. The primary difference between them is that the native API requires you to convert your data into a DMatrix. A DMatrix is XGBoost’s own data structure, designed to optimize both memory usage and computation speed during model training. It can hold sparse data compactly, without allocating memory for absent entries, which enables faster training on large datasets. It is also useful for handling missing values: entries marked as missing (NaN by default) are passed to XGBoost as they are, so you don’t have to impute them first. You’ll need to convert your datasets into this format when you use the native API, which in return gives you greater control over training parameters and performance.
The scikit-learn API works directly with pandas DataFrames and NumPy arrays, and it is the one you’ve used in previous chapters. However, it abstracts...