Preparing data for predictive modeling
In this section, you will prepare the data for modeling with XGBoost. The first step is to split the data into training and test sets. As in Chapter 2, you will use the train_test_split function from Scikit-learn.
Follow these steps:
- Split the housing data into training and test DataFrames: You can reuse the code from Chapter 2. To do this, swap in housingX and housingy for irisdata. You can split both X and y in one line by providing four variable names to store the results. We’ve chosen X_train, X_test, y_train, and y_test. No other changes are needed from this step in Chapter 2. You now have 80% of the data in the training set, with 20% reserved for testing the model. random_state=17 is again an arbitrary value that seeds the random selection of which rows go into which set; by using the same value as we do here, you’ll have the same rows in each set, and your outputs will match ours:
from...
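As a rough illustration of the split described in this step, here is a minimal sketch. It makes two assumptions that are not taken from the text: housingX and housingy are replaced by small synthetic stand-ins (the column names feat_a, feat_b, and feat_c are hypothetical), and test_size=0.2 is used to produce the 80/20 split. Only random_state=17 and the four result names come from the step itself.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical stand-ins for the chapter's housingX (features) and
# housingy (target); the real housing data would be used instead.
rng = np.random.default_rng(0)
housingX = pd.DataFrame(
    rng.normal(size=(100, 3)),
    columns=["feat_a", "feat_b", "feat_c"],
)
housingy = pd.Series(rng.normal(size=100), name="target")

# Split features and target together so their rows stay aligned.
# Assumption: test_size=0.2 reserves 20% of the rows for testing.
# random_state=17 seeds the shuffle so the same rows are selected each run.
X_train, X_test, y_train, y_test = train_test_split(
    housingX, housingy, test_size=0.2, random_state=17
)

print(X_train.shape, X_test.shape)
```

Passing X and y in the same call matters: train_test_split applies one shuffle to both, so each row of X_train keeps its matching value in y_train.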