Ordered integer encoding
Ordered integer encoding replaces categories with integers according to the order of a target variable. This is often the mean of the target variable within each category. If we encode the Neighborhood
column using this method, we will replace the neighborhood name with the average of the SalePrice
column for that neighborhood. The ordered integer encoding method ensures a monotonic relationship between the encoded variable and the target, beneficial for linear models. Let’s apply it using pandas:
- First, import
pandas
, andtrain_test_split
from scikit-learn to prepare for applying the encoding. Importmatplotlib.pyplot
, which you will use to make graphs to show the relationship between the encoded categorical variables and the target variable:import pandas as pd from sklearn.model_selection import train_test_split import matplotlib.pyplot as plt
- Next, set up the sample data and split it into train and test datasets using
train_test_split
....