Mean encoding
Mean encoding transforms categorical variables into numerical values by replacing each category with the mean of the target variable for that category. This technique establishes a direct relationship between the encoded variable and the target, making it particularly useful in linear models. Let’s implement it:
- First, import
pandas
andtrain_test_split
from scikit-learn to prepare for applying the encoding. Importmatplotlib.pyplot
, which you will use to make graphs showing the relationship between the encoded categorical variables and the target variable:import pandas as pd from sklearn.model_selection import train_test_split import matplotlib.pyplot as plt
- Next, set up the sample data and split it into train and test datasets using
train_test_split
. This sample dataset has two categorical columns,Neighborhood
andExterior1st
, and a target column,SalePrice
:# Sample data data = { 'Neighborhood': ['OldTown&apos...