Probability ratio encoding
Probability ratio encoding transforms categorical variables into numerical values by calculating the ratio of the probability of the target being 1 to the probability of it being 0 for each category. This encoding technique captures the distinct contribution of each category to the target outcome, providing valuable information for building effective models. Let’s get started:
- To begin, import
pandas
andtrain_test_split
from scikit-learn to prepare for applying the encoding. Importmatplotlib.pyplot
, which you will use to make graphs showing the relationship between the encoded categorical variables and the target variable:import pandas as pd from sklearn.model_selection import train_test_split import matplotlib.pyplot as plt
- Next, set up the sample data. This sample dataset has two categorical columns,
Neighborhood
andExterior1st
, and a target column,SalePrice
:# Sample data data = { 'Neighborhood': ...