Integer encoding
Integer encoding replaces each category with a unique integer from 0 to n-1, where n is the number of distinct categories. The benefit of this method is that it does not expand the feature space and is computationally efficient. However, the integers carry no information about real relationships between categories, and a model may read the arbitrary numeric order as meaningful. Let’s see how this works when implemented in code:
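Before turning to the libraries, the mapping itself can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the neighborhood values are hypothetical sample data, and sorting the categories is one possible way to fix the order of the integers (scikit-learn's LabelEncoder also assigns codes in sorted order of the classes).

```python
# Hypothetical sample values, chosen only for illustration.
neighborhoods = ["OldTown", "Edwards", "NAmes", "Edwards", "OldTown"]

# Collect the distinct categories; sorting makes the mapping deterministic.
categories = sorted(set(neighborhoods))

# Assign each distinct category an integer from 0 to n-1.
mapping = {category: code for code, category in enumerate(categories)}

# Replace every value with its integer code.
encoded = [mapping[value] for value in neighborhoods]

print(mapping)  # {'Edwards': 0, 'NAmes': 1, 'OldTown': 2}
print(encoded)  # [2, 0, 1, 0, 2]
```

Note that the encoded result is still a single column, no matter how many distinct categories there are, which is why this method does not expand the feature space.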
- For this method, you will use pandas, as well as train_test_split and LabelEncoder from scikit-learn. This lets you compare integer encoding done with pandas against scikit-learn’s LabelEncoder. Start by importing the packages you will need:
  import pandas as pd
  from sklearn.model_selection import train_test_split
  from sklearn.preprocessing import LabelEncoder
- Now, you can set up some sample data to use, which will be lists of values for Neighborhood and SalePrice placed into a pandas DataFrame:
  data = { '...