Chapter 4: Supervised Learning Algorithms: Predict Annual Income
Activity 11: Training a Naïve Bayes Model for our Census Income Dataset
Before working on step 1, make sure that the data has been preprocessed, as follows:
import pandas as pd
data = pd.read_csv("datasets/census_income_dataset.csv")
data = data.drop(["fnlwgt", "education", "relationship", "sex", "race"], axis=1)
After reading the dataset, the five variables considered irrelevant for the study (fnlwgt, education, relationship, sex, and race) are removed.
Next, the remaining qualitative variables are converted into their numerical form via the following code:
from sklearn.preprocessing import LabelEncoder
enc = LabelEncoder()
features_to_convert = ["workclass", "marital-status", "occupation", "native-country", "target"]
for i in features_to_convert:
    data[i] = enc.fit_transform(data[i].astype('str'))
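To see what the conversion does, the following is a minimal sketch on a small made-up DataFrame rather than the census file. The toy values and the encoders dictionary are assumptions introduced for illustration only. LabelEncoder assigns integer codes in the sorted order of the labels, and refitting a single encoder on each column overwrites its classes_ attribute, so this sketch keeps one fitted encoder per column in case the codes need to be mapped back to their labels later:

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Toy stand-in for the census data; these rows are illustrative only.
toy = pd.DataFrame({
    "workclass": ["Private", "State-gov", "Private", "Self-emp"],
    "target": ["<=50K", ">50K", "<=50K", ">50K"],
})

# Hypothetical helper: column name -> encoder fitted on that column.
encoders = {}
for col in ["workclass", "target"]:
    enc = LabelEncoder()
    # Same call as in the activity: cast to str, then fit and transform.
    toy[col] = enc.fit_transform(toy[col].astype("str"))
    encoders[col] = enc

# Integer codes now stored in the DataFrame.
workclass_codes = toy["workclass"].tolist()
target_codes = toy["target"].tolist()

# Labels in code order: position 0 is the label encoded as 0, and so on.
workclass_labels = encoders["workclass"].classes_.tolist()

# Recover the original labels from the codes.
workclass_restored = encoders["workclass"].inverse_transform(toy["workclass"]).tolist()
```

The integer codes carry no ordinal meaning; they only reflect the alphabetical position of each label.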
Once this is complete...