Exploring the Dataset
Real-life applications are crucial for cementing knowledge. This chapter therefore works through a real-life case study involving a classification task, applying the key steps that you learned in the previous chapter in order to select the best-performing model.
The case study uses the Census Income Dataset, which is available at the UC Irvine Machine Learning Repository.
Note
To download the dataset, visit http://archive.ics.uci.edu/ml/datasets/Census+Income.
Once you have located the repository, follow these steps to download the dataset:
First, click the Data Folder link.
This chapter uses the data available under adult.data. Once you have opened that link, you should be able to see the data.
Right-click it and select Save as.
Save it as a .csv file.
Note
Open the file and add header names over each column to make the preprocessing easier. For instance, the first column should have the header Age, as per the features available...
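If you prefer not to edit the file by hand, the header row can also be added with a short script. The following is a minimal sketch using only Python's standard library, not the procedure prescribed by this chapter. The function name add_header is hypothetical, and every header other than Age is an assumption based on the attribute list published alongside the dataset, so rename them as you see fit. The sample row is only an illustration of the file's comma-plus-space layout.

```python
import csv
import io

# Assumed header names: only "Age" is named in the text above. The rest
# follow the attribute list published with the dataset and can be renamed.
COLUMN_NAMES = [
    "Age", "Workclass", "Fnlwgt", "Education", "Education-num",
    "Marital-status", "Occupation", "Relationship", "Race", "Sex",
    "Capital-gain", "Capital-loss", "Hours-per-week", "Native-country",
    "Income",
]


def add_header(source, target, names=COLUMN_NAMES):
    """Copy comma-separated rows from source to target, writing names first.

    Both arguments are open text file objects. Returns the number of data
    rows copied (the header row is not counted).
    """
    reader = csv.reader(source, skipinitialspace=True)
    writer = csv.writer(target, lineterminator="\n")
    writer.writerow(names)

    rows_written = 0
    for row in reader:
        if not row:
            # Skip blank lines, such as one at the end of the file.
            continue
        if len(row) != len(names):
            raise ValueError(
                f"expected {len(names)} values, found {len(row)}: {row}"
            )
        # Remove stray whitespace around each value before writing it.
        writer.writerow([value.strip() for value in row])
        rows_written += 1
    return rows_written


# In-memory demonstration with one illustrative row and a trailing blank line.
sample = io.StringIO(
    "39, State-gov, 77516, Bachelors, 13, Never-married, Adm-clerical, "
    "Not-in-family, White, Male, 2174, 0, 40, United-States, <=50K\n\n"
)
output = io.StringIO()
count = add_header(sample, output)
print(output.getvalue())
```

To run it on the downloaded data, open the saved file for reading and a new file for writing, both with newline="", and pass them to add_header in place of the in-memory buffers. The output file name is your choice.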