Chapter 2: Unsupervised Learning: Real-life Applications
Activity 3: Using Data Visualization to Aid the Preprocessing Process
- Load the previously downloaded dataset by using the Pandas function read_csv(). Store the dataset in a Pandas DataFrame named data. First, import the required libraries:
    import pandas as pd
    import matplotlib.pyplot as plt
    import numpy as np
    np.random.seed(0)
  Then, feed the dataset path to the Pandas function read_csv():
    data = pd.read_csv("datasets/wholesale_customers_data.csv")
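If the CSV file is not at hand, the loading step can be tried out with a small in-memory stand-in. This is a minimal sketch, not the activity's own code: the column names are assumed to match the UCI Wholesale customers dataset, the three rows of values are made up, and io.StringIO replaces the file path because read_csv() accepts any file-like object. The last lines show a few quick sanity checks worth running right after loading any dataset.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical stand-in for datasets/wholesale_customers_data.csv.
# Column names are ASSUMED to follow the UCI Wholesale customers dataset;
# the row values are invented round numbers for illustration only.
CSV_TEXT = """Channel,Region,Fresh,Milk,Grocery,Frozen,Detergents_Paper,Delicassen
1,1,1000,2000,3000,400,500,600
2,3,1500,2500,3500,450,550,650
1,2,1200,2200,3200,420,520,620
"""

np.random.seed(0)  # same seed as the activity, so later random steps repeat

# read_csv() accepts a file-like object, so an in-memory buffer
# takes the place of the file path used in the activity.
data = pd.read_csv(io.StringIO(CSV_TEXT))

# Quick checks after loading: dimensions, column types, and first rows.
print(data.shape)
print(data.dtypes)
print(data.head())
```

With the real file, the same three checks confirm that every column was parsed as a number before moving on to the preprocessing steps.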
- Check for missing values in your DataFrame. Using the isnull() function together with the sum() function, count the missing values in every column of the dataset at once:
    data.isnull().sum()
Figure 2.16: A screenshot showing the number of missing values in the DataFrame
As you can see from the preceding screenshot, there are no missing values in the dataset.
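Because the activity's dataset has no gaps, every count in Figure 2.16 is zero. The sketch below runs the same isnull()/sum() pattern on a small hypothetical DataFrame that does contain missing entries, so the per-column counts become visible. The DataFrame, its column names, and its values are invented for illustration; the second sum() call is an addition that turns the per-column counts into a single total for the whole dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical toy DataFrame with two deliberately missing entries (np.nan).
toy = pd.DataFrame({
    "Fresh": [100.0, np.nan, 300.0],
    "Milk": [10.0, 20.0, np.nan],
    "Grocery": [5.0, 6.0, 7.0],
})

# isnull() returns a Boolean DataFrame (True where a value is missing);
# sum() then adds up the True values column by column.
per_column = toy.isnull().sum()
print(per_column)

# Chaining a second sum() collapses the per-column counts into one total.
total_missing = int(toy.isnull().sum().sum())
print(total_missing)
```

Here per_column reports one missing value each for Fresh and Milk and none for Grocery, and total_missing is 2.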
- Check for outliers in your DataFrame. Using the technique you learned in the previous chapter, label those values that...