6. Learning the Hidden Secrets of Data Wrangling
Activity 6.01: Handling Outliers and Missing Data
Solution:
The steps to completing this activity are as follows:
Note
The dataset to be used for this activity can be found at https://packt.live/2YajrLJ.
- Import the required libraries:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
- Read the .csv file:
df = pd.read_csv("../datasets/visit_data.csv")
Note
Don't forget to change the path passed to read_csv based on where the CSV file is saved on your system.
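If the path is wrong, read_csv raises an error. As a minimal sketch, the path can be checked before reading so that the failure message names the exact location that was tried. The helper name load_csv is hypothetical and is not part of the activity.

```python
from pathlib import Path

import pandas as pd


def load_csv(path):
    """Read a CSV into a DataFrame, failing early if the file is missing.

    Hypothetical helper for illustration; not part of the original activity.
    """
    csv_path = Path(path).expanduser()
    if not csv_path.is_file():
        # Report the exact location that was checked.
        raise FileNotFoundError(f"CSV file not found: {csv_path}")
    return pd.read_csv(csv_path)
```

With this in place, the read step could be written as df = load_csv("../datasets/visit_data.csv"), adjusting the path to your system.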
- Print the data from the DataFrame:
df.head()
The output is as follows:
As we can see, some values are missing, and a closer examination of the data also reveals some outliers.
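Both observations can also be checked in code rather than by eye. The sketch below uses a small made-up DataFrame, since the column names and values here are assumptions and not taken from visit_data.csv. It counts missing values per column and flags outliers with the 1.5 x IQR rule of thumb, which is one common choice and not necessarily the method used later in this activity.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in data; the real visit_data.csv columns may differ.
sample = pd.DataFrame({
    "name": ["Ann", "Bob", None, "Dee", "Eve", "Fay"],
    "visit": [120.0, 135.0, 128.0, np.nan, 2900.0, 131.0],
})

# Count the missing values in each column.
missing_counts = sample.isnull().sum()

# Flag values that fall more than 1.5 * IQR outside the quartiles.
q1, q3 = sample["visit"].quantile([0.25, 0.75])
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = sample[(sample["visit"] < lower) | (sample["visit"] > upper)]

print(missing_counts)
print(outliers)
```

Rows with a missing visit value are never flagged as outliers here, because comparisons against NaN evaluate to False.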
- Check for duplicates by using the following command:
print("First name is duplicated - {}"\
      .format(any(df.first_name.duplicated())))
print("Last name is...