You're reading from Python Data Cleaning Cookbook: Modern techniques and Python tools to detect and remove dirty data and extract key insights

Product type: Paperback
Published: Dec 2020
Publisher: Packt
ISBN-13: 9781800565661
Length: 436 pages
Edition: 1st Edition
Languages: Python
Tools: pandas
Concepts: Data Analysis
Authors (2): Michael B Walker, Michael Walker


Table of Contents (12 chapters)

Preface
Chapter 1: Anticipating Data Cleaning Issues when Importing Tabular Data into pandas
Chapter 2: Anticipating Data Cleaning Issues when Importing HTML and JSON into pandas
Chapter 3: Taking the Measure of Your Data
Chapter 4: Identifying Missing Values and Outliers in Subsets of Data
Chapter 5: Using Visualizations for the Identification of Unexpected Values
Chapter 6: Cleaning and Exploring Data with Series Operations
Chapter 7: Fixing Messy Data when Aggregating
Chapter 8: Addressing Data Issues When Combining DataFrames
Chapter 9: Tidying and Reshaping Data
Chapter 10: User-Defined Functions and Classes to Automate Data Cleaning
Other Books You May Enjoy


Classes that handle non-tabular data structures

Data scientists increasingly receive non-tabular data, often in the form of JSON or XML files. The flexibility of JSON and XML allows organizations to capture complicated relationships between data items in a single file. A one-to-many relationship that an enterprise data system stores in two tables can be represented well in JSON by a parent node for the "one" side and child nodes for the data on the "many" side.
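To make the parent/child structure concrete, here is a minimal sketch using hypothetical data (the `id`, `title`, and `creators` fields are invented for illustration, not taken from the book). One item is the parent node and its creators are child nodes; the code then splits the same relationship into the two row-based "tables" an enterprise system might use, linked by an `item_id` key.

```python
import json

# Hypothetical JSON: one collection item (the "one" side) with a nested
# list of creators (the "many" side).
raw = """
{
  "id": 101,
  "title": "Untitled landscape",
  "creators": [
    {"name": "A. Painter", "role": "artist"},
    {"name": "B. Engraver", "role": "printmaker"}
  ]
}
"""

item = json.loads(raw)

# The same relationship expressed as two "tables" (lists of row dicts):
# one row per item, and one row per creator carrying the item's key.
items_table = [{"id": item["id"], "title": item["title"]}]
creators_table = [
    {"item_id": item["id"], **creator} for creator in item["creators"]
]

print(items_table)
print(creators_table)
```

In the JSON version the link between an item and its creators is implied by nesting; in the two-table version it has to be carried explicitly by `item_id`.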

When we receive JSON data, we often start by trying to normalize it; indeed, we do that in a couple of recipes in this book. Normalizing is an attempt to recover the one-to-one and one-to-many relationships that the flexibility of JSON obscures. But there is another way to work with such data, one that has many advantages.
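One common way to do this normalization in pandas is `pandas.json_normalize`, which flattens a nested list into rows and repeats the parent's fields on each child row. The following is a minimal sketch, assuming hypothetical item/creator records (the field names are invented for illustration):

```python
import pandas as pd

# Hypothetical nested records: each item (parent) holds a list of creators (children).
items = [
    {"id": 101, "title": "Untitled landscape",
     "creators": [{"name": "A. Painter", "role": "artist"},
                  {"name": "B. Engraver", "role": "printmaker"}]},
    {"id": 102, "title": "Portrait study",
     "creators": [{"name": "C. Sketcher", "role": "artist"}]},
]

# One row per creator; record_path selects the nested list to flatten,
# and meta copies the listed parent fields onto every resulting row.
creators = pd.json_normalize(items, record_path="creators", meta=["id", "title"])
print(creators)
```

The result has one row per creator, so item-level values such as `title` are duplicated across rows. That duplication is the cost of forcing a one-to-many structure into a single flat table.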

Instead of normalizing the data, we can create a class whose objects are instantiated at the appropriate unit of analysis, and use the methods of the class to navigate the many side of one-to-many relationships. For example, if...
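A minimal sketch of the class-based approach, assuming hypothetical museum-collection JSON in which each item has a nested list of creators. The class `CollectionItem` and its methods are illustrative names, not taken from the book. Each instance represents one item, the unit of analysis, and its methods answer questions about that item's creators directly, without first flattening them into one row per creator.

```python
import json
from typing import List

# Hypothetical JSON: a list of items, each with a nested list of creators.
RAW = """
[
  {"id": 101, "title": "Untitled landscape",
   "creators": [{"name": "A. Painter", "role": "artist"},
                {"name": "B. Engraver", "role": "printmaker"}]},
  {"id": 102, "title": "Portrait study",
   "creators": [{"name": "C. Sketcher", "role": "artist"}]}
]
"""


class CollectionItem:
    """One object per item (hypothetical class); creators stay nested."""

    def __init__(self, record: dict):
        self.id = record["id"]
        self.title = record["title"]
        # Keep the "many" side as a list instead of flattening it into rows.
        self._creators = record.get("creators", [])

    def creator_count(self) -> int:
        """Number of creators attached to this item."""
        return len(self._creators)

    def creator_names(self) -> List[str]:
        """Names of this item's creators, in their original order."""
        return [c["name"] for c in self._creators]

    def has_role(self, role: str) -> bool:
        """True if any creator of this item has the given role."""
        return any(c.get("role") == role for c in self._creators)


items = [CollectionItem(record) for record in json.loads(RAW)]

for item in items:
    print(item.id, item.title, item.creator_count(), item.creator_names())
# 101 Untitled landscape 2 ['A. Painter', 'B. Engraver']
# 102 Portrait study 1 ['C. Sketcher']
```

Because item-level attributes live on the object once, nothing is duplicated, and questions about the many side (how many creators, whether any is a printmaker) become method calls rather than group-by operations on a flattened table.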