Practical Data Wrangling

Expert techniques for transforming your raw data into a valuable source for analytics

Product type Paperback
Published in Nov 2017
Publisher Packt
ISBN-13 9781787286139
Length 204 pages
Edition 1st Edition
Author (1):
Allan Visochek
Table of Contents

Preface
1. Programming with Data
2. Introduction to Programming in Python
3. Reading, Exploring, and Modifying Data - Part I
4. Reading, Exploring, and Modifying Data - Part II
5. Manipulating Text Data - An Introduction to Regular Expressions
6. Cleaning Numerical Data - An Introduction to R and RStudio
7. Simplifying Data Manipulation with dplyr
8. Getting Data from the Web
9. Working with Large Datasets

Quantifying the existence of patterns


If you look through the addresses output from the previous step, you may notice that not all of them follow the street address pattern outlined earlier.

One common deviation from the street address pattern is the addition of an N or S to a street name. Another is a street name that contains more than one word:

3649 N Southport Ave
4022 N Mozart St Irving Park
260-300 Osceola Ave S St Paul, MN 55102, USA

103 & 105 Misty Morning Way Savannah, Georgia
1656 Mount Eagle Place Alexandria, Virginia

Yet another deviation is the omission of street numbers:

West Outer Drive Dearborn, Michigan
Crown St New Haven, CT, USA

Depending on the project, you will usually need to decide how far to go to capture all of the variations in the data. The more complex the pattern, the more work it will take to capture.

Due to this trade-off, it is helpful to quantify how much of the data is captured by a particular pattern. In the next few subsections, I will walk through the...
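
As a rough illustration of what quantifying a pattern can look like, the following Python sketch counts the fraction of addresses that match a regular expression. It is not the book's code: the two patterns, the list of street types, the function name fraction_matched, and the "123 Main St" entry are all assumptions made for this example. The remaining addresses are the samples quoted above.

```python
import re

# Sample addresses from this section, plus one hypothetical address
# ("123 Main St") added only so that the strict pattern has a match.
addresses = [
    "123 Main St",  # hypothetical, not from the dataset
    "3649 N Southport Ave",
    "4022 N Mozart St Irving Park",
    "260-300 Osceola Ave S St Paul, MN 55102, USA",
    "103 & 105 Misty Morning Way Savannah, Georgia",
    "1656 Mount Eagle Place Alexandria, Virginia",
    "West Outer Drive Dearborn, Michigan",
    "Crown St New Haven, CT, USA",
]

# Assumed strict pattern: a street number, a one-word street name,
# and a street type. The list of street types is illustrative only.
STRICT = re.compile(r"^\d+ [A-Z][a-z]+ (St|Ave|Way|Place|Drive)\b")

# Assumed relaxed pattern: also allows an optional N/S/E/W before the
# street name and street names made of more than one word.
RELAXED = re.compile(
    r"^\d+ (?:[NSEW] )?(?:[A-Z][a-z]+ )+(St|Ave|Way|Place|Drive)\b"
)


def fraction_matched(pattern, items):
    """Return the fraction of strings in items that the pattern matches."""
    if not items:
        return 0.0
    matched = sum(1 for item in items if pattern.search(item))
    return matched / len(items)


for name, pattern in [("strict", STRICT), ("relaxed", RELAXED)]:
    share = fraction_matched(pattern, addresses)
    print(f"{name}: {share:.1%} of addresses matched")
```

Comparing the two fractions gives a sense of how much extra data a more complex pattern captures, which can help in deciding whether the additional complexity is worth the effort. Neither pattern here handles number ranges, multiple street numbers, or missing street numbers.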
