Closure
What You Know
This book hopes to have shown you a good range of the techniques you will need in preparing data for analysis and modeling. We addressed most of the data formats that you are likely to encounter in your daily work. Hopefully, even if you use file or data formats this book could not specifically address, or did not even have the opportunity to mention, the general concepts and principles laid out will still apply; only some libraries and interface details will vary. Particular formats can have particular pitfalls in the ways they facilitate data errors, but, obviously, data can go bad in numerous ways independent of representations and storage technologies.
Chapters 1, 2, and 3, respectively, looked at tabular, hierarchical, and “special” data sources. We saw specific tools and specific techniques for moving data from each of those sources into the tidy formats that are most useful for data science. Most of the examples shown used Python...