Repurposing Data Sources
All language is but a poor translation.
–Franz Kafka
Sometimes, data lives in formats that take extra work to ingest. For common and explicitly data-oriented formats, widely used libraries already have readers built into them. Data frame libraries, for example, read a huge number of different file types. At worst, slightly less common formats have their own more specialized libraries that provide a relatively straightforward path between the original format and the general-purpose data processing library you wish to use.
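This is a minimal sketch of how little work an explicitly data-oriented format requires. Python's standard library `csv` module parses CSV text directly into records, and data frame libraries offer a similar one-call path (for example, `pandas.read_csv`). The inline sample data is made up purely for illustration and stands in for a file on disk.

```python
import csv
import io

# Hypothetical sample data standing in for a CSV file on disk
raw = """city,population
Springfield,30720
Shelbyville,25000
"""

# csv.DictReader uses the header line as keys for each row's dict
rows = list(csv.DictReader(io.StringIO(raw)))

# Values arrive as strings; converting types is still our job
populations = {row["city"]: int(row["population"]) for row in rows}
print(populations)
```

The reader handles the format's structure, but choices such as type conversion still remain with the data scientist.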
A greater difficulty often arises because a given format is not per se a data format, but exists for a different purpose. Nonetheless, there is often data embedded or encoded in the format that we would like to utilize. For example, web pages are generally designed for human readers and are rendered by web browsers with “quirks modes” that handle not-quite-HTML, as is often needed. Portable...