Cleaning and Processing Data
Some automated tasks require dealing with large amounts of data. As the data grows, two distinct problems appear: processing takes too long, and quality issues in the input data cause more errors.
Both problems are well known in data science, which deals with large quantities of data, but they can appear even at a smaller scale.
The quality of input data is closely related to the number of sources it comes from. In general, data from a single source is more consistent, but relying on a single source is limiting. Even data from a single source can still contain inconsistencies or errors.
The differences can take many forms: regional conventions such as date formats or currencies, extra information, different names for the same concept (including spelling differences), typos, generally poor-quality data with errors… The list is huge!
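As a small illustration of these kinds of differences, the following sketch brings records from several sources into one consistent shape. The field names, the list of date formats, and the alias table are all hypothetical, chosen only for this example; real data will need its own rules. Note that a date such as 03/04/2023 is ambiguous, and this sketch simply assumes the day comes first.

```python
from datetime import datetime

# Hypothetical date formats seen across sources (an assumption for this sketch).
# Slash-separated dates are assumed to be day-first.
DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y", "%d.%m.%Y")

# Hypothetical table mapping spelling variants to one canonical name.
COUNTRY_ALIASES = {
    "usa": "United States",
    "u.s.a.": "United States",
    "united states": "United States",
    "uk": "United Kingdom",
    "united kingdom": "United Kingdom",
}


def normalize_date(text):
    """Return the date in ISO format, or None if no known format matches."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    return None


def normalize_country(text):
    """Map known variants to a canonical name; keep unknown values as they are."""
    cleaned = text.strip()
    return COUNTRY_ALIASES.get(cleaned.lower(), cleaned)


def normalize_record(record):
    """Return a new record with the date and country normalized."""
    return {
        "date": normalize_date(record["date"]),
        "country": normalize_country(record["country"]),
    }


raw = [
    {"date": "2023-03-15", "country": "USA"},
    {"date": "15/03/2023", "country": " united states "},
    {"date": "15.03.2023", "country": "U.S.A."},
]
cleaned = [normalize_record(r) for r in raw]
for row in cleaned:
    print(row)
```

After normalization, the three records describe the same date and country in the same way, so they can be compared directly. Values that match no known format come back as None instead of being guessed, which makes bad input easy to spot later.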
To compare apples with apples, the input data...