When working with data, one of the most common issues a data scientist runs into is missing data. Most commonly, this refers to empty cells (row/column intersections) where a value was never acquired, for whatever reason. This causes problems for many reasons; notably, most (though not all) learning algorithms cannot cope with missing values.
For this reason, data scientists and machine learning engineers have developed many techniques for dealing with this problem. Although there are many variations, the two major ways to deal with missing data are:
- Remove rows with missing values in them
- Impute (fill in) missing values
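As a rough illustration of the two options above, here is a minimal sketch using pandas. The toy DataFrame, its column names, and its values are made up for this example, and filling with the column mean is only one assumed imputation strategy among many.

```python
import numpy as np
import pandas as pd

# Hypothetical toy dataset; column names and values are invented for illustration.
df = pd.DataFrame({
    "age": [25.0, np.nan, 40.0, 31.0],
    "income": [50000.0, 62000.0, np.nan, 58000.0],
})

# Count the missing cells in each column.
missing_per_column = df.isnull().sum()

# Option 1: remove every row that contains at least one missing value.
dropped = df.dropna()

# Option 2: impute each missing value, here (as an assumption) with its column's mean.
imputed = df.fillna(df.mean())

print(missing_per_column)
print(dropped)
print(imputed)
```

In this toy example, dropping leaves two of the four rows, while imputing keeps all four rows and fills the two empty cells.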
Each method will clean our dataset to a point where a learning algorithm can handle it, but each method...