Imputation of missing data
Imputation fills in missing values with estimated or calculated values. It is often preferred over deletion when removing the records that contain missing values would discard a significant amount of information. Common imputation methods include mean, median, and mode imputation, as well as more advanced techniques.
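To make the three basic strategies concrete, here is a minimal sketch on a small, hypothetical pandas Series (the `ages` values are invented for illustration and are not from the chapter's dataset). It computes the mean, median, and mode of the observed values and fills the gaps with each one. The outlier (250) shows why the choice matters: it pulls the mean well away from the typical values, while the median and mode stay close to them.

```python
import numpy as np
import pandas as pd

# Hypothetical feature with two missing values (NaN) and one outlier (250).
ages = pd.Series([22, 25, 27, np.nan, 30, 250, np.nan, 25])

# Statistics are computed on the observed (non-missing) values only.
observed = ages.dropna()
mean_value = observed.mean()         # pulled upward by the outlier
median_value = observed.median()     # robust to the outlier
mode_value = observed.mode().iloc[0] # most frequent observed value

# fillna replaces only the NaN entries; observed values are left untouched.
mean_imputed = ages.fillna(mean_value)
median_imputed = ages.fillna(median_value)
mode_imputed = ages.fillna(mode_value)

print(f"mean={mean_value:.2f}, median={median_value}, mode={mode_value}")
print(pd.DataFrame({
    "original": ages,
    "mean": mean_imputed,
    "median": median_imputed,
    "mode": mode_imputed,
}))
```

Mean and median imputation suit numerical features, whereas mode imputation is the usual choice for categorical features, where a mean or median is not defined.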
Let’s have a look at the different imputation methods for our scenario.
Mean imputation
Mean imputation fills missing values with the mean of the observed values in the variable. It is a very simple method, and it does not bias the estimate of the mean when the values are missing completely at random (MCAR). However, the mean itself is sensitive to outliers, and because every missing entry receives the same value, this method shrinks the variance of the feature and can distort its distribution. You can find the code for this part in the repo at https://github.com/PacktPublishing/Python-Data-Cleaning-and-Preparation-Best-Practices/blob/main/chapter08/3.mean_imputation.py.
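Before turning to the repository example, the following minimal sketch illustrates the trade-off described above on a small, hypothetical DataFrame (the `income` and `city` columns and their values are invented for illustration). It fills the missing incomes with the observed mean and then compares the column's mean and standard deviation before and after imputation.

```python
import numpy as np
import pandas as pd

# Hypothetical dataset; the column names and values are illustrative only.
df = pd.DataFrame({
    "income": [42_000, 51_000, np.nan, 38_000, 60_000, np.nan, 47_000],
    "city": ["A", "B", "A", "C", "B", "A", "C"],
})

# pandas skips NaN by default, so these describe the observed values only.
before_mean = df["income"].mean()
before_std = df["income"].std()

# Replace every missing income with the mean of the observed incomes.
df["income_mean_imputed"] = df["income"].fillna(before_mean)

after_mean = df["income_mean_imputed"].mean()
after_std = df["income_mean_imputed"].std()

print(f"mean before: {before_mean:.1f}, after: {after_mean:.1f}")  # unchanged
print(f"std  before: {before_std:.1f}, after: {after_std:.1f}")    # smaller
```

The mean is preserved because each imputed value sits exactly at the mean, but those same values add rows with zero deviation, so the standard deviation drops. This is the variance shrinkage that makes mean imputation risky when many values are missing.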
Let’s see the code example for mean imputation. For this example, we will use the same dataset...