Replacing outliers
Another way to handle outliers is to replace the extreme values with substitute values instead of dropping them. As with removing outliers, this must be done with great care because it can introduce bias into our dataset. Flooring and capping are also forms of outlier replacement; however, in this recipe, we will focus on other methods:
- Statistical measures: This involves replacing outliers with the mean, median, or percentiles of the dataset
- Interpolation: This involves estimating a replacement for each outlier from its neighboring data points
- Model-based methods: These involve using a machine learning model to predict the replacement value for the outliers
It is important to note that all of the preceding methods alter the shape and characteristics of the data's distribution, so they are not appropriate in scenarios where preserving the original distribution matters.
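To make the effect on the distribution concrete, the sketch below replaces outliers with the median and compares the data before and after. It assumes the 1.5 × IQR rule for flagging outliers and uses a small hypothetical sample; the function name `median_replace` is illustrative only.

```python
import statistics


def median_replace(values, k=1.5):
    """Replace values outside the Tukey IQR fences with the median of the inliers."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    inliers = [v for v in values if lower <= v <= upper]
    fill = statistics.median(inliers)
    return [v if lower <= v <= upper else fill for v in values]


# Hypothetical sample with one extreme value in each tail
data = [4.8, 5.1, 5.0, 4.9, 5.2, 30.0, 5.0, -15.0, 5.1, 4.9]
cleaned = median_replace(data)

# The spread collapses and extra copies of the median pile up at a single point
print("stdev before:", round(statistics.stdev(data), 3))
print("stdev after: ", round(statistics.stdev(cleaned), 3))
print("copies of 5.0 before:", data.count(5.0), "after:", cleaned.count(5.0))
```

Both tails disappear and the median value appears twice as often, which is exactly the kind of distortion to avoid when the original distribution is what you want to analyze.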
We will explore how to replace outliers using...