Dealing with outliers
You are on this study journey not just to pass the AWS Machine Learning Specialty exam, but also to become a better data scientist. There are many ways to look at the outlier problem from a purely mathematical perspective; however, the datasets used in real life are derived from an underlying business process, so you must include a business perspective in any outlier analysis.
An outlier is an atypical data point in a set of data. For example, Figure 4.8 shows some data points plotted on a two-dimensional plane; that is, x and y. The red point is an outlier, since it is an atypical value in this series of data.
Figure 4.8 – Identifying an outlier
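To make the idea concrete, here is a minimal sketch of how a single atypical point can pull a least-squares line away from the rest of the data. The data values are hypothetical and chosen only for illustration (they are not the points plotted in Figure 4.8), and NumPy's polyfit is used simply as one convenient way to fit a straight line.

```python
import numpy as np

# Hypothetical data: ten points lying exactly on the line y = 2x + 1
x = np.arange(1, 11, dtype=float)
y = 2.0 * x + 1.0

# The same data plus one atypical point (assumed values, for illustration only)
x_with_outlier = np.append(x, 11.0)
y_with_outlier = np.append(y, 60.0)  # the underlying trend would give 23 here

# Fit a degree-1 polynomial (a straight line); polyfit returns [slope, intercept]
slope_clean, intercept_clean = np.polyfit(x, y, deg=1)
slope_out, intercept_out = np.polyfit(x_with_outlier, y_with_outlier, deg=1)

print(f"Without outlier: slope={slope_clean:.2f}, intercept={intercept_clean:.2f}")
print(f"With outlier:    slope={slope_out:.2f}, intercept={intercept_out:.2f}")
```

Comparing the two printed slopes shows how much one point can change the fitted line. Whether that point should be removed, capped, or kept is a decision that also depends on the business process behind the data.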
It is important to treat outlier values because some statistical methods are sensitive to them. Figure 4.8 shows this behavior in action. On the left-hand side, a line has been drawn that best fits those data points, ignoring...