Handling outliers
An outlier is a data point that lies far from the other data points in a sample and does not resemble them.
Outliers can be detected using graphical methods (such as box plots) and non-graphical methods; the graphical methods are the more intuitive of the two. Let's talk about the non-graphical, statistical methods:
- Tukey or percentiles
- Z-score
- Modified z-score
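As a rough illustration of what these three methods compute, here is a minimal sketch in plain NumPy. It is not Optimus's implementation: the function names, the 1.5 fence multiplier, and the 3.0 and 3.5 thresholds are assumptions chosen because they are common conventions, and the sketch assumes the data is not constant (otherwise the standard deviation or MAD would be zero).

```python
import numpy as np

def _clean(values):
    """Convert to a float array and drop NaN entries."""
    x = np.asarray(values, dtype=float)
    return x[~np.isnan(x)]

def tukey_outliers(values, k=1.5):
    """Return values outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey fences)."""
    x = _clean(values)
    q1, q3 = np.percentile(x, [25, 75])
    iqr = q3 - q1
    return x[(x < q1 - k * iqr) | (x > q3 + k * iqr)]

def z_score_outliers(values, threshold=3.0):
    """Return values whose z-score exceeds the threshold in absolute value."""
    x = _clean(values)
    z = (x - x.mean()) / x.std()  # population standard deviation
    return x[np.abs(z) > threshold]

def modified_z_score_outliers(values, threshold=3.5):
    """Return values whose modified z-score (median and MAD based) exceeds the threshold."""
    x = _clean(values)
    median = np.median(x)
    mad = np.median(np.abs(x - median))  # median absolute deviation
    m = 0.6745 * (x - median) / mad
    return x[np.abs(m) > threshold]

# Hypothetical sample with one large positive and one large negative value
sample = [1, 2, 3, 45, 6, -50, np.nan]
print(tukey_outliers(sample))
print(z_score_outliers(sample))
print(modified_z_score_outliers(sample))
```

On a sample this small, the plain z-score with a threshold of 3 flags nothing, because the extreme values inflate the mean and standard deviation they are measured against. The Tukey and modified z-score rules rely on quartiles and medians, which are much less affected by the extremes.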
To show how Optimus can handle outliers, let's create a dataset that contains both positive and negative extreme values. The values in column A range from -50 to 45:
df = op.create.dataframe(
    {"A":[1,2,3,45,6,-50,np.nan],
     "B":["Optimus","Bumblebee","Eject","Optimus",
          "...