Applying outlier detection in practice
In this section, we will walk through a practical example of outlier detection using a public dataset that describes the physicochemical properties of wine. The dataset is available for download from the University of California, Irvine (UCI) repository (https://archive.ics.uci.edu/ml/datasets/wine+quality).
The wine dataset is composed of two CSV files: one describing the physicochemical properties of white wine, the other those of red wine. In this walk-through, we will focus on the white wine dataset, but you are welcome to use the red wine data as well, since most of the steps described in this chapter should apply to both.
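Before uploading a file, it can help to check its layout locally. The following is a minimal sketch, not part of the original walk-through, that parses wine data with Python's standard csv module. It assumes the files are semicolon-delimited with a quoted header row, which is how the UCI wine quality files are distributed. The SAMPLE text and the read_wine_rows helper are hypothetical names introduced for illustration, and the sample rows are made-up values covering only a few of the columns.

```python
import csv
import io

# Illustrative sample only: a header plus two made-up rows in the
# semicolon-delimited layout assumed for the UCI wine quality files.
SAMPLE = '''"fixed acidity";"volatile acidity";"pH";"alcohol";"quality"
7.0;0.27;3.0;8.8;6
6.3;0.30;3.3;9.5;6
'''


def read_wine_rows(handle):
    """Parse semicolon-delimited wine data into dicts of float values."""
    # The delimiter must be set explicitly; csv defaults to a comma.
    reader = csv.DictReader(handle, delimiter=";")
    return [
        {name: float(value) for name, value in row.items()}
        for row in reader
    ]


rows = read_wine_rows(io.StringIO(SAMPLE))
print(len(rows), "rows; columns:", list(rows[0]))
```

To inspect the downloaded file instead of the sample, pass an open file handle, for example open("winequality-white.csv", newline=""), where the filename is assumed to match the one in the UCI download.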
First, let's import the dataset into our Elasticsearch cluster using the Data Visualizer tool, which you can find under the Machine Learning app in Kibana. We will create an index for the white wine dataset and call it winequality-white.