Clustering methods are designed to find hidden patterns or groupings in a dataset. Unlike the supervised learning methods covered in previous chapters, these algorithms have no labels to learn from; they form clusters based on the similarities between elements.
Cluster analysis is an unsupervised learning technique that groups statistical units so as to minimize the intragroup distance and maximize the intergroup distance. These distances are quantified by means of similarity/dissimilarity measures defined between the statistical units.
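To make the two quantities concrete, the following minimal sketch compares a sensible grouping of six points with a poor one. It assumes Euclidean distance as the dissimilarity measure and uses the average pairwise distance as the summary; the function names and the toy data are hypothetical, chosen only for illustration, and other measures or summaries are equally valid.

```python
from itertools import combinations
from math import dist  # Euclidean distance, available in Python 3.8+


def mean_intragroup_distance(groups):
    """Average distance between pairs of units that share a group."""
    distances = [
        dist(a, b)
        for units in groups.values()
        for a, b in combinations(units, 2)
    ]
    return sum(distances) / len(distances)


def mean_intergroup_distance(groups):
    """Average distance between pairs of units from different groups."""
    distances = [
        dist(a, b)
        for units_a, units_b in combinations(groups.values(), 2)
        for a in units_a
        for b in units_b
    ]
    return sum(distances) / len(distances)


# Hypothetical toy data: two visibly separated clouds of points.
good_grouping = {
    "A": [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0)],
    "B": [(10.0, 10.0), (10.0, 11.0), (11.0, 10.0)],
}
# The same six points, but with the two clouds mixed across groups.
poor_grouping = {
    "A": [(0.0, 0.0), (10.0, 10.0), (1.0, 0.0)],
    "B": [(0.0, 1.0), (10.0, 11.0), (11.0, 10.0)],
}

for name, grouping in [("good", good_grouping), ("poor", poor_grouping)]:
    intra = mean_intragroup_distance(grouping)
    inter = mean_intergroup_distance(grouping)
    print(f"{name}: intragroup = {intra:.2f}, intergroup = {inter:.2f}")
```

The good grouping yields a small intragroup distance and a large intergroup distance, while mixing the points across groups pushes the two values toward each other. A clustering algorithm can be seen as searching for the grouping that separates them the most.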
To perform cluster analysis, no prior interpretative model is required. In fact, unlike other multivariate statistical techniques, this one does not make a priori assumptions about the fundamental typologies that may characterize the observed sample...