Summary
Data problems in which the input data is not associated with a labeled output are handled using unsupervised learning. The main objective of such problems is to understand the data by finding patterns that, in some cases, can be generalized to new instances. In this context, this chapter covered clustering algorithms, which work by grouping similar data points into clusters while separating data points that differ greatly. The chapter then covered data visualization tools that can be used to analyze problematic features during data preprocessing. We also saw how to apply different algorithms to the dataset and compare their performance in order to choose the one that best fits the data. Because it is not possible to represent all of the features in a single plot, and therefore to evaluate performance graphically, two metrics for evaluating performance with scikit-learn were also discussed: the Silhouette Coefficient and the Calinski-Harabasz index. However, it is important to understand...
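Comparing several clustering algorithms with these two metrics can be sketched in scikit-learn roughly as follows. This is a minimal illustration, not the chapter's own code. The synthetic dataset from `make_blobs`, the choice of k-means, mean-shift, and DBSCAN, and parameter values such as `eps=2.0` are assumptions made for the example. The sketch only relies on the fact that both `silhouette_score` and `calinski_harabasz_score` score a labeling without needing ground-truth classes, and that for both a higher value indicates better-defined clusters.

```python
# Minimal sketch (assumptions noted): compare clustering algorithms with
# two internal metrics that need no ground-truth labels.
from sklearn.cluster import DBSCAN, KMeans, MeanShift
from sklearn.datasets import make_blobs
from sklearn.metrics import calinski_harabasz_score, silhouette_score

# Hypothetical data: 500 points around 4 centers in 5 dimensions, too many
# features to inspect the clusters in a single 2-D plot.
X, _ = make_blobs(n_samples=500, centers=4, n_features=5, random_state=0)

# Illustrative algorithm choices and parameter values (assumptions).
models = {
    "k-means": KMeans(n_clusters=4, n_init=10, random_state=0),
    "mean-shift": MeanShift(),
    "DBSCAN": DBSCAN(eps=2.0, min_samples=5),
}

scores = {}
for name, model in models.items():
    labels = model.fit_predict(X)
    # Both metrics require at least 2 distinct labels and fewer labels than
    # samples; DBSCAN noise points (label -1) count as one label here.
    n_labels = len(set(labels))
    if not 2 <= n_labels <= len(X) - 1:
        print(f"{name}: skipped ({n_labels} label(s) found)")
        continue
    sil = silhouette_score(X, labels)        # in [-1, 1]; higher is better
    ch = calinski_harabasz_score(X, labels)  # unbounded; higher is better
    scores[name] = (sil, ch)
    print(f"{name}: silhouette={sil:.3f}, Calinski-Harabasz={ch:.1f}")

# Pick the algorithm with the highest Silhouette Coefficient.
best = max(scores, key=lambda name: scores[name][0])
print("best fit by silhouette:", best)
```

Ranking by the Calinski-Harabasz index instead only requires changing the key to `scores[name][1]`. The two metrics do not always agree, so it can be worth checking both before settling on an algorithm.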