Analyzing the data
In this part, we will perform two types of unsupervised data analysis. First, we use principal component analysis (PCA) to create a high-level visualization of the whole dataset. Next, once that visualization has given us an idea of how many clusters the data objects may form, we will use K-Means to form those clusters and study them. Let's start with PCA.
Using PCA to visualize the dataset
As we already know, PCA transforms the dataset so that most of the information is concentrated in the first few principal components (PCs). Our investigation showed that the majority of the relationships between the attributes in county_df are linear, which allows us to use PCA. However, we won't forget about the few non-linear relationships as we move ahead, and we will not rely too heavily on the PCA results.
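To make concrete what "most of the information is concentrated in the first few PCs" means, here is a minimal sketch that computes PCA directly with NumPy's SVD. Since county_df is not reproduced here, the sketch uses a small synthetic matrix as a stand-in; the helper name `pca_scores`, the hidden-factor weights, and the choice to standardize the attributes first are all assumptions for illustration, not the method used with the real data. Because PCA only captures linear structure, the synthetic attributes are built as linear combinations of two hidden factors, which is exactly the situation where the first PCs summarize the data well.

```python
import numpy as np

def pca_scores(X, n_components=3):
    """Project the rows of X onto the first n_components principal components.

    Hypothetical helper. X is a 2-D array (data objects x attributes).
    Each attribute is standardized first (an assumption) so that attributes
    on different scales contribute comparably.
    Returns (scores, explained_variance_ratio) for the kept components.
    """
    X = np.asarray(X, dtype=float)
    # Standardize each attribute: zero mean, unit variance
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    # SVD of the standardized data; rows of Vt are the principal directions,
    # and singular values come back sorted in descending order
    U, S, Vt = np.linalg.svd(Z, full_matrices=False)
    # Variance captured by each PC is proportional to its squared singular value
    var = S**2 / (Z.shape[0] - 1)
    ratio = var / var.sum()
    # Coordinates of every data object on the first n_components PCs
    scores = Z @ Vt[:n_components].T
    return scores, ratio[:n_components]

# Synthetic stand-in for county_df: 200 objects, 5 attributes that are
# linear combinations of 2 hidden factors plus a little noise (hypothetical data)
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 2))
weights = np.array([[1.0, 0.8, -0.5, 0.0, 0.3],
                    [0.2, -0.4, 0.9, 1.0, 0.7]])
X = latent @ weights + 0.1 * rng.normal(size=(200, 5))

scores, ratio = pca_scores(X, n_components=3)
print(scores.shape)      # (200, 3): PC1, PC2, PC3 for each object
print(ratio.round(3))    # share of total variance explained by each PC
```

The three columns of `scores` are what a 3D scatterplot of PC1, PC2, and PC3 would plot. Applying scikit-learn's `sklearn.decomposition.PCA(n_components=3)` to the same standardized matrix should give the same scores up to the sign of each component; the NumPy version is shown only to expose the underlying computation.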
The following screenshot shows a three-dimensional (3D) scatterplot of PC1, PC2, and PC3. PC1 and PC2 are visualized using the x and y axes, whereas...