Implementing k-means clustering
We can use k-means with some of the same data that we used with the supervised learning models that we developed in earlier chapters. The difference is that there is no longer a target for us to predict. Rather, we are interested in how certain instances hang together. Think of how students sort themselves into groups during a stereotypical high school lunch break and you get the general idea.
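To make the idea of instances grouping without a target more concrete, here is a minimal from-scratch sketch of the k-means loop in plain Python. It is an illustration only, not a library implementation: the `kmeans` function, its parameters, and the toy `points` data are all hypothetical names and values invented for this example.

```python
import random

def kmeans(points, k, n_iter=20, seed=0):
    """Group points (a list of equal-length tuples) into k clusters."""
    rng = random.Random(seed)
    # Assumption: start from k points chosen at random from the data.
    centroids = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: attach each point to its nearest centroid
        # (squared Euclidean distance).
        labels = [
            min(
                range(k),
                key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centroids[j])),
            )
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster is empty
                centroids[j] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels, centroids

# Hypothetical toy data: two features per instance and no target column.
points = [(1.0, 1.2), (0.8, 1.0), (1.1, 0.9), (5.0, 5.2), (5.1, 4.8), (4.9, 5.0)]
labels, centroids = kmeans(points, k=2)
print(labels)     # cluster number assigned to each instance
print(centroids)  # center of each cluster
```

Notice that the function never sees a target; it only uses distances between instances. Because those distances are Euclidean, a feature measured on a larger scale will dominate the assignment step, which is one reason preprocessing still matters here.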
We also need to do much of the same preprocessing work that we did with supervised learning models. We will start with that in this section. We will work with data on income gaps between women and men, labor force participation rates, educational attainment, teenage birth frequency, and female participation in politics at the highest level.
Note
The income gap dataset is made available for public use by the United Nations Development Programme at https://www.kaggle.com/datasets/undp/human-development. There is one record per country with aggregate employment...