How to Use Decision Trees to Enhance K-Means Clustering
This chapter addresses two critical issues. First, we will explore how to run k-means clustering on dataset volumes that exceed the capacity of the algorithm as given. Second, we will implement decision trees to verify the results of an ML algorithm whose output surpasses human analytic capacity. We will also explore the use of random forests.
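Before diving into the chapter's program, here is a minimal sketch of the overall idea, not the book's implementation: cluster a dataset that is too large to audit by hand, then train a decision tree (and a random forest) to reproduce the cluster labels, so that a human-readable model can verify whether the clusters follow consistent rules. The dataset, feature count, and parameter values below are illustrative assumptions.

```python
import numpy as np
from sklearn.cluster import MiniBatchKMeans          # k-means variant that scales to large volumes
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(42)
X = rng.normal(size=(100_000, 6))                    # stand-in for a large dataset

# Step 1: k-means clustering in mini-batches to cope with the data volume.
kmc = MiniBatchKMeans(n_clusters=6, batch_size=1_024, random_state=42)
labels = kmc.fit_predict(X)

# Step 2: a decision tree learns to reproduce the cluster labels.
# High agreement suggests the clusters are separable by rules a human can audit.
X_train, X_test, y_train, y_test = train_test_split(
    X, labels, test_size=0.2, random_state=42)
tree = DecisionTreeClassifier(max_depth=5, random_state=42)
tree.fit(X_train, y_train)
print("Decision tree agreement:", accuracy_score(y_test, tree.predict(X_test)))

# Step 3: a random forest provides a more robust second opinion on the same labels.
forest = RandomForestClassifier(n_estimators=50, random_state=42)
forest.fit(X_train, y_train)
print("Random forest agreement:", accuracy_score(y_test, forest.predict(X_test)))
```

The design choice here is to treat the clustering output as a supervised target: if an interpretable model can recover the labels accurately, the clustering result gains a form of verification that does not depend on a human inspecting every data point.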
When facing such problems, choosing the right model is often the hardest task in ML. Representing an unfamiliar set of features can be a puzzling prospect, and we have to get our hands dirty and experiment with different models. An efficient estimator also requires good datasets, and the quality of the data may change the course of the project.
This chapter builds on the k-means clustering (KMC) program developed in Chapter 4, Optimizing Your Solutions with K-Means Clustering, and addresses the issue of large datasets. This exploration will...