k-means Algorithm
The k-means algorithm is used for data without labeled classes. It divides the data into K subgroups (clusters). Each data point is assigned to a group based on similarity, as explained before, which for this algorithm is measured by the distance from the center (centroid) of the cluster. The final output of the algorithm is the assignment of each data point to a cluster, together with the centroid of each cluster, which can be used to label new data in the same clusters.
The centroid of each cluster is a point in the feature space, and its feature values can be used to characterize the data points that belong to that cluster.
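As a minimal sketch of how centroids can label new data, the following assigns a new point to the cluster whose centroid is closest. It assumes Euclidean distance; the centroid values, the new point, and the helper name nearest_centroid are hypothetical and chosen only for illustration.

```python
import math

def nearest_centroid(point, centroids):
    """Return the index of the centroid closest to `point` (Euclidean distance)."""
    distances = [math.dist(point, c) for c in centroids]
    return distances.index(min(distances))

# Hypothetical centroids of two clusters in a 2-D feature space.
centroids = [(1.0, 1.0), (8.0, 9.0)]

# A new, unlabeled data point receives the label of its nearest centroid.
new_point = (7.5, 8.0)
label = nearest_centroid(new_point, centroids)
print(label)
```

Here the new point lies much closer to the second centroid, so it is labeled with that cluster's index.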
Understanding the Algorithm
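The k-means algorithm works through an iterative process that involves the following steps:
1. Initialization: K initial centroids are chosen, for example by randomly selecting K of the data points.
2. Assignment: each data point is assigned to the cluster whose centroid is nearest to it.
3. Update: each centroid is recalculated as the mean of the data points assigned to its cluster.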
Steps 2 and 3 are repeated until a stopping criterion is met. The criterion can be one of the following:
The defined maximum number of iterations has been reached.
The data points do not change from cluster...
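The iterative process and its two stopping criteria can be sketched in plain Python. This is an illustrative sketch rather than a reference implementation: it assumes the usual formulation (initialization as step 1, assignment as step 2, update as step 3), Euclidean distance, and random selection of data points as initial centroids. The function name k_means, its parameters, and the sample data are hypothetical.

```python
import math
import random

def k_means(points, k, max_iterations=100, seed=0):
    """Cluster `points` (tuples of floats) into k groups; return (centroids, assignments)."""
    rng = random.Random(seed)
    # Step 1 (initialization): pick k distinct data points as starting centroids.
    centroids = rng.sample(points, k)
    assignments = None

    # Stopping criterion: the defined number of iterations.
    for _ in range(max_iterations):
        # Step 2 (assignment): each point goes to its nearest centroid.
        new_assignments = [
            min(range(k), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        # Stopping criterion: no data point changed cluster.
        if new_assignments == assignments:
            break
        assignments = new_assignments

        # Step 3 (update): move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, a in zip(points, assignments) if a == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = tuple(sum(axis) / len(members) for axis in zip(*members))

    return centroids, assignments

# Hypothetical data: two visibly separated groups in a 2-D feature space.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5),
          (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]
centroids, assignments = k_means(points, k=2)
print(centroids, assignments)
```

The iteration cap guarantees termination even if the assignments keep changing, while the equality check on the assignments ends the loop early once the clusters are stable. Because the starting centroids are random, the cluster indices (0 or 1) given to each group can differ between seeds.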