Implementation of the k-means clustering algorithm
We will now implement the k-means clustering algorithm. The program takes a CSV file as input, with one data item per line, and converts each data item into a point. The algorithm then classifies these points into the specified number of clusters. Finally, the clusters are visualized on a graph using the matplotlib library:
# source_code/5/k-means_clustering.py
import math
import imp
import sys
import matplotlib.pyplot as plt
import matplotlib
sys.path.append('../common')
import common  # noqa
matplotlib.style.use('ggplot')


# Returns k initial centroids for the given points.
def choose_init_centroids(points, k):
    centroids = []
    centroids.append(points[0])
    while len(centroids) < k:
        # Find the point with the greatest possible distance
        # to the closest already chosen centroid.
        candidate = points[0]
        candidate_dist = min_dist(points[0], centroids)
        for point in points:
            dist =...
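The initialization idea described in the comments of the listing (start from the first point, then repeatedly pick the point farthest from its closest already chosen centroid) can be sketched in isolation. This is only an illustrative sketch, not the book's code: it assumes points are (x, y) tuples and that distance is Euclidean, and the names euclidean_dist and choose_init_centroids_sketch, as well as the body of min_dist, are hypothetical stand-ins for whatever the full listing defines.

```python
import math


def euclidean_dist(a, b):
    # Assumption: points are 2D (x, y) tuples compared by Euclidean distance.
    return math.hypot(a[0] - b[0], a[1] - b[1])


def min_dist(point, centroids):
    # Distance from `point` to the closest already chosen centroid.
    # Hypothetical body; the original helper is not shown in the listing.
    return min(euclidean_dist(point, c) for c in centroids)


def choose_init_centroids_sketch(points, k):
    # Start with the first point, as the listing does.
    centroids = [points[0]]
    while len(centroids) < k:
        # Pick the point whose closest chosen centroid is farthest away.
        candidate = max(points, key=lambda p: min_dist(p, centroids))
        centroids.append(candidate)
    return centroids


points = [(0, 0), (1, 0), (10, 0), (10, 10)]
print(choose_init_centroids_sketch(points, 3))
# prints [(0, 0), (10, 10), (10, 0)]
```

Spreading the initial centroids apart in this way tends to avoid starting with two centroids inside the same natural cluster. Note that already chosen centroids have a distance of 0 to themselves, so they are not picked again as long as the data contains at least k distinct points.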