DBSCAN Algorithm
The density-based spatial clustering of applications with noise (DBSCAN) algorithm groups together points that are close to each other (points with many neighbors) and marks points that lie farther away, with no close neighbors, as outliers.
As its name suggests, the algorithm classifies data points based on how densely they are packed in the data space.
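To make the idea of density concrete, the following is a minimal sketch that counts how many other points fall within a fixed radius of each point. It is not the DBSCAN algorithm itself, only the neighbor-counting idea behind it. The toy data and the `radius` value are assumptions chosen purely for illustration:

```python
import numpy as np

# Hypothetical toy data: a tight group of four points plus one isolated point.
points = np.array([
    [0.0, 0.0],
    [0.0, 0.5],
    [0.5, 0.0],
    [0.5, 0.5],
    [10.0, 10.0],
])

radius = 1.0  # illustrative neighborhood radius (an assumption, not from the text)

# Pairwise Euclidean distances between all points (a 5 x 5 matrix).
diffs = points[:, None, :] - points[None, :, :]
distances = np.sqrt((diffs ** 2).sum(axis=-1))

# For each point, count the other points within the radius (subtract 1 for itself).
neighbor_counts = (distances <= radius).sum(axis=1) - 1

print(neighbor_counts)
```

The four grouped points each have three neighbors within the radius, whereas the isolated point has none, which is the kind of point a density-based method would treat as an outlier.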
Understanding the Algorithm
The DBSCAN algorithm requires two main parameters: epsilon and the minimum number of observations.
Epsilon, also known as eps, is the maximum distance that defines the radius within which the algorithm searches for neighbors. The minimum number of observations (min_samples), on the other hand, is the number of data points required to form a high-density area. In scikit-learn, both are optional arguments: eps defaults to 0.5 and min_samples defaults to 5.
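As a rough illustration of how the two parameters are passed in scikit-learn, the sketch below fits `DBSCAN` on a small made-up dataset. The array `X` and its values are assumptions for demonstration only; the `eps` and `min_samples` values shown are simply scikit-learn's defaults written out explicitly:

```python
import numpy as np
from sklearn.cluster import DBSCAN

# Hypothetical toy data: two dense groups of five points and one isolated point.
X = np.array([
    [1.0, 1.0], [1.1, 1.0], [1.0, 1.1], [1.1, 1.1], [1.05, 1.05],
    [5.0, 5.0], [5.1, 5.0], [5.0, 5.1], [5.1, 5.1], [5.05, 5.05],
    [9.0, 1.0],
])

# eps is the search radius; min_samples is the number of points
# (the point itself included) needed to form a dense region.
model = DBSCAN(eps=0.5, min_samples=5)

# fit_predict returns one cluster label per row of X.
labels = model.fit_predict(X)

print(labels)
```

scikit-learn labels points that do not belong to any cluster as -1, so the isolated point is reported as noise while each dense group receives its own cluster label. With a larger `min_samples` (for example, 6), neither group in this toy dataset would be dense enough to form a cluster.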
In the preceding diagram, the blue dots are assigned...