Data Science: Clustering Data with DBSCAN.
With an implementation in Rust.

This story is part of my Data Science series.
In this series we have already seen some popular clustering algorithms, namely K-means and its extension K-means++. One more algorithm that is just as popular is DBSCAN. This algorithm and a possible implementation are the focus of this story.
Algorithm:
DBSCAN looks for dense regions in the data. It does this by picking a record p and searching for all records (neighbors) that lie within a distance of at most eps from it. If the number of neighbors surpasses a given amount, min_neighbors_of_core, then p is considered a core.
In this sense, a core is a point that is closely surrounded by many other points.
A core p is assigned a unique label, which is also assigned to all of its neighbors.
If not enough neighbors are found, p is considered an outlier.
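To make the step above more concrete, here is a minimal sketch of the core check in Rust. It is not the full algorithm, only the part described so far. A few things are my own assumptions for illustration: the distance is Euclidean, p does not count as its own neighbor, and "surpasses" means strictly more than min_neighbors_of_core. The names Label, euclidean_distance, region_query and visit are hypothetical helpers, not taken from any library.

```rust
/// Label of a record. The variant names are illustrative assumptions.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Label {
    Unvisited,
    Outlier,
    Cluster(usize),
}

/// Euclidean distance between two records of equal dimension
/// (assumed metric; the description above does not fix one).
fn euclidean_distance(a: &[f64], b: &[f64]) -> f64 {
    a.iter()
        .zip(b.iter())
        .map(|(x, y)| (x - y).powi(2))
        .sum::<f64>()
        .sqrt()
}

/// Indices of all records within `eps` of record `p`.
/// Assumption: `p` itself is not counted as a neighbor.
fn region_query(data: &[Vec<f64>], p: usize, eps: f64) -> Vec<usize> {
    data.iter()
        .enumerate()
        .filter(|&(i, q)| i != p && euclidean_distance(&data[p], q) <= eps)
        .map(|(i, _)| i)
        .collect()
}

/// Visits record `p`. If it has more than `min_neighbors_of_core`
/// neighbors, it is a core: `p` and all its neighbors receive
/// `Label::Cluster(label)` and the function returns `true`.
/// Otherwise `p` is marked as an outlier and `false` is returned.
fn visit(
    data: &[Vec<f64>],
    labels: &mut [Label],
    p: usize,
    eps: f64,
    min_neighbors_of_core: usize,
    label: usize,
) -> bool {
    let neighbors = region_query(data, p, eps);
    if neighbors.len() > min_neighbors_of_core {
        labels[p] = Label::Cluster(label);
        for n in neighbors {
            labels[n] = Label::Cluster(label);
        }
        true
    } else {
        labels[p] = Label::Outlier;
        false
    }
}
```

Starting with every entry of labels set to Label::Unvisited, calling visit on a record in a dense region labels it together with its neighbors, while calling it on an isolated record marks that record as an outlier.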