Member-only story

Data Science: Clustering Data with DBSCAN.

applied.math.coding

5 min readMar 31, 2023

With an implementation in Rust.

Join Medium with my referral link - applied.math.coding

Get access to all of my stories and to thousands more on Medium from other writers. As my strong believe is, Medium is…

medium.com

This story is part of my Data Science series.

In this series of stories we already have seen some popular clustering algorithms, that is, K-means and its extension K-means++. There is one more that also ranks up into this level of popularity - DBSCAN. This algorithm and a possible implementation will be the focus of this story.

Algorithm:

DBSCAN is looking for dense regions of the data. This is done by picking a record p and searching all the records (neighbors) that are within a distance of not more than eps. If the number of neighbors surpasses a given amount min_neighbors_of_core, then p is considered a core.

In this sense, a core is a point which is closely surrounded by many other points.

p will be assigned a unique label which also gets assigned to all its neighbors.

In case not enough neighbors are found, then p is considered an outlier.

Data Science: Clustering Data with DBSCAN.

Join Medium with my referral link - applied.math.coding

Get access to all of my stories and to thousands more on Medium from other writers. As my strong believe is, Medium is…

Algorithm:

Written by applied.math.coding

No responses yet