
Data Science: Clustering Data with DBSCAN.

applied.math.coding
5 min read · Mar 31, 2023


With an implementation in Rust.

This story is part of my Data Science series.

In this series we have already seen some popular clustering algorithms, namely K-means and its extension K-means++. One more algorithm of comparable popularity is DBSCAN. This algorithm and a possible implementation are the focus of this story.

Algorithm:

DBSCAN looks for dense regions in the data. It does this by picking a record p and searching for all records (its neighbors) that lie within a distance of at most eps from p. If the number of neighbors exceeds a given threshold min_neighbors_of_core, then p is considered a core.

In this sense, a core is a point which is closely surrounded by many other points.
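The core test can be sketched in a few lines of Rust. This is a minimal sketch under some assumptions that the text above does not fix: records are stored as `Vec<f64>`, the distance is Euclidean, p itself is not counted among its neighbors, and "surpasses" is read as strictly greater. The function names `distance`, `neighbors` and `is_core` are hypothetical.

```rust
/// Euclidean distance between two records of equal dimension
/// (the choice of metric is an assumption of this sketch).
fn distance(a: &[f64], b: &[f64]) -> f64 {
    a.iter()
        .zip(b.iter())
        .map(|(x, y)| (x - y).powi(2))
        .sum::<f64>()
        .sqrt()
}

/// Indices of all records other than `p` that lie within distance `eps` of `p`.
/// This is a plain linear scan over the whole data set.
fn neighbors(data: &[Vec<f64>], p: usize, eps: f64) -> Vec<usize> {
    (0..data.len())
        .filter(|&q| q != p && distance(&data[p], &data[q]) <= eps)
        .collect()
}

/// `p` is a core if its number of neighbors exceeds `min_neighbors_of_core`.
fn is_core(data: &[Vec<f64>], p: usize, eps: f64, min_neighbors_of_core: usize) -> bool {
    neighbors(data, p, eps).len() > min_neighbors_of_core
}
```

Calling `is_core(&data, p, eps, min_neighbors_of_core)` for every index p marks the dense regions. The linear scan makes this quadratic in the number of records, which is acceptable for a sketch but would likely be replaced by a spatial index for larger data sets.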

If p is a core, it is assigned a unique label, which is also assigned to all of its neighbors.

If not enough neighbors are found, p is considered an outlier.
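The handling of a single record p, as described so far, might look as follows in Rust. This is only a sketch of that one step, not a complete DBSCAN. The `Label` enum, the function `process_record` and its `cluster_id` parameter are hypothetical names, and the Euclidean distance, as well as leaving p out of its own neighbor count, are assumptions.

```rust
/// Label of a record: not yet visited, outlier, or member of a cluster.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Label {
    Unvisited,
    Outlier,
    Cluster(usize),
}

/// Indices of all records other than `p` within distance `eps` of `p`
/// (Euclidean distance is an assumption of this sketch).
fn neighbors(data: &[Vec<f64>], p: usize, eps: f64) -> Vec<usize> {
    (0..data.len())
        .filter(|&q| {
            // Compare squared distances to avoid taking a square root.
            let dist_sq: f64 = data[p]
                .iter()
                .zip(&data[q])
                .map(|(x, y)| (x - y).powi(2))
                .sum();
            q != p && dist_sq <= eps * eps
        })
        .collect()
}

/// Handles one record `p`: if `p` is a core, `p` and all its neighbors
/// receive the label `cluster_id`; otherwise `p` is marked as an outlier.
/// Returns `true` if `cluster_id` was used.
fn process_record(
    data: &[Vec<f64>],
    labels: &mut [Label],
    p: usize,
    eps: f64,
    min_neighbors_of_core: usize,
    cluster_id: usize,
) -> bool {
    let found = neighbors(data, p, eps);
    if found.len() > min_neighbors_of_core {
        labels[p] = Label::Cluster(cluster_id);
        for q in found {
            labels[q] = Label::Cluster(cluster_id);
        }
        true
    } else {
        labels[p] = Label::Outlier;
        false
    }
}
```

A caller would start with `vec![Label::Unvisited; data.len()]` and pass a fresh `cluster_id` whenever the previous call returned `true`. How this step is repeated over the remaining records, and how labels of overlapping neighborhoods are reconciled, is left open in this sketch.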


