Data Science: Implementing a Classification Tree in Rust using Polars.

applied.math.coding
4 min readMar 13, 2023

This story is part of my Data Science series.

In our previous story (see here), we have seen a prototype implementation of a regression tree. Beside decision trees can be used for regression tasks, they also are a suitable tool for classifications. This can be achieved by a slight change in the algorithm.

Remember, the value we attributed to a region of feasible data was taken to be the mean value of outcomes Y of records falling into this region. This value turned out to minimize the variance of the predicted value.

For classification tasks, a measure like this makes no sense because the outcome is a categorical variable, that is, takes discrete values.

Gini impurity:

Let us assume the outcome variable Y takes values within the set

K := {1, 2, ..., n}

Obviously, if we are given a region R, the value we would attribute to this region is the one with highest probably:

--

--

applied.math.coding

I am a Software Developer - Java, Rust, SQL, TypeScript - with strong interest doing research in pure and applied Mathematics.