Sitemap

Data Science: Implementing a Classification Tree in Rust using Polars.

4 min readMar 13, 2023
Press enter or click to view image in full size

This story is part of my Data Science series.

In our previous story (see here), we have seen a prototype implementation of a regression tree. Beside decision trees can be used for regression tasks, they also are a suitable tool for classifications. This can be achieved by a slight change in the algorithm.

Remember, the value we attributed to a region of feasible data was taken to be the mean value of outcomes Y of records falling into this region. This value turned out to minimize the variance of the predicted value.

For classification tasks, a measure like this makes no sense because the outcome is a categorical variable, that is, takes discrete values.

Gini impurity:

Let us assume the outcome variable Y takes values within the set

K := {1, 2, ..., n}

Obviously, if we are given a region R, the value we would attribute to this region is the one with highest probably:

--

--

applied.math.coding
applied.math.coding

Written by applied.math.coding

I am a Software Developer - Java, Rust, SQL, TypeScript - with strong interest doing research in pure and applied Mathematics.

No responses yet