Member-only story
Data Science: Implementing a Classification Tree in Rust using Polars.
This story is part of my Data Science series.
In our previous story (see here), we have seen a prototype implementation of a regression tree. Beside decision trees can be used for regression tasks, they also are a suitable tool for classifications. This can be achieved by a slight change in the algorithm.
Remember, the value we attributed to a region of feasible data was taken to be the mean value of outcomes Y
of records falling into this region. This value turned out to minimize the variance of the predicted value.
For classification tasks, a measure like this makes no sense because the outcome is a categorical variable, that is, takes discrete values.
Gini impurity:
Let us assume the outcome variable Y
takes values within the set
K := {1, 2, ..., n}
Obviously, if we are given a region R
, the value we would attribute to this region is the one with highest probably: