Data Science: Ensemble Learning with Bagging.

applied.math.coding
5 min readJun 20, 2023

This story is part of my Data Science series.

Bagging is a useful technique to reduce variance in predictions wherever it is regression or classification.

If you train a machine learning algorithm on some data, you probably will be asked how stable all these metrics like recall, precision, accuracy, etc. are.

A typical way to answer this, is by re-training several times the same model on a random-selected subset of the data. This gives you a list of the aforementioned metrics, and you can gain insight of how stable (variance) these values are.

If it turns out these values are quite unstable, before throwing away the model, sometimes a remedy is bagging:

The idea of bagging is to use the average prediction that all the above trained models produce together.

If the random-selected samples are not too large in size compared to the entire training data, then we can assume that the predicted values of all the models are independent from each other. Moreover, we can assume that they all follow the same distribution. With other words, the predictions X_i i = 1,...,n viewed as random variables, are independent and identical distributed.

Next let us look at a possible implementation in Rust.

Implementation:

The data we are looking at are intended for a classification task and available here.

It doesn’t mind to this article how the data are stored or fetched. Let us just assume the following method brings all the data in form of a matrix where the first column presents the binary outcome and the remaining are continuous…

--

--

applied.math.coding

I am a Software Developer - Rust, Java, Python, TypeScript, SQL - with strong interest doing research in pure and applied Mathematics.