# Data Science: Ensemble Learning with Bagging

*This story is part of my **Data Science** series.*

Bagging is a useful technique for reducing the variance of predictions, whether the task is regression or classification.

If you train a machine learning model on some data, you will probably be asked how stable metrics like recall, precision, and accuracy are.

A typical way to answer this is to re-train the same model several times, each time on a randomly selected subset of the data. This yields a list of values for each of the aforementioned metrics, giving you insight into how stable they are, i.e. their variance.
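As a minimal sketch of this stability check, the collected metric values can be summarized by their mean and standard deviation (the helper names and the accuracy values below are illustrative, not from the article):

```rust
// Sketch: summarize a list of metric values (e.g. accuracies from
// repeated training runs on random subsets) by mean and standard deviation.

fn mean(values: &[f64]) -> f64 {
    values.iter().sum::<f64>() / values.len() as f64
}

// Sample standard deviation (divides by n - 1).
fn std_dev(values: &[f64]) -> f64 {
    let m = mean(values);
    let var = values
        .iter()
        .map(|v| (v - m).powi(2))
        .sum::<f64>()
        / (values.len() - 1) as f64;
    var.sqrt()
}

fn main() {
    // Hypothetical accuracies from five re-training runs.
    let accuracies = [0.81, 0.86, 0.79, 0.90, 0.84];
    println!(
        "mean = {:.3}, std = {:.3}",
        mean(&accuracies),
        std_dev(&accuracies)
    );
}
```

A large standard deviation relative to the mean is the kind of instability the next paragraphs are about.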

If these values turn out to be quite unstable, bagging is sometimes a remedy worth trying before throwing the model away:

The idea of bagging is to use the average of the predictions produced by all of the models trained above.

If the randomly selected samples are not too large compared to the entire training set, we can assume that the predictions of the individual models are independent of each other. Moreover, we can assume they all follow the same distribution. In other words, the predictions `X_i, i = 1,...,n`, viewed as random variables, are independent and identically distributed (i.i.d.). Averaging n such i.i.d. predictions then divides the variance by n, since `Var(X̄) = Var(X_i)/n`, and this is exactly the variance reduction bagging aims for.
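To make the averaging step concrete, here is a minimal sketch of bagging in Rust. Everything in it is an assumption for illustration: the `Model` trait, the trivial `MeanModel` (which just predicts the mean of the targets it saw), and the tiny built-in random generator used so the sketch needs no external crates.

```rust
// A minimal bagging sketch: train several models on bootstrap subsets,
// then average their predictions. All names here are illustrative.

trait Model {
    fn predict(&self, x: f64) -> f64;
}

// A deliberately trivial model: always predicts the mean of its training targets.
struct MeanModel {
    mean: f64,
}

impl Model for MeanModel {
    fn predict(&self, _x: f64) -> f64 {
        self.mean
    }
}

// Tiny linear congruential generator, to avoid depending on the `rand` crate.
struct Lcg(u64);

impl Lcg {
    fn next(&mut self, bound: usize) -> usize {
        self.0 = self
            .0
            .wrapping_mul(6364136223846793005)
            .wrapping_add(1442695040888963407);
        (self.0 >> 33) as usize % bound
    }
}

// Train one model on a randomly drawn subset of the targets.
fn train_on_subset(targets: &[f64], subset_size: usize, rng: &mut Lcg) -> MeanModel {
    let sum: f64 = (0..subset_size)
        .map(|_| targets[rng.next(targets.len())])
        .sum();
    MeanModel { mean: sum / subset_size as f64 }
}

// Bagging: the final prediction is the average over all trained models.
fn bagged_predict(models: &[MeanModel], x: f64) -> f64 {
    models.iter().map(|m| m.predict(x)).sum::<f64>() / models.len() as f64
}

fn main() {
    let targets = [1.0, 2.0, 3.0, 4.0, 5.0];
    let mut rng = Lcg(42);
    let models: Vec<MeanModel> = (0..20)
        .map(|_| train_on_subset(&targets, 3, &mut rng))
        .collect();
    // Each individual model's prediction fluctuates with its subset;
    // the bagged average fluctuates much less.
    println!("bagged prediction = {:.3}", bagged_predict(&models, 0.0));
}
```

For classification one would typically replace the average with a majority vote over the predicted classes; the structure of the code stays the same.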

Next let us look at a possible implementation in Rust.

## Implementation

The data we are looking at are intended for a classification task and are available here.

How the data are stored or fetched doesn’t matter for this article. Let us just assume the following method brings all the data into the form of a matrix, where the first column represents the binary outcome and the remaining columns are continuous…
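Such a method could look roughly like the following sketch; the name `load_data` and the CSV input format are assumptions, not the article's actual code:

```rust
// Sketch of a data-loading helper: parse CSV text into a matrix where the
// first column is the binary outcome (0/1) and the rest are continuous
// features. The function name and format are hypothetical.

fn load_data(csv: &str) -> Vec<Vec<f64>> {
    csv.lines()
        .filter(|line| !line.trim().is_empty())
        .map(|line| {
            line.split(',')
                .map(|field| field.trim().parse::<f64>().expect("numeric field"))
                .collect()
        })
        .collect()
}

fn main() {
    let csv = "1,0.5,2.3\n0,1.7,0.1\n1,0.9,3.4";
    let matrix = load_data(csv);
    // First column holds the binary outcome, remaining columns the features.
    println!("{matrix:?}");
}
```

In a real project the text would come from a file or an HTTP request, and the `expect` would be replaced by proper error handling, but the resulting `Vec<Vec<f64>>` shape is all the following sections rely on.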