Data Science: Reducing complexity with Principal Component Analysis.

Apr 14

This story is part of my Data Science series.

Many data sets come with an enormous number of feature variables. On the one hand this is positive, since it provides a large amount of information that we can feed into a machine learning algorithm; on the other hand, experience shows that less complex models often perform better than those built on a huge number of features.

The entire area of ‘feature selection’ is concerned with reducing complexity and extracting those features that contribute most to the predictability of the outcome variable. In addition, it is favorable to avoid using strongly correlated variables within the same model, though what ‘correlated’ means exactly typically depends on the context.
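As a quick illustration of what ‘strongly correlated’ can look like in practice, here is a minimal sketch using NumPy's `corrcoef` on synthetic data (all variable names and values here are illustrative assumptions, not part of any particular data set):

```python
import numpy as np

rng = np.random.default_rng(1)
a = rng.normal(size=100)
b = 2 * a + rng.normal(scale=0.1, size=100)  # nearly a linear function of a
c = rng.normal(size=100)                     # independent of a and b

X = np.column_stack([a, b, c])
corr = np.corrcoef(X, rowvar=False)  # 3x3 matrix of pairwise Pearson correlations
print(corr)
```

Here `corr[0, 1]` will be close to 1, flagging that `a` and `b` carry largely redundant information, while `corr[0, 2]` stays near 0.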

In this story we will look at a prominent technique, principal component analysis, for extracting the dominant features from the data.
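To make the idea concrete before the derivation, here is a minimal NumPy sketch of principal component analysis via the eigendecomposition of the covariance matrix (the synthetic data and the choice of two components are assumptions made for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))  # hypothetical data: 200 samples, 5 features

Xc = X - X.mean(axis=0)                 # center each feature
cov = np.cov(Xc, rowvar=False)          # 5x5 covariance matrix
eigvals, eigvecs = np.linalg.eigh(cov)  # eigenvalues in ascending order
order = np.argsort(eigvals)[::-1]       # sort descending by explained variance
components = eigvecs[:, order[:2]]      # top-2 principal directions
X_reduced = Xc @ components             # project onto the first 2 components
print(X_reduced.shape)
```

The columns of `components` are the principal directions that the Lagrange-multiplier argument below characterizes as variance-maximizing; projecting onto the leading few of them is what reduces the complexity of the data.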

A prerequisite for understanding what follows is some basic knowledge of multivariate calculus and the method of Lagrange multipliers (see here).