# Data Science: Feature Selection by Comparing Histogram Distances

This story is part of my Data Science series.

Typically, when we create a classification or regression model, we try to keep it as simple as possible. Simplification can be achieved in many ways, and one of them is to reduce the considered features to as small a subset as possible.

In this short account I want to present a rather less-known approach that I think is very interesting in certain scenarios.

Classification problems are all about the following term:

`P(Y = 1 | X = d)`

That is, we want to estimate the probability that `Y` takes the value `1` given that an observed feature `X` takes the value `d`.
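As a quick illustration, this conditional probability can be estimated empirically from labelled data by counting: among all rows where `X = d`, what fraction has `Y = 1`? The toy arrays below are my own invented example, not data from the article.

```python
import numpy as np

# Toy data (illustrative values): a discrete feature X and a binary label Y.
X = np.array([0, 1, 1, 2, 0, 1, 2, 2, 1, 0])
Y = np.array([0, 1, 1, 1, 0, 0, 1, 1, 1, 0])

d = 1
mask = X == d
# Empirical estimate of P(Y = 1 | X = d):
# the fraction of label-1 rows among the rows where X = d.
p = Y[mask].mean()
print(p)  # 3 of the 4 rows with X = 1 have Y = 1, so 0.75
```

With a continuous feature, the same idea applies after binning the values of `X`, which is exactly where histograms enter the picture.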

Since, by Bayes' theorem (with the class priors held fixed),

`P(Y = 1 | X = d) ~ P(X = d | Y = 1)`

`P(Y = 0 | X = d) ~ P(X = d | Y = 0)`

a variable `X` distinguishes the outcomes `1` and `0` of `Y` well if the values of `P(X = d | Y = 1)` and `P(X = d | Y = 0)` differ from each other. Otherwise, if they were the same, then since

`P(Y = 1 | X = d) + P(Y = 0 | X = d) = 1`

the posterior would reduce to the class prior, and observing `X = d` would tell us nothing about `Y`.
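This suggests scoring each feature by how far apart its two class-conditional distributions are. A minimal sketch of that idea: estimate `P(X | Y = 1)` and `P(X | Y = 0)` with normalized histograms over shared bin edges and compare them with the total variation distance. The metric choice, bin count, and the synthetic data are my assumptions for illustration, not prescriptions from the article.

```python
import numpy as np

def histogram_distance(x, y, bins=10):
    """Total variation distance between the per-class histograms of feature x.

    A larger distance means P(X | Y = 1) and P(X | Y = 0) differ more,
    i.e. x is more informative for separating the two classes.
    """
    edges = np.histogram_bin_edges(x, bins=bins)       # shared bins for both classes
    h1, _ = np.histogram(x[y == 1], bins=edges)
    h0, _ = np.histogram(x[y == 0], bins=edges)
    p1 = h1 / h1.sum()                                 # empirical P(X | Y = 1)
    p0 = h0 / h0.sum()                                 # empirical P(X | Y = 0)
    return 0.5 * np.abs(p1 - p0).sum()

# Synthetic check: one feature whose mean shifts with the class, one pure noise.
rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=2000)
informative = rng.normal(loc=2.0 * y, scale=1.0)       # depends on the class
noise = rng.normal(size=2000)                          # independent of the class

print(histogram_distance(informative, y) > histogram_distance(noise, y))
```

Ranking the features by this score and keeping the top ones yields the selection procedure the title refers to; other histogram distances (Hellinger, chi-squared, ...) can be substituted in the same way.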

--

I am a Software Developer - Java, Rust, SQL, TypeScript - with a strong interest in research in pure and applied Mathematics.