# Data Science: Feature Selection by Comparing Histogram Distances

*This story is part of my **Data Science** series.*

Typically, when we create a classification or regression model, we try to keep it as simple as possible. Simplification can be achieved in many ways; one is to reduce the set of considered features to as small a subset as possible.

In this short article I want to present a rather lesser-known approach that I think is very interesting in certain scenarios.

Classification problems are all about the following term:

`P(Y = 1 | X = d)`

That is, we want to estimate the probability that `Y` takes the value `1`, given that the observed feature `X` takes the value `d`.
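As a minimal sketch of what this term means in practice (the data and the helper function are hypothetical, not from the article), we can estimate `P(Y = 1 | X = d)` from labeled samples simply by counting:

```python
import numpy as np

# Toy sample: a discrete feature X and a binary label Y.
X = np.array([0, 0, 1, 1, 1, 0, 1, 0])
Y = np.array([0, 1, 1, 1, 0, 0, 1, 0])

def p_y1_given_x(X, Y, d):
    """Empirical estimate of P(Y = 1 | X = d): the fraction of
    samples with X == d whose label is 1."""
    mask = (X == d)
    if mask.sum() == 0:
        return float("nan")  # the value d was never observed
    return Y[mask].mean()

print(p_y1_given_x(X, Y, 1))  # → 0.75 (3 of the 4 samples with X = 1 have Y = 1)
```

With real-valued features the counting is done per histogram bin instead of per exact value, which is where histograms enter the picture below.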

Since, by Bayes' theorem (up to the class priors),

`P(Y = 1 | X = d) ~ P(X = d | Y = 1)`

`P(Y = 0 | X = d) ~ P(X = d | Y = 0)`

a variable `X` distinguishes the outcomes `1` and `0` of `Y` well if the values of `P(X = d | Y = 1)` and `P(X = d | Y = 0)` are different from each other. Otherwise, if they were the same, then since

`P(Y = 1 | X = d) + P(Y = 0|…`
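The idea above can be sketched in code: estimate the class-conditional distributions `P(X = d | Y = 1)` and `P(X = d | Y = 0)` as normalized histograms over common bin edges, and score the feature by the distance between them. The function name, the synthetic data, and the choice of total variation distance are my assumptions for illustration; any histogram distance would fit the same scheme.

```python
import numpy as np

def histogram_distance_score(x, y, bins=10):
    """Score a single feature x for a binary target y by the distance
    between its two class-conditional histograms. A large distance
    means the feature separates the classes well."""
    # Common bin edges over the full range of the feature, so that
    # both histograms are directly comparable bin by bin.
    edges = np.histogram_bin_edges(x, bins=bins)
    h1, _ = np.histogram(x[y == 1], bins=edges)
    h0, _ = np.histogram(x[y == 0], bins=edges)
    # Normalize counts to empirical probabilities P(X in bin | Y = c).
    p1 = h1 / h1.sum()
    p0 = h0 / h0.sum()
    # Total variation distance in [0, 1] (one possible choice of distance).
    return 0.5 * np.abs(p1 - p0).sum()

rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=1000)
informative = rng.normal(loc=2.0 * y, scale=1.0)  # its distribution shifts with the class
noise = rng.normal(size=1000)                     # independent of the class

print(histogram_distance_score(informative, y))   # clearly larger score
print(histogram_distance_score(noise, y))         # score near 0
```

Ranking all candidate features by such a score and keeping the top ones is then a straightforward filter-style feature selection procedure.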