# Data Science: Implementing a Naive Bayesian Classifier using the Empirical Density

This story is part of my Data Science series.

In my previous story (see here), I provided an example implementation of a random forest for a classification problem stemming from this data set.

As a further approach to solving this problem, I want to use a Naive Bayes classifier, whose implementation is the focus of this story.

In general, since the data set is quite large, the Naive Bayes approach is a good choice despite the rather strong assumptions it imposes on the data. You can find more details on that in the theoretic outline here.

As a quick reminder, the Naive Bayes is based on the following relation of conditional probabilities:

`P(C | X) = P(C) / P(X) * P(X | C)`

where `C` is the categorical outcome random variable and `X` the feature random variable.
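
The "naive" part of the method is the additional assumption that the components of `X` are conditionally independent given the class, so that the joint likelihood factorizes into one-dimensional terms:

`P(X | C) = P(X_1 | C) * P(X_2 | C) * … * P(X_n | C)`

Note also that `P(X)` does not depend on the class, so for classification it is enough to compare `P(C = c) * P(X | C = c)` across the possible classes `c`. This factorization is what makes the approach tractable: each one-dimensional conditional density can be estimated separately.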

As we see, in order to estimate the probability of the outcome being `1` under the condition that the feature takes the value `x`, we need an estimate for

`P(X = x | C = 1)`

A typical approach here is to assume that each component of `X` follows a Gaussian distribution. But because this would be yet another strong assumption on the data, I will follow a different path and estimate the empirical conditional densities

`P(X | C = 0)` and `P(X | C = 1)`

This is easier than it sounds and can be accomplished by simply computing a histogram over the samples of `X`, once for each class.
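
To illustrate the idea, here is a minimal sketch of such a histogram-based density estimate, assuming the data has already been normalized to `[-1, 1]`; the function name, signature, and bin count are my own and not taken from the actual implementation:

```rust
/// Minimal sketch: estimate an empirical density from samples in [-1, 1]
/// by counting how many samples fall into each of `n_bins` equal-width bins.
fn empirical_density(samples: &[f64], n_bins: usize) -> Vec<f64> {
    let mut counts = vec![0usize; n_bins];
    let bin_width = 2.0 / n_bins as f64; // the interval [-1, 1] has length 2
    for &x in samples {
        // Shift x into [0, 2], divide by the bin width, and clamp the
        // right boundary x = 1.0 into the last bin.
        let idx = (((x + 1.0) / bin_width) as usize).min(n_bins - 1);
        counts[idx] += 1;
    }
    // Normalize the counts to relative frequencies. These are proportional
    // to the density, which is all we need for comparing class scores.
    let total = samples.len() as f64;
    counts.iter().map(|&c| c as f64 / total).collect()
}
```

Running this once on the samples with `C = 0` and once on the samples with `C = 1` yields the two conditional densities; evaluating `P(X_i = x | C = c)` then reduces to a bin lookup.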

## Implementation

The first thing one should do for this task is to normalize the data to the interval `[-1, 1]`. To that end, for each component of `X`, I compute a normalization factor as `1 / (max(X) - min(X))`. You can use various methods to obtain these values. For me, since I store the data in DuckDB, this looks like so:

```rust
pub fn compute_norm_factors() -> Vec<f64> {
    let con = …
```
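
Since the snippet above is cut off, here is a hedged sketch of how such a function could look using the `duckdb` crate (duckdb-rs); the database path parameter, the table name `features`, the column list, and the `Result` return type are illustrative assumptions, not the actual code:

```rust
use duckdb::{Connection, Result};

/// Sketch: compute the normalization factor 1 / (max - min) for each
/// feature column. Table and column names are hypothetical.
pub fn compute_norm_factors(db_path: &str, columns: &[&str]) -> Result<Vec<f64>> {
    let con = Connection::open(db_path)?;
    let mut factors = Vec::with_capacity(columns.len());
    for col in columns {
        // One aggregate query per feature column.
        let sql = format!("SELECT MAX({c}) - MIN({c}) FROM features", c = col);
        let range: f64 = con.query_row(&sql, [], |row| row.get(0))?;
        // Guard against constant columns, which would yield a zero range.
        factors.push(if range > 0.0 { 1.0 / range } else { 1.0 });
    }
    Ok(factors)
}
```

With the factors in hand, each component of `X` is scaled by its factor before the per-class histograms are computed.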

---

I am a Software Developer - Rust, Java, Python, TypeScript, SQL - with a strong interest in doing research in pure and applied Mathematics.