# Why is the Bias-Variance Tradeoff important, after all?

When evaluating a model, you should understand the concepts of bias and variance and the tradeoff between minimizing them. Knowing how to handle these errors will help you build accurate models and avoid the overfitting and underfitting traps. That is why the bias-variance tradeoff is a central problem in supervised learning.

The conflict arises because these two sources of error, which prevent supervised learning algorithms from generalizing beyond their training set, usually cannot be minimized simultaneously: reducing one tends to increase the other.

First of all, let’s define bias and variance:

## Bias

Bias is the difference between the model’s average prediction (averaged over models trained on different training sets) and the correct value you are trying to predict. It stems from erroneous assumptions in the learning algorithm. Models with high bias oversimplify the problem and fail to capture the patterns even in the training data.
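To make the definition concrete, here is a minimal simulation sketch. It assumes a hypothetical true function $f(x) = \sin(2\pi x)$ with $X$ uniform on $[0, 1]$, Gaussian noise with standard deviation 0.3, and a deliberately oversimplified model that ignores $X$ and always predicts the mean of the training labels. Averaging that model's prediction at $x = 0.25$ over many independent training sets and subtracting $f(0.25) = 1$ estimates its bias at that point.

```python
import math
import random
import statistics


def true_f(x):
    # Hypothetical ground truth, chosen only for illustration.
    return math.sin(2 * math.pi * x)


def sample_training_set(n, sigma, rng):
    # Draw n pairs with Y = f(X) + e, X ~ Uniform(0, 1), e ~ N(0, sigma^2).
    xs = [rng.uniform(0.0, 1.0) for _ in range(n)]
    ys = [true_f(x) + rng.gauss(0.0, sigma) for x in xs]
    return xs, ys


def fit_constant(xs, ys):
    # Oversimplified model: ignore X and always predict the mean of Y.
    c = statistics.fmean(ys)
    return lambda x: c


def estimate_bias(x0, n_sets=2000, n=30, sigma=0.3, seed=0):
    rng = random.Random(seed)
    preds = []
    for _ in range(n_sets):
        xs, ys = sample_training_set(n, sigma, rng)
        preds.append(fit_constant(xs, ys)(x0))
    # Bias = average prediction over training sets minus the correct value.
    return statistics.fmean(preds) - true_f(x0)


bias = estimate_bias(0.25)
print(f"estimated bias at x = 0.25: {bias:.3f}")
```

Because this model ignores $X$, its average prediction sits near $E[f(X)] = 0$, so the estimated bias should come out close to $-1$: the model misses the pattern however many training sets you average over.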

## Variance

Variance is the variability of the model’s prediction for a given point across different training sets; it tells you how spread out those predictions are. Models with very high variance tend to mimic the training data and do not generalize well to unseen data. As a consequence, such models perform well on training data but poorly on test data.
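Variance can be estimated with a similar simulation: refit a model on many independent training sets and measure how much its prediction at one point moves around. This sketch assumes a hypothetical $f(x) = \sin(2\pi x)$, noise standard deviation 0.3, and 30 points per training set. It contrasts a constant-mean model with a 1-nearest-neighbour model (an illustrative choice, not one the text prescribes), which copies the label of the closest training point and therefore mimics the training data.

```python
import math
import random
import statistics


def true_f(x):
    # Hypothetical ground truth, chosen only for illustration.
    return math.sin(2 * math.pi * x)


def sample_training_set(n, sigma, rng):
    xs = [rng.uniform(0.0, 1.0) for _ in range(n)]
    ys = [true_f(x) + rng.gauss(0.0, sigma) for x in xs]
    return xs, ys


def fit_constant(xs, ys):
    # Ignores X: predictions barely change from one training set to the next.
    c = statistics.fmean(ys)
    return lambda x: c


def fit_1nn(xs, ys):
    # Returns the label of the closest training point, so it copies the training data.
    pairs = list(zip(xs, ys))
    return lambda x: min(pairs, key=lambda p: abs(p[0] - x))[1]


def prediction_variance(fit, x0, n_sets=2000, n=30, sigma=0.3, seed=0):
    # Refit on many independent training sets and measure the spread of the
    # prediction at the single point x0.
    rng = random.Random(seed)
    preds = []
    for _ in range(n_sets):
        xs, ys = sample_training_set(n, sigma, rng)
        preds.append(fit(xs, ys)(x0))
    return statistics.pvariance(preds)


var_const = prediction_variance(fit_constant, 0.25)
var_nn = prediction_variance(fit_1nn, 0.25)
print(f"constant model variance: {var_const:.4f}")
print(f"1-NN model variance:     {var_nn:.4f}")
```

The 1-nearest-neighbour variance should come out several times larger than the constant model's, roughly the noise variance 0.09, because each of its predictions inherits the noise of a single training label.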

Now, let’s dive a bit into the mathematics: suppose you want to predict $Y$ from $X$, and the two are related by:

$Y = f(X)+ e$

where $e$ is the error term, normally distributed with mean zero and variance $\sigma_e^2$.
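A minimal sketch of this data-generating process, assuming a hypothetical $f(x) = \sin(2\pi x)$, $X$ uniform on $[0, 1]$, and $\sigma_e = 0.3$ (in a real problem $f$ is unknown). Subtracting $f(X)$ from each sampled $Y$ recovers the error term, whose sample mean should be near zero and whose standard deviation should be near $\sigma_e$.

```python
import math
import random
import statistics


def true_f(x):
    # Hypothetical f; in a real problem f is unknown.
    return math.sin(2 * math.pi * x)


def generate(n, sigma, seed=0):
    # Sample (X, Y) pairs from Y = f(X) + e with e ~ N(0, sigma^2).
    rng = random.Random(seed)
    xs = [rng.uniform(0.0, 1.0) for _ in range(n)]
    ys = [true_f(x) + rng.gauss(0.0, sigma) for x in xs]
    return xs, ys


xs, ys = generate(10_000, sigma=0.3)
residuals = [y - true_f(x) for x, y in zip(xs, ys)]  # recovers e
mean_e = statistics.fmean(residuals)
sd_e = statistics.stdev(residuals)
print(f"mean of e: {mean_e:.4f}, std of e: {sd_e:.4f}")
```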

Using any modeling technique, you try to estimate $Y$ with $\hat{Y}$. The expected squared error at a point $x$, taken over both the noise $e$ and the random training set used to fit the model, is then:

$\mathrm{Err}(x) = E[(Y-\hat{Y})^2]$

This $\mathrm{Err}(x)$ can be further decomposed as:

$\mathrm{Err}(x) = (E[\hat{Y}] - f(x))^2 + E[(\hat{Y} - E[\hat{Y}])^2] + \sigma_e^2$

$\mathrm{Err}(x) = \mathrm{Bias}^2 + \mathrm{Variance} + \text{Irreducible Error}$
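The decomposition can be checked numerically. This sketch assumes a hypothetical $f(x) = \sin(2\pi x)$, $\sigma_e = 0.3$, and training sets of 30 points, and uses 5-nearest-neighbour regression as the model purely for illustration. It refits the model on thousands of training sets and compares the directly estimated $E[(Y - \hat{Y})^2]$ at $x = 0.25$ with $\text{Bias}^2 + \text{Variance} + \sigma_e^2$; the two should agree up to Monte Carlo noise.

```python
import math
import random
import statistics

SIGMA = 0.3  # assumed noise standard deviation, so sigma_e^2 = 0.09


def true_f(x):
    # Hypothetical ground truth, chosen only for illustration.
    return math.sin(2 * math.pi * x)


def sample_training_set(n, rng):
    xs = [rng.uniform(0.0, 1.0) for _ in range(n)]
    ys = [true_f(x) + rng.gauss(0.0, SIGMA) for x in xs]
    return xs, ys


def knn_predict(xs, ys, x0, k=5):
    # k-nearest-neighbour regression: average Y of the k closest training X.
    nearest = sorted(zip(xs, ys), key=lambda p: abs(p[0] - x0))[:k]
    return statistics.fmean(y for _, y in nearest)


def decompose(x0, n_sets=4000, n=30, seed=0):
    rng = random.Random(seed)
    preds, sq_errors = [], []
    for _ in range(n_sets):
        xs, ys = sample_training_set(n, rng)
        y_hat = knn_predict(xs, ys, x0)
        y_new = true_f(x0) + rng.gauss(0.0, SIGMA)  # fresh Y at the same x
        preds.append(y_hat)
        sq_errors.append((y_new - y_hat) ** 2)
    err = statistics.fmean(sq_errors)  # direct estimate of E[(Y - Y_hat)^2]
    bias_sq = (statistics.fmean(preds) - true_f(x0)) ** 2
    variance = statistics.pvariance(preds)
    return err, bias_sq, variance, SIGMA ** 2


err, bias_sq, variance, noise = decompose(0.25)
print(f"direct:     {err:.4f}")
print(f"decomposed: {bias_sq + variance + noise:.4f} "
      f"(bias^2={bias_sq:.4f}, variance={variance:.4f}, noise={noise:.4f})")
```

Changing `k` moves error between the two reducible terms: smaller `k` lowers the bias and raises the variance, larger `k` does the opposite, while the noise term stays fixed.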

No model, however good, can reduce the irreducible error: $\sigma_e^2$ measures the amount of randomness or noise in the data itself.
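One way to see why this term cannot be removed: even an oracle that predicts with the true $f$ itself still incurs an expected squared error of $\sigma_e^2$. This sketch assumes a hypothetical $f(x) = \sin(2\pi x)$ and tries a few noise levels.

```python
import math
import random
import statistics


def true_f(x):
    # Hypothetical ground truth, chosen only for illustration.
    return math.sin(2 * math.pi * x)


def oracle_error(sigma, n=50_000, seed=0):
    # Expected squared error of a "perfect" model that predicts f(x) exactly.
    rng = random.Random(seed)
    sq_errors = []
    for _ in range(n):
        x = rng.uniform(0.0, 1.0)
        y = true_f(x) + rng.gauss(0.0, sigma)
        sq_errors.append((y - true_f(x)) ** 2)
    return statistics.fmean(sq_errors)


errors = {sigma: oracle_error(sigma) for sigma in (0.1, 0.3, 0.5)}
for sigma, err in errors.items():
    print(f"sigma={sigma}: oracle error {err:.4f} vs sigma^2 = {sigma ** 2:.4f}")
```

The oracle's error tracks $\sigma^2$ at every noise level; the only way to lower it is to get less noisy data, not a better model.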

In underfitting conditions, the model has high bias and low variance. Underfitting typically occurs when the model’s structure is too simple to capture the patterns in the data, or when there is too little data to learn them from.

In overfitting conditions, the model captures the random noise along with the underlying patterns, which is why it is said to mimic the training data. Such models have low bias but high variance.
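Both failure modes show up when you compare training and test error. This sketch assumes a hypothetical $f(x) = \sin(2\pi x)$, $\sigma_e = 0.3$, 200 training points and 1000 test points, and uses k-nearest-neighbour regression, where the number of neighbours `k` acts as a complexity knob: with `k` equal to the training-set size the model predicts the overall mean (high bias, underfitting), while `k = 1` reproduces each training label exactly (high variance, overfitting).

```python
import math
import random
import statistics


def true_f(x):
    # Hypothetical ground truth, chosen only for illustration.
    return math.sin(2 * math.pi * x)


def sample(n, sigma, rng):
    xs = [rng.uniform(0.0, 1.0) for _ in range(n)]
    return xs, [true_f(x) + rng.gauss(0.0, sigma) for x in xs]


def knn_predict(train_x, train_y, x0, k):
    # Average Y of the k training points closest to x0.
    nearest = sorted(zip(train_x, train_y), key=lambda p: abs(p[0] - x0))[:k]
    return statistics.fmean(y for _, y in nearest)


def mse(train_x, train_y, xs, ys, k):
    # Mean squared error of the k-NN model (fit on train) evaluated on (xs, ys).
    return statistics.fmean(
        (y - knn_predict(train_x, train_y, x, k)) ** 2 for x, y in zip(xs, ys)
    )


rng = random.Random(0)
train_x, train_y = sample(200, 0.3, rng)
test_x, test_y = sample(1000, 0.3, rng)

results = {}
for k in (200, 15, 1):  # k = 200: constant model (underfit); k = 1: memorizes (overfit)
    train_err = mse(train_x, train_y, train_x, train_y, k)
    test_err = mse(train_x, train_y, test_x, test_y, k)
    results[k] = (train_err, test_err)
    print(f"k={k:>3}  train MSE={train_err:.3f}  test MSE={test_err:.3f}")
```

Expect the `k = 200` model to show similar, high errors on both sets, the `k = 1` model to have zero training error but a clearly higher test error, and the intermediate `k = 15` to reach the lowest test error of the three. The exact numbers depend on the random seed.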