October 30, 2014

Reducible and Irreducible Errors

Any time we fit a model to data, we introduce two kinds of error.

Suppose the actual relationship between the target variable $Y$ and the predictors $X$ is $Y = f(X) + \epsilon$. When we come up with a model $\hat{Y} = \hat{f}(X)$, we introduce reducible error by estimating $f$ with $\hat{f}$ as well as irreducible error by ignoring $\epsilon$.
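To make the split concrete: holding $\hat{f}$ and $X$ fixed, and using the standard assumptions that $\epsilon$ has mean zero and is independent of the predictors, the expected squared prediction error decomposes as

$$
E\big[(Y - \hat{Y})^2\big]
= E\big[(f(X) + \epsilon - \hat{f}(X))^2\big]
= \underbrace{\big(f(X) - \hat{f}(X)\big)^2}_{\text{reducible}} + \underbrace{\operatorname{Var}(\epsilon)}_{\text{irreducible}},
$$

where the cross term vanishes because $E[\epsilon] = 0$.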

Of course, we ignore $\epsilon$ because we have no choice. It is usually assumed to be independent of the predictors, so no model built on $X$ can capture it; the best we can do is rely on the fact that it has mean zero. This type of error is called irreducible because it is rooted in factors we can't affect: relevant predictors that weren't measured, or inherent, unmeasurable variation in the target variable itself.

On the other hand, there is something we can do about reducible error. In practice, we don't know the true relationship between $X$ and $Y$, so $\hat{f}$ is only an estimate of $f$. Here we can suffer from model bias, where the form we choose to fit to the data is an inaccurate representation of the true relationship. Even if we knew the true form of $f$, we would still need to estimate its parameters. This error is reducible because a more appropriate model or more data can shrink it, and we can quantify it through confidence intervals and various model statistics.
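A minimal simulation makes both points visible. This sketch is not from the original post; it assumes a made-up true relationship $f(x) = \sin(x)$ with noise variance $0.25$, and compares a biased straight-line fit against a more flexible polynomial fit. The flexible fit drives the reducible error toward zero, but no fit, not even the true $f$, beats the irreducible noise floor.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical true relationship: f(x) = sin(x), plus irreducible noise.
n = 10_000
x = rng.uniform(0, 2 * np.pi, n)
eps = rng.normal(0, 0.5, n)  # Var(eps) = 0.25: the irreducible error
y = np.sin(x) + eps

def mse(y_hat):
    """Mean squared prediction error against the observed y."""
    return np.mean((y - y_hat) ** 2)

# Biased model: a straight line is the wrong functional form for sin(x),
# so its MSE stays well above the 0.25 noise floor.
slope, intercept = np.polyfit(x, y, 1)
print("linear fit MSE:", mse(intercept + slope * x))

# More flexible model: a degree-7 polynomial tracks sin(x) closely on
# this interval, so its MSE approaches 0.25 (reducible error ~ 0).
coeffs = np.polyfit(x, y, 7)
print("degree-7 MSE:  ", mse(np.polyval(coeffs, x)))

# Even the true f cannot do better than Var(eps): MSE is still ~0.25.
print("true f MSE:    ", mse(np.sin(x)))
```

The gap between the linear fit's MSE and $0.25$ is the reducible part; the $0.25$ itself is what no amount of modeling can remove.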