I'm reading the article Understanding the Bias-Variance Tradeoff. It says:

If we denote the variable we are trying to predict as $Y$ and our covariates as $X$, we may assume that there is a relationship relating one to the other such as $Y=f(X)+\epsilon$ where the error term $\epsilon$ is normally distributed with a mean of zero like so $\epsilon\sim\mathcal{N}(0,\,\sigma_\epsilon)$.

We may estimate a model $\hat{f}(X)$ of $f(X)$. The expected squared prediction error at a point $x$ is: $$Err(x)=E[(Y-\hat{f}(x))^2]$$ This error may then be decomposed into bias and variance components: $$Err(x)=(E[\hat{f}(x)]-f(x))^2+E\big[(\hat{f}(x)-E[\hat{f}(x)])^2\big]+\sigma^2_\epsilon$$ $$Err(x)=\text{Bias}^2+\text{Variance}+\text{Irreducible Error}$$

**I'm wondering how the last two equations are derived from the first equation.**
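One way to see the decomposition (a sketch, using the definitions above): substitute $Y=f(x)+\epsilon$ and expand the square,

$$Err(x)=E\big[(f(x)-\hat{f}(x)+\epsilon)^2\big]=E\big[(f(x)-\hat{f}(x))^2\big]+2\,E\big[(f(x)-\hat{f}(x))\,\epsilon\big]+E[\epsilon^2]$$

The middle (cross) term is zero because $\epsilon$ is independent of $\hat{f}(x)$ and $E[\epsilon]=0$, while $E[\epsilon^2]=\sigma^2_\epsilon$. Then add and subtract $E[\hat{f}(x)]$ inside the first term and expand again:

$$E\big[(f(x)-\hat{f}(x))^2\big]=\big(f(x)-E[\hat{f}(x)]\big)^2+E\big[(\hat{f}(x)-E[\hat{f}(x)])^2\big]$$

where the cross term again vanishes, this time because $E\big[\hat{f}(x)-E[\hat{f}(x)]\big]=0$. Summing the pieces gives the stated decomposition into $\text{Bias}^2$, $\text{Variance}$, and $\sigma^2_\epsilon$.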

Could you please explain a bit more why the third term is 0? Why does $E[Y]=f(x)$ make the third term zero? How is the third term calculated? – CyberPlayerOne – 2018-06-15T10:12:40.443

I have edited the answer. – David Masip – 2018-06-15T14:16:44.067

Since $\epsilon$ is the noise, I think it should be independent of the model and the data. So the independence you assumed should be valid. – CyberPlayerOne – 2018-06-15T17:35:36.310

Indeed, I think this should be it. Good work – David Masip – 2018-06-15T20:54:29.860

The third term is zero because: 1. As someone pointed out, the noise is independent of the data. 2. The noise is assumed to be normally distributed with mean zero. – Velu44 – 2019-07-11T10:11:13.350
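The decomposition can also be checked numerically. The sketch below (my own illustration, not from the article) assumes $f(x)=\sin(x)$ and Gaussian noise with $\sigma_\epsilon=0.3$, and uses a crude local-average estimator; the identity holds for any estimator that is independent of the test-point noise, so the three estimated terms should sum to the average squared prediction error.

```python
import numpy as np

# Hypothetical setup: f(x) = sin(x) and sigma = 0.3 are assumptions
# chosen for illustration, not values from the article.
rng = np.random.default_rng(0)

def f(x):
    return np.sin(x)

sigma = 0.3      # noise standard deviation
x0 = 1.0         # point at which we evaluate the decomposition
n_train = 20     # training-set size per simulation
n_sims = 20000   # number of independent training sets

preds = np.empty(n_sims)  # f_hat(x0) across training sets
errs = np.empty(n_sims)   # (Y - f_hat(x0))^2 against fresh noisy Y

for i in range(n_sims):
    # Draw a fresh training set.
    x = rng.uniform(0.0, 2.0 * np.pi, n_train)
    y = f(x) + rng.normal(0.0, sigma, n_train)

    # Crude estimator: average the responses within a window around x0
    # (fall back to the overall mean if the window is empty).
    w = np.abs(x - x0) < 0.5
    fhat = y[w].mean() if w.any() else y.mean()
    preds[i] = fhat

    # Squared prediction error against a fresh observation Y at x0.
    y0 = f(x0) + rng.normal(0.0, sigma)
    errs[i] = (y0 - fhat) ** 2

bias2 = (preds.mean() - f(x0)) ** 2   # (E[f_hat(x0)] - f(x0))^2
variance = preds.var()                # E[(f_hat(x0) - E[f_hat(x0)])^2]
total = errs.mean()                   # estimate of Err(x0)

print("bias^2   :", bias2)
print("variance :", variance)
print("sigma^2  :", sigma ** 2)
print("sum      :", bias2 + variance + sigma ** 2)
print("Err(x0)  :", total)
```

Up to Monte Carlo error, the printed sum of bias², variance, and $\sigma^2_\epsilon$ matches the directly estimated expected squared prediction error, regardless of how poor the estimator is.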