## Bias Formula in Machine Learning expanded using ground truth


Why is Bias calculated for $$f(x)$$? Shouldn't it be calculated for $$Y$$ (which is $$f(x)$$ + Noise $$\epsilon$$)?

We are fitting our model to $$Y$$, so shouldn't we be calculating bias with respect to $$Y$$?

Also, I tried to calculate the bias of polynomial fits of degree $$d=3,9,20$$ using $$f(x)$$, and I got the same bias for all of them. I got the bias values I expected when I calculated it using $$Y$$.

But my calculation doesn't match the actual formula. Basically, my question is: why is the bias formula $$E[\hat{f}(x_0) - f(x_0)]$$ and not $$E[\hat{f}(x_0) - Y]$$?

Is it because noise is accounted for separately through its variance (the irreducible error)? If I haven't accounted for noise separately, can I then calculate bias using $$Y$$?
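For reference, here is a sketch of the standard decomposition this paragraph refers to. It assumes $$Y = f(x_0) + \epsilon$$ with $$E[\epsilon] = 0$$ and $$\mathrm{Var}(\epsilon) = \sigma^2$$, and that $$\epsilon$$ at the test point is independent of the fitted model $$\hat{f}$$; the symbol $$\sigma^2$$ is introduced here and is not in the original post.

```latex
\begin{aligned}
E\big[(Y - \hat{f}(x_0))^2\big]
  &= E\big[(f(x_0) + \epsilon - \hat{f}(x_0))^2\big] \\
  &= \underbrace{\sigma^2}_{\text{irreducible error}}
   + \underbrace{\big(E[\hat{f}(x_0)] - f(x_0)\big)^2}_{\text{Bias}^2}
   + \underbrace{\mathrm{Var}\big(\hat{f}(x_0)\big)}_{\text{Variance}} \\[4pt]
E\big[\hat{f}(x_0) - Y\big]
  &= E[\hat{f}(x_0)] - f(x_0) - E[\epsilon]
   = E[\hat{f}(x_0)] - f(x_0)
\end{aligned}
```

Under these assumptions the noise shows up only in the $$\sigma^2$$ term, and the expected difference taken against $$Y$$ equals the one taken against $$f(x_0)$$.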

Bias should be calculated with respect to the ground truth, right? As I understand it, the ground truth is $$Y$$, which is $$f(x) + \epsilon$$. Please correct me if I am wrong.

Expected value of noise is zero. – Media – 2020-02-25T21:07:18.770
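A minimal simulation sketch of the zero-mean-noise point, not the asker's actual experiment. The true function `f`, the noise level `sigma`, the test point `x0`, the sample sizes, and the helper `estimate_bias` are all assumptions made up for illustration. It refits polynomials of degree 3, 9, and 20 on many noisy training sets and estimates the bias at `x0` against both $$f(x_0)$$ and a fresh noisy $$Y$$.

```python
import numpy as np
from numpy.polynomial import Polynomial

rng = np.random.default_rng(0)


def f(x):
    # Hypothetical "true" function; the original post does not say which one was used.
    return np.sin(2 * np.pi * x)


sigma = 0.3        # assumed noise standard deviation
n_train = 50       # assumed training-set size
n_trials = 2000    # number of independently resampled training sets
x0 = 0.35          # assumed test point
x_train = np.linspace(0.0, 1.0, n_train)  # fixed design; only the noise is resampled


def estimate_bias(degree):
    preds = np.empty(n_trials)
    y0 = np.empty(n_trials)
    for t in range(n_trials):
        # New noisy training labels Y = f(x) + eps for every trial.
        y_train = f(x_train) + rng.normal(0.0, sigma, n_train)
        model = Polynomial.fit(x_train, y_train, degree)
        preds[t] = model(x0)
        # Independent noisy observation at the test point.
        y0[t] = f(x0) + rng.normal(0.0, sigma)
    bias_f = preds.mean() - f(x0)        # estimate of E[f_hat(x0)] - f(x0)
    bias_y = (preds - y0).mean()         # estimate of E[f_hat(x0) - Y]
    variance = preds.var()               # estimate of Var(f_hat(x0))
    mse_y = ((preds - y0) ** 2).mean()   # estimate of E[(Y - f_hat(x0))^2]
    return bias_f, bias_y, variance, mse_y


results = {d: estimate_bias(d) for d in (3, 9, 20)}
for d, (bias_f, bias_y, variance, mse_y) in results.items():
    print(f"degree={d:2d}  bias wrt f={bias_f:+.4f}  bias wrt Y={bias_y:+.4f}  "
          f"variance={variance:.4f}  MSE wrt Y={mse_y:.4f}")
```

In a setup like this, the two bias estimates should agree up to Monte Carlo error, because the noise averages out. If they differ substantially, a likely cause is that the quantity computed against $$Y$$ is a squared error (which includes the noise variance) rather than the signed bias.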

Good point. But why does the overall bias value change when I use Y and f(x)? – Selvam – 2020-02-25T21:20:40.400

$\hat{f}(x)$ is the output of the model you've provided. By the way, the real label, $f(x)$, that we provide to our system always has noise, either because measurements are imprecise or because the problem does not always obey a real distribution. It's common. – Media – 2020-02-26T07:37:02.507

$\hat{f}(x_0)$ is the output provided by the model. $f(x)$ cannot be the output provided by the model. What is $\hat{f}(x_0)$ then? – Selvam – 2020-02-26T07:38:59.440

@Selvam edited. – Media – 2020-02-26T07:40:17.840

But what is y then? My understanding was "y" is the real label which can contain noise. Isn't it so? Isn't it what you are talking about? – Selvam – 2020-02-26T07:42:27.807

$Y$ is considered the real label of a phenomenon, without noise. $f(x)$ is what we have: a label that is measured or assigned by an expert. – Media – 2020-02-26T07:45:13.733

But when I calculate bias with respect to $f(x)$, I get a fixed value for all polynomial fits of increasing degree. We took $f(x)$ to be some known function and added noise to it to get $Y$. So, in our case, shouldn't I be calculating bias with respect to $Y$? – Selvam – 2020-02-26T07:49:30.500

The point is that you don't have $Y$. What you have is $f(x)$. At least, I've never attempted what you describe, for the reason I mentioned. – Media – 2020-02-26T07:51:13.127

Oh. I get what you're explaining. Thanks. But then why are we adding the error term to Y and not f(x)? Shouldn't it be f(x) = Y + E? – Selvam – 2020-02-26T07:52:57.493

https://machinelearningmastery.com/how-machine-learning-algorithms-work/ Could you please check this? It clearly says f(x) is the true label. Y is the label given by experts, which has errors. – Selvam – 2020-02-26T07:55:04.453

Not really, the noise is added to $Y$, and if you've noticed, the formulas are trying to find the answer by employing $Y$. We do not have $Y$, so we use $f(x)$ alongside the error term. – Media – 2020-02-26T07:55:32.070

About your link, I didn't check it, but it may be due to notation mismatch. At least this is what I know. Others may have better opinions. – Media – 2020-02-26T07:56:10.707