I'm just getting started with some machine learning, and until now I have been dealing with linear regression over one variable.

I have learnt that there is a hypothesis, which is:

$h_\theta(x)=\theta_0+\theta_1x$

To find good values for the parameters $\theta_0$ and $\theta_1$, we want to minimize the difference between the predicted result and the actual result over our training data. So we compute

$h_\theta(x^{(i)})-y^{(i)}$

for all $i$ from $1$ to $m$. We then sum these differences and take the average by multiplying the sum by $\frac{1}{m}$. So far, so good. This would result in:

$\frac{1}{m}\sum_{i=1}^m\left(h_\theta(x^{(i)})-y^{(i)}\right)$

But this is not what has been suggested. Instead, the course suggests taking the square of the difference and multiplying by $\frac{1}{2m}$. So the formula is:

$\frac{1}{2m}\sum_{i=1}^m(h_\theta(x^{(i)})-y^{(i)})^2$

Why is that? Why do we square the difference here, and why do we multiply by $\frac{1}{2m}$ instead of $\frac{1}{m}$?
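For concreteness, here is a small numeric sketch of the two formulas above (the data values and parameter choices are hypothetical, just to illustrate the difference in behavior):

```python
# Toy training data (hypothetical): m = 3 examples.
xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]
m = len(xs)

def h(theta0, theta1, x):
    # Hypothesis: h_theta(x) = theta0 + theta1 * x
    return theta0 + theta1 * x

def signed_average_cost(theta0, theta1):
    # (1/m) * sum of (h(x_i) - y_i): positive and negative errors can cancel.
    return sum(h(theta0, theta1, x) - y for x, y in zip(xs, ys)) / m

def squared_cost(theta0, theta1):
    # (1/(2m)) * sum of squared errors: every error contributes positively.
    return sum((h(theta0, theta1, x) - y) ** 2 for x, y in zip(xs, ys)) / (2 * m)

# A clearly bad fit whose errors happen to cancel:
# h(x) = 4 for all x gives errors 2, 0, -2, so the signed average is 0,
# while the squared cost is (4 + 0 + 4) / 6 > 0.
print(signed_average_cost(4.0, 0.0))  # 0.0
print(squared_cost(4.0, 0.0))         # 1.333...
```

Note how the unsquared version reports zero cost for a constant hypothesis that fits the data badly, simply because the errors cancel, while the squared version correctly flags it as worse than the true fit.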

Related question at stats.stackexchange.com – user1205197 – 2017-01-01T04:17:31.710

Also take a look at Chris McCormick's explanation on https://goo.gl/VNiUR5 – vimdude – 2017-08-15T23:12:28.393