
I understand that RMSE is just the square root of MSE. Generally, as far as I have seen, people seem to use MSE as a loss function and RMSE for evaluation, since RMSE gives you the error as a distance in Euclidean space, in the same units as the target.

**What could be a major difference between using MSE and RMSE when used as loss functions for training?**

I'm curious because popular frameworks like PyTorch and Keras don't provide an RMSE loss function out of the box. Is it some kind of standard convention? If so, why?

Also, I'm aware of the difference that MSE magnifies errors with magnitude > 1 and shrinks errors with magnitude < 1 (on a quadratic scale), which RMSE doesn't do.
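To make the magnification point concrete, here is a minimal plain-Python sketch (the helper names `mse` and `rmse` are my own, not from any framework): squaring shrinks a per-sample error of 0.5 to 0.25 but blows an error of 2 up to 4, while taking the root at the end brings the aggregate back to the target's units.

```python
import math

def mse(y_true, y_pred):
    """Mean squared error over a batch."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    """Root mean squared error: sqrt of MSE, same units as the target."""
    return math.sqrt(mse(y_true, y_pred))

y_true = [0.0, 0.0]
y_pred = [0.5, 2.0]
# Per-sample squared errors: 0.5 -> 0.25 (shrunk), 2.0 -> 4.0 (magnified)
print(mse(y_true, y_pred))   # (0.25 + 4.0) / 2 = 2.125
print(rmse(y_true, y_pred))  # sqrt(2.125) ≈ 1.458
```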

1.) Ease of taking the derivative. 2.) You don't have to worry about a near-zero denominator causing a huge gradient. 3.) But to me the most important is mathematical convenience: someone might easily make the mistake of treating RMSE as just the difference $y - y'$ instead of the root of the mean square of $y - y'$; the answer to this might come down to convention. 4.) In maths (I don't know the reason, and this might be inaccurate) we mainly work with variances instead of standard deviations. – DuttaA – 2019-08-31T17:37:53.037
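The near-zero-denominator point in the comment above can be made concrete with a tiny one-parameter example (a sketch; the model $\hat{y} = wx$ and the data are made up for illustration). By the chain rule, $\partial \text{RMSE}/\partial w = (\partial \text{MSE}/\partial w) / (2 \cdot \text{RMSE})$, so as the fit approaches perfect, the MSE gradient vanishes smoothly while the RMSE gradient keeps a constant magnitude because its denominator shrinks at the same rate:

```python
import math

xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]  # generated with the true weight w = 2

def grads(w):
    """Analytic gradients of MSE and RMSE w.r.t. w for y_hat = w * x."""
    n = len(xs)
    residuals = [w * x - y for x, y in zip(xs, ys)]
    mse = sum(r * r for r in residuals) / n
    d_mse = sum(2 * r * x for r, x in zip(residuals, xs)) / n
    rmse = math.sqrt(mse)
    # Chain rule: d(sqrt(MSE))/dw = d(MSE)/dw / (2 * sqrt(MSE)),
    # undefined at a perfect fit where rmse == 0.
    d_rmse = d_mse / (2 * rmse) if rmse > 0 else float("nan")
    return d_mse, d_rmse

# Far from the optimum (w = 3): both gradients are finite.
print(grads(3.0))    # d_mse ≈ 9.333, d_rmse ≈ 2.160
# Very close to the optimum (w = 2.001): the MSE gradient has nearly
# vanished, but the RMSE gradient still has the same constant magnitude.
print(grads(2.001))  # d_mse ≈ 0.0093, d_rmse ≈ 2.160
```

In this example the RMSE gradient behaves like the MAE gradient: its magnitude stays at $\sqrt{14/3} \approx 2.16$ no matter how close $w$ is to the optimum, so it never decays and is discontinuous at the perfect fit, which is exactly the instability the comment alludes to.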

Thanks @DuttaA, I think the comment you've given is quite good and could be one of the answers to this question, so please post it as an answer below :) – Gokul NC – 2019-09-01T05:10:28.623

@nbro I thought that having the question title as "RMSE vs MSE" would be more SEO-optimized, since that's how most people search. Anyway, thanks for the edit. :) – Gokul NC – 2019-09-01T05:12:58.347