Is the derivative of the loss wrt a single scalar parameter proportional to the loss?


I am wondering about the correlation between the loss and the derivative of the loss with respect to a single scalar parameter, for the same sample. That is: given a machine learning model with parameters $\theta \in \mathbb{R}^n$, I want to understand the relationship between $Loss(x)$ and $\frac{\partial Loss(x)}{\partial \theta_i}$, where $i \in \{1, 2, 3, \dots, n\}$.

Intuitively, I would expect them to be positively correlated. Is that right? If so, how can I prove it mathematically?


Posted 2020-04-03T12:01:55.227

Reputation: 21

1You need a function relating $x$ and $\theta$, otherwise your situation is not fully described. And typically in supervised learning loss is related to a comparison of transformed $x$ with a ground truth $y$ (although not always required) – Neil Slater – 2020-04-06T16:25:56.530

1For instance, for linear regression, you might have $\hat{y} = \theta \cdot x + b$ and $\mathcal{L}(\hat{y}, y) = \frac{1}{2}(\hat{y} - y)^2$ where $y$ is ground truth. For a concrete example you should [edit] in how $\theta$ is being used with $x$, and maybe the loss function you are using too. If you want a generic description/proof then you still need to specify how the loss function, $x$ and $\theta$ relate before someone could say whether correlation is always positive or not – Neil Slater – 2020-04-06T16:32:07.800

Correlation has a very strict definition. I don't think it is possible to do what you want according to that definition. You can only comment on positive or negative correlation, but not on the exact numerical value. Also, if you have found the exact value of $\theta$, the correlation will depend on the direction you move, since both increasing and decreasing it will increase the loss. – DuttaA – 2020-04-07T01:40:42.343



The derivative $f'(x)$ is related to $f(x)$ in a certain sense. In fact, $f'$ is obtained from $f$ (by applying the differential operator), so we could even say that there's a cause-effect relationship.

The derivative at a specific point $c$ of the domain, i.e. $f'(c)$, can be either negative or positive. If $f'(c) > 0$, then $f$ is increasing at $c$ (with respect to an increase of $x$). If $f'(c) < 0$, then $f$ is decreasing at $c$ (with respect to an increase of $x$).

This can easily be seen from an example. Consider $f(x) = x^2$, so $f'(x) = 2x$. Let $c = 2$; then $f'(2) = 4 > 0$, so the function is increasing there. In fact, $f(1) = 1 \leq f(2) = 4 \leq f(3) = 9$. Similarly, let $c = -1$; then $f'(-1) = -2 < 0$, so the function is decreasing there. In fact, $f(-2) = 4 \geq f(-1) = 1 \geq f(0) = 0$ (note that the function decreases as we increase $x$!).
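The numbers above can be checked with a few lines of code. This is just a sketch that evaluates $f(x) = x^2$ and $f'(x) = 2x$ at the points used in the example:

```python
# Check the sign of f'(c) against the local behaviour of f(x) = x^2.
def f(x):
    return x ** 2

def f_prime(x):
    return 2 * x

# At c = 2 the derivative is positive, so f increases around c.
assert f_prime(2) == 4
assert f(1) <= f(2) <= f(3)    # 1 <= 4 <= 9

# At c = -1 the derivative is negative, so f decreases as x increases.
assert f_prime(-1) == -2
assert f(-2) >= f(-1) >= f(0)  # 4 >= 1 >= 0
```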

Consider a model with only one parameter: the partial derivative of the loss function with respect to that parameter is just the ordinary derivative of the loss function, so the reasoning above applies directly. What about a model with more than one parameter? The same reasoning applies to each partial derivative, holding the other parameters fixed.
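To make the multi-parameter case concrete, here is a minimal sketch using a two-parameter squared-error loss (the model $\hat{y} = a x + b$, the sample $x = 1$, and the target $y = 3$ are all illustrative assumptions, not from the question). It estimates each partial derivative by central finite differences and compares against the analytic gradient. Note that the loss is positive while both partial derivatives are negative, which already shows that the derivative need not be positively proportional to the loss:

```python
# Finite-difference partial derivatives of a two-parameter squared-error loss.
# The model y_hat = a*x + b and the sample (x=1.0, y=3.0) are illustrative.
def loss(a, b, x=1.0, y=3.0):
    y_hat = a * x + b
    return 0.5 * (y_hat - y) ** 2

def partial(fn, args, i, eps=1e-6):
    """Central-difference estimate of d fn / d args[i]."""
    up = list(args); up[i] += eps
    dn = list(args); dn[i] -= eps
    return (fn(*up) - fn(*dn)) / (2 * eps)

a, b = 0.5, 0.0
residual = (a * 1.0 + b) - 3.0      # y_hat - y = -2.5
g_a = partial(loss, [a, b], 0)      # analytic: residual * x = -2.5
g_b = partial(loss, [a, b], 1)      # analytic: residual     = -2.5
assert abs(g_a - residual * 1.0) < 1e-4
assert abs(g_b - residual) < 1e-4
# The loss is positive, yet both partial derivatives are negative.
assert loss(a, b) > 0 and g_a < 0 and g_b < 0
```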

If the function decreases, does its derivative also decrease? In general, no, as can easily be seen from a plot of a function alongside its derivative. For example, consider a plot of a parabola and its derivative (which is a linear function).

[Plot of a parabola and its derivative, a straight line]

On the left of the y-axis, the parabola is decreasing while its derivative is increasing; on the right of the y-axis, the parabola is increasing and the derivative (the linear function) is still increasing.

The same applies to the loss function of an ML model and its partial derivatives.


Posted 2020-04-03T12:01:55.227

Reputation: 19 783

Is $f'$ a function of $f$? I haven't heard that before... How is it formulated? – DuttaA – 2020-04-07T01:42:05.097


@DuttaA The function that computes $f'$ needs to have $f$ as input. This function is the differential operator. I am not completely sure about the mathematical formalism, but this should give you the idea and info to further investigate it!

– nbro – 2020-04-07T01:44:55.347