Well, GD terminates once the gradient is 0, right? Now, in a non-convex function, there can be points that do not belong to the global minimum and yet have zero gradient. For example, such points can be saddle points or local minima.

Consider this picture, and say you start GD at the point labeled x.

GD will bring you to the flat area and will stop making progress there, as the gradients are 0. However, as you can see, the global minimum is to the left of this flat region.

By the same token, you have to show, for your own function, that there exists at least one point whose gradient is 0 and yet is not the global minimum.
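As a minimal sketch of this (the function below is my own made-up example, not one from the question): plain GD on a non-convex function converges to whichever stationary point its starting position leads it to, which need not be the global minimum.

```python
# Hypothetical non-convex example: f(x) = x**4 - 2*x**2 + 0.5*x has a
# local minimum near x ≈ 0.93 and its global minimum near x ≈ -1.06.
def f(x):
    return x**4 - 2*x**2 + 0.5*x

def grad(x):
    return 4*x**3 - 4*x + 0.5

x = 1.5           # start on the right-hand slope, in the local minimum's basin
lr = 0.01
for _ in range(10_000):
    x -= lr * grad(x)

# GD stops making progress where the gradient is ~0, but this stationary
# point is only the local minimum: f(x) here is larger than f near -1.06.
print(x, grad(x), f(x), f(-1.06))
```

Starting at x = -1.5 instead would land GD in the global minimum's basin, which is exactly why zero gradient alone proves nothing about global optimality.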

In addition to that, the convergence guarantee for convex functions depends on annealing the learning rate appropriately. For example, if your LR is too high, GD can just keep overshooting the minimum. The visualization from this page might help you understand more about the behavior of GD.
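Here is a small sketch of that overshooting behavior (the quadratic and the step sizes are my own illustrative choices): on the convex function f(x) = x², a fixed LR above the stability threshold makes each step flip the sign of x and grow its magnitude, while a decaying LR converges.

```python
# Convex toy problem: f(x) = x**2, gradient 2*x, global minimum at x = 0.
def grad(x):
    return 2 * x

# Too-high fixed LR: x_{t+1} = x_t - 1.1 * 2 * x_t = -1.2 * x_t,
# so |x| grows by 20% per step -- GD keeps overshooting and diverges.
x = 1.0
for _ in range(5):
    x -= 1.1 * grad(x)
diverged = abs(x) > 1.0

# Decaying LR (lr_t = 0.4 / sqrt(t+1)): the iterates shrink toward 0.
x = 1.0
for t in range(200):
    x -= 0.4 / (t + 1) ** 0.5 * grad(x)
```

The stability threshold for this quadratic is LR < 1 (more generally, below 2 divided by the curvature); anything above it oscillates with growing amplitude no matter how long you run.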

For a convex function with an unknown learning rate value (which, as mentioned, is not the optimal value), how can we check whether it converges or not? – Mostafa Ghadimi – 2020-03-17T11:44:53.253

If the function is convex, GD converges with a decaying learning rate. I guess if you don't make any assumptions on the learning rate, it could keep overshooting the global minimum. Say you're 1 unit left of the minimum, but due to a big LR you jump 3 units right of the minimum, then back to 1 unit left of it, and so on. – SpiderRico – 2020-03-17T11:47:58.487

How can I find out whether the LR is decaying or not? Is there any way to detect the oscillation? – Mostafa Ghadimi – 2020-03-17T23:05:43.903

Well, in your question, are you allowed to set the LR? If not, you have to look for auxiliary information. For example, if your training loss / function value jumps back and forth, this probably indicates that GD is overshooting its target due to a big LR. – SpiderRico – 2020-03-17T23:11:37.863

I'm allowed to, but I can't find the optimal value for the LR due to a time limit. – Mostafa Ghadimi – 2020-03-17T23:26:10.330

Okay, then you have to monitor the function value and decrease the learning rate when it starts to overshoot. Also look into LR scheduling methods in general, such as line search. – SpiderRico – 2020-03-17T23:29:08.150
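A minimal sketch of that monitoring idea (the halving factor, the toy function f(x) = x², and the deliberately too-high starting LR are all my own illustrative assumptions, not a standard recipe):

```python
# Hypothetical scheme: whenever a step increases the function value (a sign
# of overshooting), halve the LR and retry; otherwise accept the step.
def f(x):
    return x ** 2

def grad(x):
    return 2 * x

x, lr = 1.0, 1.5          # lr = 1.5 is above the stability threshold on purpose
prev = f(x)
for _ in range(100):
    x_new = x - lr * grad(x)
    if f(x_new) > prev:    # loss went up -> overshoot -> decay the LR
        lr *= 0.5
        continue           # retry the step with the smaller LR
    x, prev = x_new, f(x_new)
```

This is essentially a crude backtracking line search; proper line-search methods additionally check a sufficient-decrease condition rather than just "did the loss go down".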

Please add it to your answer to confirm it; I'll give it an upvote if you'd like. Thanks. – Mostafa Ghadimi – 2020-03-17T23:37:34.027

Sure. I've added a link that visualizes how GD behaves on loss surfaces. You might find it useful. – SpiderRico – 2020-03-18T16:21:01.687