That would be the case if the steps taken by gradient descent were "infinitesimal" in the mathematical sense. In practice, however, it takes steps of finite length set by the learning rate. The problem is that the gradient is evaluated only at the current point; it says nothing about how the slope changes a finite distance away. If you choose a large learning rate, you may "overshoot" and stray from the optimal direction, as shown in the figure. If you choose a small enough learning rate, those oscillations will be minimal and you may move "almost" perpendicular to the contour lines, which would look like what you describe in the question, but training will take a very long time.
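A minimal sketch of this effect, assuming a hypothetical elongated quadratic bowl $f(x, y) = \tfrac{1}{2}(a x^2 + b y^2)$ with made-up curvatures $a = 1$, $b = 10$ and two illustrative learning rates (none of these values come from the figure). Counting sign changes of the steep coordinate $y$ is used here as a simple zig-zag indicator:

```python
# Gradient descent with a fixed learning rate on an elongated quadratic bowl
#   f(x, y) = 0.5 * (a * x**2 + b * y**2)
# The curvatures a and b are hypothetical values chosen for illustration.

def grad(x, y, a=1.0, b=10.0):
    """Gradient of f at (x, y)."""
    return a * x, b * y

def gradient_descent(x, y, lr, steps, a=1.0, b=10.0):
    """Take `steps` fixed-size steps and return every visited point."""
    path = [(x, y)]
    for _ in range(steps):
        gx, gy = grad(x, y, a, b)
        x, y = x - lr * gx, y - lr * gy
        path.append((x, y))
    return path

def sign_flips(values):
    """Count how often consecutive values change sign (a zig-zag indicator)."""
    return sum(1 for u, v in zip(values, values[1:]) if u * v < 0)

if __name__ == "__main__":
    for lr in (0.18, 0.02):
        path = gradient_descent(10.0, 1.0, lr, steps=50)
        ys = [p[1] for p in path]
        print(f"lr={lr}: final point={path[-1]}, y sign flips={sign_flips(ys)}")
```

With `lr=0.18` the $y$ coordinate changes sign on every step (the zig-zag) while $x$ still shrinks quickly; with `lr=0.02` there is no zig-zag, but $x$ is still above 3 after 50 steps. For this quadratic, any learning rate above $2/b$ makes the steep coordinate diverge.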

The second part of the question refers to the section titled "**Conjugate Gradients**", which discusses a specific optimization method. The perpendicular lines in that figure are caused by the vanishing directional derivative at the turning points. Quoting the text:

> The method of steepest descent involves jumping to the point of lowest cost along the line defined by the gradient at the initial point on each step.

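Such a point is where the "directional derivative" of the cost along the current search direction vanishes. When that derivative vanishes, the gradient at the new point has no component along the direction you just travelled, so the next steepest-descent direction is perpendicular to the previous one. This is detailed in the text above the figure. Note that these perpendicular directions are what plain steepest descent with a line search produces; the conjugate gradient method described in that section is designed to avoid this zig-zag by choosing each new direction to be "**conjugate**" to the previous one with respect to the Hessian $H$ (i.e. $d_t^\top H d_{t-1} = 0$), rather than merely perpendicular to it.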
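The vanishing directional derivative can be written out as a short derivation. The notation here is assumed for illustration: $J(\theta)$ is the cost, $d_t$ the search direction at step $t$, and $\epsilon$ the step length chosen by the line search.

$$\theta_{t+1} = \theta_t + \epsilon^* d_t, \qquad \epsilon^* = \arg\min_{\epsilon} J(\theta_t + \epsilon d_t)$$

$$0 = \frac{\partial}{\partial \epsilon} J(\theta_t + \epsilon d_t)\Big|_{\epsilon = \epsilon^*} = \nabla_\theta J(\theta_{t+1})^\top d_t$$

Steepest descent takes the next direction as $d_{t+1} = -\nabla_\theta J(\theta_{t+1})$, so $d_{t+1}^\top d_t = 0$: every new direction is perpendicular to the previous one.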

To summarize the situation for that figure: an initial direction is chosen at the starting point by calculating the gradient there, and you keep moving in that direction until the directional derivative along it vanishes. That point is the minimum along that line, so going any further on the same straight line would increase the cost function. The gradient at this new point is therefore perpendicular to the line you just followed, so the next steepest-descent step makes a 90-degree turn.
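The same situation can be reproduced numerically. This is a sketch assuming a hypothetical quadratic cost $J(\theta) = \tfrac{1}{2}\theta^\top A \theta$ with a made-up diagonal $A$, for which the exact line-search step along $d = -g$ has the closed form $\epsilon^* = g^\top g / g^\top A g$; the helper names are illustrative only. It prints the cosine of the angle between successive search directions, which should be zero (up to rounding) if they are perpendicular:

```python
# Steepest descent with an exact line search on a 2-D quadratic
#   J(theta) = 0.5 * theta^T A theta, with A diagonal (hypothetical values).
# For a quadratic, the exact minimiser along d = -g is eps* = (g.g) / (g.A.g).

def matvec(A, v):
    """Matrix-vector product for a list-of-rows matrix."""
    return [sum(a_ij * v_j for a_ij, v_j in zip(row, v)) for row in A]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def steepest_descent_line_search(A, theta, steps):
    """Return the visited points and the search directions used."""
    points, directions = [list(theta)], []
    for _ in range(steps):
        g = matvec(A, theta)            # gradient of J at theta
        gAg = dot(g, matvec(A, g))
        if gAg == 0.0:                  # gradient is zero: already at the minimum
            break
        eps = dot(g, g) / gAg           # exact line-search step length
        d = [-gi for gi in g]           # steepest-descent direction
        theta = [t + eps * di for t, di in zip(theta, d)]
        points.append(theta)
        directions.append(d)
    return points, directions

if __name__ == "__main__":
    A = [[1.0, 0.0], [0.0, 10.0]]
    points, dirs = steepest_descent_line_search(A, [10.0, 1.0], steps=6)
    for d_prev, d_next in zip(dirs, dirs[1:]):
        cos = dot(d_prev, d_next) / (dot(d_prev, d_prev) * dot(d_next, d_next)) ** 0.5
        print(f"cos(angle) between successive steps: {cos:+.2e}")
```

Starting exactly on one of the ellipse's axes, e.g. `[0.0, 3.0]`, the gradient points straight at the minimum and a single line-search step lands on it; from any other start the path zig-zags with right-angle turns.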

To visualize this, look at image 4.5 in your first upload (page 88). Starting from the top of the saddle and following the gradient, you go down to the middle of the saddle. At that point, going any further in the same direction will only take you up, while you want to go down. The only way to keep going down is to take a 90-degree turn at the middle of the saddle.

Thank you for the answer. But I believe that even though the steps are not "infinitesimal", the angle at each point should stay constant and the path should look more like this: https://imgur.com/a/NLn5hak Am I wrong? – MajorTom – 2019-11-19T15:43:52.777

@MajorTom If your starting point is not on one of the lines I have drawn in this figure, the direction of steepest descent will not point to the origin. The same holds after every epoch: you need to be on those lines to reach the center fastest. Any other point will produce a zig-zag pattern, because the perpendicular to the ellipse there will not point towards the origin. imgur.com/a/wDrTOZR – serali – 2019-11-19T16:12:11.107