What is the most time-consuming part of training deep networks?



Deep networks notoriously take a long time to train.

What is the most time-consuming aspect of training them? Is it the matrix multiplications? Is it the forward pass? Is it some component of the backward pass?


Posted 2018-04-01T19:50:28.670

Reputation: 537



Forward pass

The output of a layer can be computed once the output of the previous layer is known. Within a layer, the GPU can parallelize this computation across units and across the minibatch, which is done by computing one large matrix product. Across layers, however, the computation has to be sequential, from earlier layers to later ones. Depending on the layer type, convolutions and especially fully connected layers can result in very large matrix computations.
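To make this concrete, here is a minimal sketch of a forward pass, assuming a small fully connected network with ReLU activations and no biases. The layer sizes, the batch size, and the helper names (`matmul`, `relu`, `forward`) are made up for illustration, and plain Python lists stand in for the GPU matrix routines a real framework would call.

```python
import random

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    cols_b = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols_b] for row in a]

def relu(m):
    """Element-wise ReLU."""
    return [[max(0.0, v) for v in row] for row in m]

def forward(batch, weights):
    """Push the whole minibatch through the layers, one layer after another.

    batch:   batch_size x in_dim matrix (one row per example)
    weights: list of in_dim x out_dim matrices, one per layer
    """
    activations = [batch]
    for w in weights:
        # One matrix product handles every example in the minibatch at once,
        # but this layer cannot start before the previous one has finished.
        activations.append(relu(matmul(activations[-1], w)))
    return activations

random.seed(0)
layer_sizes = [4, 8, 3]  # hypothetical sizes: input, hidden, output
weights = [
    [[random.gauss(0, 1) for _ in range(n_out)] for _ in range(n_in)]
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:])
]
batch = [[random.gauss(0, 1) for _ in range(layer_sizes[0])] for _ in range(5)]

activations = forward(batch, weights)
print(len(activations[-1]), "examples,", len(activations[-1][0]), "outputs each")
```

The loop over `weights` is the sequential part; everything inside `matmul` is independent work that a GPU can spread over its cores.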

Backward pass

The gradient of a layer with respect to its input (and its parameters) can only be computed given the gradient with respect to the layer's output (that is, the input gradient of the subsequent layer) and the input to the layer (the output of the previous layer). Again, this can be parallelized within a layer and over the minibatch, but it is sequential from later layers to earlier ones. Moreover, since the backward pass relies on the outputs of the forward pass, all intermediate layer outputs of the forward pass have to be cached for the backward pass, which results in high (GPU) memory usage.
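The dependence on cached outputs can be sketched as follows. This is only an illustration under assumptions of my own: a bias-free ReLU network, a toy loss of 0.5 times the sum of squared outputs, and hypothetical helper names. It is not how any particular framework implements backpropagation.

```python
import random

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    cols_b = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols_b] for row in a]

def transpose(m):
    return [list(col) for col in zip(*m)]

def forward(batch, weights):
    """Forward pass that keeps every layer output for later use."""
    cache = [batch]
    for w in weights:
        pre = matmul(cache[-1], w)
        cache.append([[max(0.0, v) for v in row] for row in pre])
    return cache

def loss(output):
    """Toy loss (an assumption for this sketch): 0.5 * sum of squared outputs."""
    return 0.5 * sum(v * v for row in output for v in row)

def backward(cache, weights):
    """Walk the layers from last to first, reusing the cached outputs."""
    grad_out = [row[:] for row in cache[-1]]  # dLoss/dOutput for the toy loss
    weight_grads = [None] * len(weights)
    for i in reversed(range(len(weights))):
        layer_in, layer_out = cache[i], cache[i + 1]
        # ReLU passes the gradient only where the cached output was positive.
        grad_pre = [
            [g if o > 0 else 0.0 for g, o in zip(g_row, o_row)]
            for g_row, o_row in zip(grad_out, layer_out)
        ]
        # Gradient w.r.t. the weights needs the cached input of this layer.
        weight_grads[i] = matmul(transpose(layer_in), grad_pre)
        # Gradient w.r.t. the layer input becomes grad_out for the layer below.
        grad_out = matmul(grad_pre, transpose(weights[i]))
    return weight_grads

random.seed(0)
layer_sizes = [4, 8, 3]  # hypothetical sizes
weights = [
    [[random.gauss(0, 1) for _ in range(n_out)] for _ in range(n_in)]
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:])
]
batch = [[random.gauss(0, 1) for _ in range(layer_sizes[0])] for _ in range(5)]

cache = forward(batch, weights)
weight_grads = backward(cache, weights)
```

Note that `cache` holds one activation matrix per layer for the whole minibatch until `backward` has finished; that is the memory cost mentioned above.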

Forward and backward pass take most of the time

So these two steps account for most of the time in a single training iteration and, depending on your network, for high GPU memory usage. I recommend reading up on the backpropagation algorithm, which explains essentially all of this.

Moreover, training a network from scratch generally takes many iterations. Especially in the earlier layers, the parameter updates are based on gradients that have been propagated back through many other layers, which can make the updates noisy, so they do not always push the network parameters directly in the right direction. In contrast, fine-tuning a pre-trained network on a new task, for example, can often be done with far fewer training iterations.


Posted 2018-04-01T19:50:28.670

Reputation: 394


There is no single hard, slow step in training neural networks. The forward pass involves a large number of matrix multiplications, and so does the backward pass. Even though there are highly optimized libraries for matrix multiplication, neural networks operate on very high-dimensional (tensor) products in both the forward and backward passes, which is what makes them expensive to train. However, for large networks the backward pass would be even slower, or even intractable, if we did not use backpropagation, since computing the derivatives naively is time-consuming.
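As a rough back-of-the-envelope illustration, one can count the multiply-adds of the matrix products in each pass. The sketch below assumes plain fully connected layers and ignores biases, activations, and the optimizer step; the function name, layer sizes, and batch size are hypothetical.

```python
def dense_layer_cost(batch_size, n_in, n_out):
    """Approximate multiply-add counts for one fully connected layer."""
    matmul_cost = batch_size * n_in * n_out
    forward_cost = matmul_cost       # Y = X @ W
    backward_cost = 2 * matmul_cost  # dW = X^T @ dY and dX = dY @ W^T
    return forward_cost, backward_cost

layer_sizes = [784, 512, 512, 10]  # hypothetical network
batch_size = 64                    # hypothetical minibatch size

total_forward = 0
total_backward = 0
for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
    f, b = dense_layer_cost(batch_size, n_in, n_out)
    total_forward += f
    total_backward += b

print("forward multiply-adds: ", total_forward)
print("backward multiply-adds:", total_backward)
```

Under these assumptions the backward pass costs about twice as much as the forward pass, because every layer needs one matrix product for the weight gradient and one for the input gradient. In practice the first layer does not need an input gradient, so the real ratio is slightly lower.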

For exact numbers, refer to the training results in DeepBench: https://github.com/baidu-research/DeepBench#types-of-operations


Posted 2018-04-01T19:50:28.670

Reputation: 920