**FULL GRU Unit**

$ G_u = \sigma(W_u [ c_{t-1}, x_t ] + b_u) $

$ G_r = \sigma(W_r [ c_{t-1}, x_t ] + b_r) $

$ \tilde{c}_t = \tanh(W_c [ G_r * c_{t-1}, x_t ] + b_c) $

$ c_t = G_u * \tilde{c}_t + (1 - G_u) * c_{t-1} $

$ a_t = c_t $
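To make the GRU equations concrete, here is a minimal NumPy sketch of a single step. The function name `gru_step` and the `params` dictionary layout are illustrative choices that mirror the symbols above, not the API of any particular library:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gru_step(c_prev, x_t, params):
    """One GRU step following the equations above.

    params maps W_u, W_r, W_c (each shape (n_c, n_c + n_x)) and
    b_u, b_r, b_c (each shape (n_c,)). These names mirror the
    equations and are assumed for illustration.
    """
    concat = np.concatenate([c_prev, x_t])                    # [c_{t-1}, x_t]
    G_u = sigmoid(params["W_u"] @ concat + params["b_u"])     # update gate
    G_r = sigmoid(params["W_r"] @ concat + params["b_r"])     # reset gate
    gated = np.concatenate([G_r * c_prev, x_t])               # [G_r * c_{t-1}, x_t]
    c_tilde = np.tanh(params["W_c"] @ gated + params["b_c"])  # candidate state
    c_t = G_u * c_tilde + (1.0 - G_u) * c_prev                # blend old and new
    return c_t                                                # a_t = c_t in a GRU
```

Note that the output activation equals the cell state, so only one vector has to be carried from step to step.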

**LSTM Unit**

$ \tilde{c}_t = \tanh(W_c [ a_{t-1}, x_t ] + b_c) $

$ G_u = \sigma(W_u [ a_{t-1}, x_t ] + b_u) $

$ G_f = \sigma(W_f [ a_{t-1}, x_t ] + b_f) $

$ G_o = \sigma(W_o [ a_{t-1}, x_t ] + b_o) $

$ c_t = G_u * \tilde{c}_t + G_f * c_{t-1} $

$ a_t = G_o * \tanh(c_t) $
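The LSTM step can be sketched the same way. Unlike the GRU, it carries two vectors between steps, the cell state $c_t$ and the activation $a_t$. Again, `lstm_step` and the `params` layout are illustrative names, not a library API:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(a_prev, c_prev, x_t, params):
    """One LSTM step following the equations above.

    params maps W_u, W_f, W_o, W_c (each shape (n_c, n_c + n_x)) and
    b_u, b_f, b_o, b_c (each shape (n_c,)); assumed for illustration.
    """
    concat = np.concatenate([a_prev, x_t])                     # [a_{t-1}, x_t]
    G_u = sigmoid(params["W_u"] @ concat + params["b_u"])      # update gate
    G_f = sigmoid(params["W_f"] @ concat + params["b_f"])      # forget gate
    G_o = sigmoid(params["W_o"] @ concat + params["b_o"])      # output gate
    c_tilde = np.tanh(params["W_c"] @ concat + params["b_c"])  # candidate state
    c_t = G_u * c_tilde + G_f * c_prev                         # separate gates
    a_t = G_o * np.tanh(c_t)                                   # gated output
    return a_t, c_t
```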

As the equations show, an LSTM has separate update and forget gates (plus an output gate), which makes it more expressive but also more complex. There is no simple rule for deciding which to use for a particular task; you have to compare their performance empirically. However, because a GRU has fewer gates and therefore fewer parameters than an LSTM, it trains faster and is more computationally efficient.
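The efficiency claim can be checked by counting parameters: each gate or candidate contributes one weight matrix of shape $(n_c, n_c + n_x)$ plus one bias, and a GRU has three such blocks (update gate, reset gate, candidate) against the LSTM's four. A quick sketch, with example sizes chosen only for illustration:

```python
def gate_param_count(n_c, n_x, n_gates):
    # each gate/candidate: weight matrix (n_c, n_c + n_x) plus bias (n_c,)
    return n_gates * (n_c * (n_c + n_x) + n_c)

# example hidden/input sizes, illustrative only
gru = gate_param_count(n_c=64, n_x=32, n_gates=3)   # G_u, G_r, candidate
lstm = gate_param_count(n_c=64, n_x=32, n_gates=4)  # G_u, G_f, G_o, candidate
print(gru, lstm)  # 18624 24832
```

So a GRU cell of the same size has 3/4 the parameters of the corresponding LSTM cell, independent of $n_c$ and $n_x$.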

Credits: Andrew Ng


A GRU is slightly less complex but is approximately as good as an LSTM performance-wise. An implementation in TensorFlow is found here: https://www.data-blogger.com/2017/08/27/gru-implementation-tensorflow/.

– www.data-blogger.com – 2017-08-27T12:28:24.723

GRUs are generally useful when you have long training sequences, want reasonably good accuracy quickly, or are constrained by infrastructure. LSTMs are preferred for longer sequences where richer context matters; when trained with more data, LSTMs tend to give better results than GRUs. – Subir Verma – 2020-10-18T08:03:00.510