So what's the catch with LSTM?



I am expanding my knowledge of the Keras package and I have been experimenting with some of the available models. I have an NLP binary classification problem that I'm trying to solve, and I have been applying different models to it.

After working with some results and reading more and more about LSTMs, it seems like this approach is far superior to anything else I've tried (across multiple datasets). I keep thinking to myself, "why/when would you not use LSTM?". The use of the additional gates, inherent to LSTMs, makes perfect sense to me after having had some models suffer from vanishing gradients.

So what's the catch with LSTM? Where do they not do so well? I know there is no such thing as a "one size fits all" algorithm, so there must be a downside to LSTM.
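To make the gating concrete: here is a minimal NumPy sketch of a single LSTM step (toy sizes, random weights purely for illustration — not tied to any particular Keras layer). The additive cell-state update `c = f * c_prev + i * g` is the part that mitigates vanishing gradients.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h_prev, c_prev, W, U, b):
    # All four gate pre-activations computed in one matmul, then split.
    n = h_prev.shape[0]
    z = W @ x + U @ h_prev + b
    i = sigmoid(z[0*n:1*n])   # input gate
    f = sigmoid(z[1*n:2*n])   # forget gate
    g = np.tanh(z[2*n:3*n])   # candidate cell state
    o = sigmoid(z[3*n:4*n])   # output gate
    c = f * c_prev + i * g    # additive update: gradients flow through the
                              # cell state without repeated squashing
    h = o * np.tanh(c)
    return h, c

rng = np.random.default_rng(0)
m, n = 8, 4                   # toy input dim and hidden size
W = rng.standard_normal((4 * n, m)) * 0.1
U = rng.standard_normal((4 * n, n)) * 0.1
b = np.zeros(4 * n)

h, c = np.zeros(n), np.zeros(n)
for x in rng.standard_normal((5, m)):   # unroll over 5 timesteps
    h, c = lstm_step(x, h, c, W, U, b)
print(h.shape)                # (4,)
```

Note that `h` is bounded in (-1, 1) because it is a sigmoid times a tanh; the cell state `c` itself is unbounded, which is exactly what lets information persist across long sequences.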


Posted 2018-02-02


Try GRU, they are like LSTM but require less memory and train faster. – Vivek Khetan – 2018-02-03T16:58:21.667



You are right that LSTMs work very well for some problems, but some of the drawbacks are:

  • LSTMs take longer to train
  • LSTMs require more memory to train
  • LSTMs are easy to overfit
  • Dropout is much harder to implement in LSTMs
  • LSTMs are sensitive to different random weight initializations

These are in comparison to a simpler model like a 1D conv net, for example.

The first three items follow directly from the fact that LSTMs have more trainable parameters.
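To put rough numbers on that parameter gap, here is a quick comparison of an LSTM layer against a 1D conv layer at hypothetical but typical sizes (128-dim inputs, 128 units/filters, kernel size 3 — illustrative choices, not from the question):

```python
# Hypothetical sizes for illustration.
input_dim, units = 128, 128

# LSTM: four gate blocks, each with input weights, recurrent weights, a bias.
lstm_params = 4 * ((input_dim + units) * units + units)

# Conv1D: one small kernel per filter over the input channels, plus a bias.
kernel_size, filters = 3, 128
conv_params = kernel_size * input_dim * filters + filters

print(lstm_params)   # 131584
print(conv_params)   # 49280
```

At these sizes the LSTM layer has well over twice the parameters of the conv layer, and unlike the convolution its computation is inherently sequential across timesteps, which compounds the training-time cost.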




Agreed, and I think overfitting (aka poor generalization) is perhaps the biggest risk. Make sure you have a good strategy for doing model validation. – tom – 2018-02-02T20:27:48.030