How do the current input and the output of the previous time step get combined in an LSTM?


I am currently looking into LSTMs. I found this nice blog post, which is already very helpful, but still, there are things I don't understand, mostly because of the collapsed layers.

  • The input $X_t$, and the output of the previous time step $H_{t-1}$, how do they get combined? Multiplied, added or what?
  • The input weights and the weights of the input of the previous time step, those are just the weights of the connections between the time-steps/units, right?

Ben

Posted 2018-09-02T08:24:21.083

Reputation: 307

All I would like to say is: head over to this video https://www.youtube.com/watch?v=IEbBIpP4c9E&index=10&list=PLZnyIsit9AM7yeTZuBmezKNc6hFHUPImh by Andrew Ng and you will understand everything.

– DuttaA – 2018-09-02T16:28:21.300

Answers


(1) $X_t$ and $H_{t-1}$ are concatenated. The blog you cited explains its notation: "Lines merging denote concatenation." For example, if $X_t=[1,2,3]$ and $H_{t-1}=[4,5,6,7]$, then their concatenation is $[1,2,3,4,5,6,7]$.
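The concatenation in the example can be reproduced in a couple of lines. This is a minimal NumPy sketch; the variable names are illustrative. Whether $X_t$ or $H_{t-1}$ comes first is a convention: it only has to match how the columns of the gate weight matrices are laid out.

```python
import numpy as np

x_t = np.array([1, 2, 3])        # current input X_t
h_prev = np.array([4, 5, 6, 7])  # previous output H_{t-1}

# "Lines merging" in the diagram: place the two vectors end to end.
# No addition or multiplication happens at this point.
z = np.concatenate([x_t, h_prev])

print(z)  # [1 2 3 4 5 6 7]
```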

(2) When you say "input weights" or "weights of the input of the previous time step", are you referring to the $W_i$ in your cited blog? If so, they are not the weights of the connections between the time-steps/units. They are part of the input gate only. The connections that carry values from one time step/unit to the next do not have weights applied to them; the weights are applied inside the gates.
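To connect (1) and (2): a gate such as the input gate multiplies the concatenated vector by one matrix $W_i$. Splitting $W_i$ column-wise shows that this is the same as multiplying $X_t$ and $H_{t-1}$ by separate blocks and adding the results. The sketch below checks this numerically with NumPy; the sizes, the random weights, and the names `W_xi` and `W_hi` are hypothetical and chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n_in, n_hid = 3, 4  # hypothetical sizes of X_t and H_{t-1}

x_t = rng.standard_normal(n_in)
h_prev = rng.standard_normal(n_hid)

# Input-gate weights acting on the concatenation [X_t, H_{t-1}]
W_i = rng.standard_normal((n_hid, n_in + n_hid))
b_i = np.zeros(n_hid)

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

# Form used in the diagram: one matrix times the concatenated vector
i_concat = sigmoid(W_i @ np.concatenate([x_t, h_prev]) + b_i)

# Same computation with W_i split into the columns for each part
W_xi = W_i[:, :n_in]  # columns that multiply X_t
W_hi = W_i[:, n_in:]  # columns that multiply H_{t-1}
i_split = sigmoid(W_xi @ x_t + W_hi @ h_prev + b_i)

print(np.allclose(i_concat, i_split))  # True
```

Both forms appear in the literature. The concatenated form is just a compact way of writing the two matrix products and their sum.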

user12075

Posted 2018-09-02T08:24:21.083

Reputation: 310

Thanks for the answer. But if more elements are appended at every time step, doesn't the vector become huge after many time steps? I mean, there are the forget, ignore and update gates, but as far as I know they only set the values they want to block to zero; they don't erase the values from the vector. – Ben – 2018-09-03T10:26:12.270

So each gate is essentially a small NN, just collapsed down into one unit using matrix multiplication? – Ben – 2018-09-03T17:39:13.197

@Ben At each step, $X_t$ and $H_{t-1}$ have a fixed size, so the concatenation does not grow over time; it is rebuilt at every step. Note that within each cell, the concatenation of $H_{t-1}$ and $X_t$ is not connected directly to $H_{t}$. It passes through a $\sigma$, which "collapses down" the size. And yes, each $\sigma$ here is a small NN. – user12075 – 2018-09-03T18:20:28.980
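The point about fixed sizes can be checked by running a cell over a long sequence. Below is a minimal NumPy sketch of one common LSTM formulation (forget, input and output gates plus a tanh candidate). Each gate is a single affine map of the concatenation followed by a sigmoid, which is the "small NN" mentioned above. Stacking all four gates' weights into one matrix `W`, and the name `lstm_step`, are assumptions made for brevity. This is not necessarily how the cited blog or any particular library lays them out.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM step (hypothetical helper). W has shape
    (4*n_hid, n_in + n_hid): forget, input, candidate and output
    blocks stacked row-wise."""
    n_hid = h_prev.shape[0]
    z = np.concatenate([x_t, h_prev])  # rebuilt each step, length n_in + n_hid
    a = W @ z + b                      # shape (4*n_hid,)
    f = sigmoid(a[0:n_hid])            # forget gate
    i = sigmoid(a[n_hid:2 * n_hid])    # input gate
    g = np.tanh(a[2 * n_hid:3 * n_hid])  # candidate values
    o = sigmoid(a[3 * n_hid:])         # output gate
    c_t = f * c_prev + i * g           # elementwise, length n_hid
    h_t = o * np.tanh(c_t)             # elementwise, length n_hid
    return h_t, c_t

rng = np.random.default_rng(0)
n_in, n_hid, steps = 3, 4, 100
W = 0.1 * rng.standard_normal((4 * n_hid, n_in + n_hid))
b = np.zeros(4 * n_hid)

h = np.zeros(n_hid)
c = np.zeros(n_hid)
for _ in range(steps):
    x_t = rng.standard_normal(n_in)
    h, c = lstm_step(x_t, h, c, W, b)

print(h.shape, c.shape)  # (4,) (4,)
```

After 100 steps, $H_t$ and the cell state still have length `n_hid`. The gates scale existing entries elementwise rather than appending new ones.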