Long short-term memory networks are fairly complicated and I haven't completely wrapped my head around them.
It seems to me that the big gain of LSTMs for time series forecasting is that they remove the need for lagged features: the network determines on its own which lagged information is actually significant and carries it forward to the next timestep(s).
Should one still create lagged features as inputs per timestep when training an LSTM? For example, the output of the last timestep, means and medians of the preceding timesteps, means and medians for a specific class, distances, differences, etc.?
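To make the contrast concrete, here is a small sketch (plain NumPy on a hypothetical toy series; all names are mine, not from any particular tutorial) of the two input layouts I mean: raw sliding windows in the `(samples, timesteps, features)` shape that LSTM layers typically expect, versus windows augmented with hand-made lagged features such as a one-step lag and a rolling mean:

```python
import numpy as np

# Hypothetical univariate series standing in for real data.
series = np.arange(100, dtype=float)
window = 10  # timesteps per training sample

# Option A: raw sliding windows -- let the LSTM find useful lags itself.
# Shape expected by most LSTM layers: (samples, timesteps, features).
X_raw = np.stack([series[i:i + window] for i in range(len(series) - window)])
X_raw = X_raw[..., np.newaxis]   # add the single-feature dimension
y = series[window:]              # next-step target for each window

# Option B: add hand-engineered lagged features per timestep,
# e.g. the value one step back and a rolling mean of the 3 preceding values.
lag1 = np.concatenate([[np.nan], series[:-1]])
roll3 = np.concatenate([[np.nan] * 3,
                        np.convolve(series, np.ones(3) / 3, mode="valid")[:-1]])
feats = np.column_stack([series, lag1, roll3])   # (100, 3)

valid = 3  # skip rows whose rolling features are still NaN
X_feat = np.stack([feats[i:i + window]
                   for i in range(valid, len(series) - window)])
# X_feat has shape (samples, timesteps, 3): three features per timestep.
```

Option A trusts the recurrent cell state to extract whatever lag structure matters; Option B hands the network precomputed summaries at the cost of extra columns and some lost rows at the start of the series.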