
I am trying to predict the trajectory of an object over time using an LSTM. I have three different configurations for training and prediction in mind, and I would like to know which one is best suited to this problem (I would also appreciate insights regarding these approaches).

### 1) Many to one (loss is the MSE of a single value)

- The input is a sequence of $n$ values; the output is the prediction of the single value at position $n+1$.
- The loss is the MSE between the predicted value and the ground-truth value at position $n+1$.
- During online testing, a sequence of $n$ values predicts the value at position $n+1$; this prediction is appended to the sequence to predict the value at position $n+2$, and so on. This way, a whole trajectory of $n + t$ values is produced.
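To make the setup concrete, here is a minimal PyTorch sketch of this configuration (model sizes and names are my own, not prescribed by the question): training with a single-step MSE loss, then feeding predictions back in autoregressively at test time.

```python
import torch
import torch.nn as nn

class OneStepLSTM(nn.Module):
    """Sees n steps, predicts the single value at step n+1."""
    def __init__(self, n_features=4, hidden=32):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_features)

    def forward(self, x):                  # x: (batch, n, n_features)
        out, _ = self.lstm(x)
        return self.head(out[:, -1])       # prediction for step n+1

model = OneStepLSTM()
loss_fn = nn.MSELoss()

x = torch.randn(8, 10, 4)                  # batch of 8 windows, n = 10
y = torch.randn(8, 4)                      # ground truth at position n+1
loss = loss_fn(model(x), y)                # MSE on the single predicted step
loss.backward()                            # backprop after every step

# Online test: append each prediction and predict the next value.
with torch.no_grad():
    seq = torch.randn(1, 10, 4)
    for _ in range(5):                     # roll out t = 5 future steps
        nxt = model(seq)                   # (1, n_features)
        seq = torch.cat([seq, nxt.unsqueeze(1)], dim=1)
```

After the rollout, `seq` holds the full trajectory of $n + t$ values, exactly as described above.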

### 2) Many to one (loss is MSE of multiple values)

- The input is a sequence of $n$ values; the output is the prediction of the single value at position $n+1$.
- To compute the loss, the same strategy used for online testing is applied: the LSTM predicts one value, that value is appended to the sequence, and the process is repeated $t$ times. The loss is the MSE between all predicted values in the trajectory and their ground-truth values. Backpropagation is performed only after the whole trajectory has been predicted.
- Online testing is the same as in the previous configuration.
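This configuration can be sketched as follows (again a hypothetical minimal model, not the asker's actual code): the rollout keeps the computation graph, so one backward pass propagates through all $t$ predicted steps.

```python
import torch
import torch.nn as nn

# Same kind of model as before: LSTM encoder + linear head.
lstm = nn.LSTM(4, 32, batch_first=True)
head = nn.Linear(32, 4)

def rollout(seq, t):
    """Autoregressively predict t steps, keeping the graph for backprop."""
    preds = []
    for _ in range(t):
        out, _ = lstm(seq)
        nxt = head(out[:, -1])             # (batch, n_features)
        preds.append(nxt)
        # Feed the prediction back in; gradients flow through torch.cat.
        seq = torch.cat([seq, nxt.unsqueeze(1)], dim=1)
    return torch.stack(preds, dim=1)       # (batch, t, n_features)

x = torch.randn(8, 10, 4)                  # n = 10 observed steps
y = torch.randn(8, 5, 4)                   # t = 5 future ground-truth steps
preds = rollout(x, t=5)
loss = nn.functional.mse_loss(preds, y)    # MSE over the whole trajectory
loss.backward()                            # one backward pass after the rollout
```

Because the inputs beyond step $n$ are the model's own predictions, this trains the network under the same conditions it faces at test time (at the cost of a longer, more expensive backward pass).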

### 3) Many to many

- The input is a sequence of $n$ values; the output is the prediction of the $m$ consecutive values at positions $n+1, \dots, n+m$.
- The loss is the MSE between the $m$ predictions and their corresponding ground-truth values.
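One simple way to realize this (a sketch; other decoder designs such as seq2seq are equally valid) is to have a single head emit all $m$ future values at once:

```python
import torch
import torch.nn as nn

class Seq2VecLSTM(nn.Module):
    """Sees n steps, predicts m future steps in one forward pass."""
    def __init__(self, n_features=4, hidden=32, m=5):
        super().__init__()
        self.m, self.f = m, n_features
        self.lstm = nn.LSTM(n_features, hidden, batch_first=True)
        self.head = nn.Linear(hidden, m * n_features)

    def forward(self, x):                   # x: (batch, n, n_features)
        out, _ = self.lstm(x)
        y = self.head(out[:, -1])           # flat vector of m predictions
        return y.view(-1, self.m, self.f)   # (batch, m, n_features)

model = Seq2VecLSTM()
x = torch.randn(8, 10, 4)                   # n = 10 observed steps
y = torch.randn(8, 5, 4)                    # m = 5 ground-truth future steps
preds = model(x)
loss = nn.functional.mse_loss(preds, y)     # MSE over all m predictions
loss.backward()
```

Unlike the autoregressive configurations, no prediction is fed back as an input here, so errors cannot compound within a single forward pass.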

Thank you for your answer. I am confused by the notation: many-to-one (single value) and many-to-one (multiple values). I think what I described in my Example 1) is the many-to-one (single value) case, with 2) as its (multiple values) version; am I correct? How is the loss computed in that case? – maurock – 2020-03-27T15:55:37.820

This mostly depends on your data. For example, when my data are scaled to the 0-1 interval, I use MAE (Mean Absolute Error). Alternatively, standard MSE works well. If your trends are on very different scales, MAPE (Mean Absolute Percentage Error) could be an alternative. All these choices are very task-specific, though. What does your dataset look like? – Leevo – 2020-03-27T16:11:37.487

My dataset is composed of n sequences; the input size is e.g. 10, and each element is an array of 4 normalized values, with 1 batch: LSTM input shape (10, 1, 4). I thought the loss depended on the version, since in case 1) the MSE is computed on the single next predicted value and then backpropagated, so the input consists only of elements from the dataset. In the other case, the MSE is computed on m consecutive predictions (obtained by appending each preceding prediction) and then backpropagated; here the input also contains predicted values, not only data sampled from the dataset. – maurock – 2020-03-27T16:34:41.967