17

7

Is there a method to calculate the prediction interval (probability distribution) around a time series forecast from an LSTM (or other recurrent) neural network?

Say, for example, I am predicting 10 samples into the future (t+1 to t+10), based on the last 10 observed samples (t-9 to t), I would expect the prediction at t+1 to be more accurate than the prediction at t+10. Typically, one might draw error bars around the prediction to show the interval. With an ARIMA model (under the assumption of normally distributed errors), I can calculate a prediction interval (e.g. 95%) around each predicted value. Can I calculate the same, (or something that relates to the prediction interval) from an LSTM model?

I'm been working with LSTMs in Keras/Python, following lots of examples from machinelearningmastery.com, from which my example code (below) is based on. I'm considering reframing the problem as classification into discrete bins, as that produces a confidence per class, but that seems a poor solution.

There are a couple of similar topics (such as the below), but nothing seems to directly address the issue of prediction intervals from LSTM (or indeed other) neural networks:

Time series prediction using ARIMA vs LSTM

```
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import LSTM
from math import sin
from matplotlib import pyplot
import numpy as np
# Build an LSTM network and train
def fit_lstm(X, y, batch_size, nb_epoch, neurons):
X = X.reshape(X.shape[0], 1, X.shape[1]) # add in another dimension to the X data
y = y.reshape(y.shape[0], y.shape[1]) # but don't add it to the y, as Dense has to be 1d?
model = Sequential()
model.add(LSTM(neurons, batch_input_shape=(batch_size, X.shape[1], X.shape[2]), stateful=True))
model.add(Dense(y.shape[1]))
model.compile(loss='mean_squared_error', optimizer='adam')
for i in range(nb_epoch):
model.fit(X, y, epochs=1, batch_size=batch_size, verbose=1, shuffle=False)
model.reset_states()
return model
# Configuration
n = 5000 # total size of dataset
SLIDING_WINDOW_LENGTH = 30
SLIDING_WINDOW_STEP_SIZE = 1
batch_size = 10
test_size = 0.1 # fraction of dataset to hold back for testing
nb_epochs = 100 # for training
neurons = 8 # LSTM layer complexity
# create dataset
#raw_values = [sin(i/2) for i in range(n)] # simple sine wave
raw_values = [sin(i/2)+sin(i/6)+sin(i/36)+np.random.uniform(-1,1) for i in range(n)] # double sine with noise
#raw_values = [(i%4) for i in range(n)] # saw tooth
all_data = np.array(raw_values).reshape(-1,1) # make into array, add anothe dimension for sci-kit compatibility
# data is segmented using a sliding window mechanism
all_data_windowed = [np.transpose(all_data[idx:idx+SLIDING_WINDOW_LENGTH]) for idx in np.arange(0,len(all_data)-SLIDING_WINDOW_LENGTH, SLIDING_WINDOW_STEP_SIZE)]
all_data_windowed = np.concatenate(all_data_windowed, axis=0).astype(np.float32)
# split data into train and test-sets
# round datasets down to a multiple of the batch size
test_length = int(round((len(all_data_windowed) * test_size) / batch_size) * batch_size)
train, test = all_data_windowed[:-test_length,:], all_data_windowed[-test_length:,:]
train_length = int(np.floor(train.shape[0] / batch_size)*batch_size)
train = train[:train_length,...]
half_size = int(SLIDING_WINDOW_LENGTH/2) # split the examples half-half, to forecast the second half
X_train, y_train = train[:,:half_size], train[:,half_size:]
X_test, y_test = test[:,:half_size], test[:,half_size:]
# fit the model
lstm_model = fit_lstm(X_train, y_train, batch_size=batch_size, nb_epoch=nb_epochs, neurons=neurons)
# forecast the entire training dataset to build up state for forecasting
X_train_reshaped = X_train.reshape(X_train.shape[0], 1, X_train.shape[1])
lstm_model.predict(X_train_reshaped, batch_size=batch_size)
# predict from test dataset
X_test_reshaped = X_test.reshape(X_test.shape[0], 1, X_test.shape[1])
yhat = lstm_model.predict(X_test_reshaped, batch_size=batch_size)
#%% Plot prediction vs actual
x_axis_input = range(half_size)
x_axis_output = [x_axis_input[-1]] + list(half_size+np.array(range(half_size)))
fig = pyplot.figure()
ax = fig.add_subplot(111)
line1, = ax.plot(x_axis_input,np.zeros_like(x_axis_input), 'r-')
line2, = ax.plot(x_axis_output,np.zeros_like(x_axis_output), 'o-')
line3, = ax.plot(x_axis_output,np.zeros_like(x_axis_output), 'g-')
ax.set_xlim(np.min(x_axis_input),np.max(x_axis_output))
ax.set_ylim(-4,4)
pyplot.legend(('Input','Actual','Predicted'),loc='upper left')
pyplot.show()
# update plot in a loop
for idx in range(y_test.shape[0]):
sample_input = X_test[idx]
sample_truth = [sample_input[-1]] + list(y_test[idx]) # join lists
sample_predicted = [sample_input[-1]] + list(yhat[idx])
line1.set_ydata(sample_input)
line2.set_ydata(sample_truth)
line3.set_ydata(sample_predicted)
fig.canvas.draw()
fig.canvas.flush_events()
pyplot.pause(.25)
```

2

Using mixture networks is interesting, I will try to implement that. There is some solid research on using dropout here: https://arxiv.org/abs/1709.01907 and https://arxiv.org/abs/1506.02142

– 4Oh4 – 2017-11-06T16:58:50.810A note for the dropout, you can actually calculate the variance of prediction of monte carlo dropout, and use that as quantification of uncertainty – Charles Chow – 2019-02-05T23:02:17.033

That is true @CharlesChow but that is a poor way to construct a confidence interval in this context. It would be better to sort the values and use quantiles due to the potentially very skewed distribution. – Jan van der Vegt – 2019-02-07T12:31:40.420

Agree @JanvanderVegt , but you still can estimate the statistics of MC dropout without the assumption of output distribution, I mean you can also use percentile or bootstrapping to construct the CI of MC dropout – Charles Chow – 2019-02-07T21:55:30.543