Is there a method to calculate the prediction interval (probability distribution) around a time series forecast from an LSTM (or other recurrent) neural network?
Say, for example, I am predicting 10 samples into the future (t+1 to t+10), based on the last 10 observed samples (t-9 to t), I would expect the prediction at t+1 to be more accurate than the prediction at t+10. Typically, one might draw error bars around the prediction to show the interval. With an ARIMA model (under the assumption of normally distributed errors), I can calculate a prediction interval (e.g. 95%) around each predicted value. Can I calculate the same, (or something that relates to the prediction interval) from an LSTM model?
I'm been working with LSTMs in Keras/Python, following lots of examples from machinelearningmastery.com, from which my example code (below) is based on. I'm considering reframing the problem as classification into discrete bins, as that produces a confidence per class, but that seems a poor solution.
There are a couple of similar topics (such as the below), but nothing seems to directly address the issue of prediction intervals from LSTM (or indeed other) neural networks:
from keras.models import Sequential from keras.layers import Dense from keras.layers import LSTM from math import sin from matplotlib import pyplot import numpy as np # Build an LSTM network and train def fit_lstm(X, y, batch_size, nb_epoch, neurons): X = X.reshape(X.shape, 1, X.shape) # add in another dimension to the X data y = y.reshape(y.shape, y.shape) # but don't add it to the y, as Dense has to be 1d? model = Sequential() model.add(LSTM(neurons, batch_input_shape=(batch_size, X.shape, X.shape), stateful=True)) model.add(Dense(y.shape)) model.compile(loss='mean_squared_error', optimizer='adam') for i in range(nb_epoch): model.fit(X, y, epochs=1, batch_size=batch_size, verbose=1, shuffle=False) model.reset_states() return model # Configuration n = 5000 # total size of dataset SLIDING_WINDOW_LENGTH = 30 SLIDING_WINDOW_STEP_SIZE = 1 batch_size = 10 test_size = 0.1 # fraction of dataset to hold back for testing nb_epochs = 100 # for training neurons = 8 # LSTM layer complexity # create dataset #raw_values = [sin(i/2) for i in range(n)] # simple sine wave raw_values = [sin(i/2)+sin(i/6)+sin(i/36)+np.random.uniform(-1,1) for i in range(n)] # double sine with noise #raw_values = [(i%4) for i in range(n)] # saw tooth all_data = np.array(raw_values).reshape(-1,1) # make into array, add anothe dimension for sci-kit compatibility # data is segmented using a sliding window mechanism all_data_windowed = [np.transpose(all_data[idx:idx+SLIDING_WINDOW_LENGTH]) for idx in np.arange(0,len(all_data)-SLIDING_WINDOW_LENGTH, SLIDING_WINDOW_STEP_SIZE)] all_data_windowed = np.concatenate(all_data_windowed, axis=0).astype(np.float32) # split data into train and test-sets # round datasets down to a multiple of the batch size test_length = int(round((len(all_data_windowed) * test_size) / batch_size) * batch_size) train, test = all_data_windowed[:-test_length,:], all_data_windowed[-test_length:,:] train_length = int(np.floor(train.shape / batch_size)*batch_size) train = train[:train_length,...] half_size = int(SLIDING_WINDOW_LENGTH/2) # split the examples half-half, to forecast the second half X_train, y_train = train[:,:half_size], train[:,half_size:] X_test, y_test = test[:,:half_size], test[:,half_size:] # fit the model lstm_model = fit_lstm(X_train, y_train, batch_size=batch_size, nb_epoch=nb_epochs, neurons=neurons) # forecast the entire training dataset to build up state for forecasting X_train_reshaped = X_train.reshape(X_train.shape, 1, X_train.shape) lstm_model.predict(X_train_reshaped, batch_size=batch_size) # predict from test dataset X_test_reshaped = X_test.reshape(X_test.shape, 1, X_test.shape) yhat = lstm_model.predict(X_test_reshaped, batch_size=batch_size) #%% Plot prediction vs actual x_axis_input = range(half_size) x_axis_output = [x_axis_input[-1]] + list(half_size+np.array(range(half_size))) fig = pyplot.figure() ax = fig.add_subplot(111) line1, = ax.plot(x_axis_input,np.zeros_like(x_axis_input), 'r-') line2, = ax.plot(x_axis_output,np.zeros_like(x_axis_output), 'o-') line3, = ax.plot(x_axis_output,np.zeros_like(x_axis_output), 'g-') ax.set_xlim(np.min(x_axis_input),np.max(x_axis_output)) ax.set_ylim(-4,4) pyplot.legend(('Input','Actual','Predicted'),loc='upper left') pyplot.show() # update plot in a loop for idx in range(y_test.shape): sample_input = X_test[idx] sample_truth = [sample_input[-1]] + list(y_test[idx]) # join lists sample_predicted = [sample_input[-1]] + list(yhat[idx]) line1.set_ydata(sample_input) line2.set_ydata(sample_truth) line3.set_ydata(sample_predicted) fig.canvas.draw() fig.canvas.flush_events() pyplot.pause(.25)