## RNNs with multiple features

I have a bit of self-taught knowledge of machine learning algorithms (the basic random forest and linear regression type stuff). I decided to branch out and begin learning RNNs with Keras. Most of the examples I've looked at involve stock predictions, and I haven't been able to find any basic examples that use multiple features; they all have one column as the feature date and the other as the output. Is there some key fundamental thing I'm missing?

If anyone has an example I would greatly appreciate it.

Thanks!

Not sure what you meant by "multiple features". If you mean more than one feature having an impact on learning, then you just use a multivariate design matrix. Pls clarify by an example or something. – horaceT – 2017-02-16T20:30:19.847

@horaceT I elaborated on multiple features here, in a more specific question about how to use an RNN for time-series predictions with features containing numeric and non-numeric data. – hhh – 2017-08-17T13:25:07.370

Recurrent neural networks (RNNs) are designed to learn sequence data. As you guessed, they can definitely take multiple features as input! Keras RNN layers take a 2D input of shape (T, F) for each sample, with T timesteps and F features per timestep (I'm ignoring the batch dimension here).
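To make the (T, F) layout concrete, here is a minimal NumPy sketch, assuming your raw data is a table with one row per time step and one column per feature. The `series` values and the `make_windows` helper are made up for illustration and are not part of Keras; the idea is just to slice the table into overlapping windows so that each training sample has shape (T, F).

```python
import numpy

# Hypothetical table: 6 time steps (rows), 3 feature columns.
series = numpy.arange(18, dtype=float).reshape(6, 3)


def make_windows(table, timesteps):
    """Stack overlapping windows into shape (samples, timesteps, features)."""
    count = len(table) - timesteps + 1
    return numpy.stack([table[i:i + timesteps] for i in range(count)])


windows = make_windows(series, timesteps=2)
print(windows.shape)  # (5, 2, 3)
```

Each entry of `windows` is one sample with T = 2 timesteps and F = 3 features, so it would match an input shape of (2, 3).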

You don't always need or want the outputs at the intermediate timesteps, t = 1, 2 ... (T - 1), so Keras supports both modes. To have the RNN (e.g., LSTM or GRU) output all T timesteps, pass return_sequences=True at construction. If you only want the output at the last timestep, t = T, use return_sequences=False (this is the default if you don't pass return_sequences to the constructor).
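The difference between the two modes can be sketched in plain NumPy. This is a hand-rolled vanilla tanh RNN, not Keras's LSTM, and the `simple_rnn` function and its weight names are assumptions made for illustration; it only shows which hidden states are returned in each mode.

```python
import numpy


def simple_rnn(x, w_in, w_rec, return_sequences=False):
    """Run a vanilla tanh RNN over one sequence x of shape (T, F)."""
    h = numpy.zeros(w_rec.shape[0])
    outputs = []
    for x_t in x:  # one timestep at a time; x_t has shape (F,)
        h = numpy.tanh(x_t @ w_in + h @ w_rec)
        outputs.append(h)
    # Return all T hidden states, or only the last one.
    return numpy.stack(outputs) if return_sequences else outputs[-1]


rng = numpy.random.default_rng(0)
T, F, units = 2, 3, 4
x = rng.normal(size=(T, F))
w_in = rng.normal(size=(F, units))
w_rec = rng.normal(size=(units, units))

all_steps = simple_rnn(x, w_in, w_rec, return_sequences=True)
last_step = simple_rnn(x, w_in, w_rec, return_sequences=False)
print(all_steps.shape)  # (2, 4)
print(last_step.shape)  # (4,)
```

The last row of `all_steps` is the same vector as `last_step`; the second mode just discards the earlier timesteps.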

Below are examples of both of these modes.

## Example 1: Learning the sequence

Here's a quick example of training an LSTM (a type of RNN) which keeps the entire sequence around. In this example, each input data point has 2 timesteps, each with 3 features; the output data has 2 timesteps (because return_sequences=True), each with 4 features (because that is the size I pass to LSTM).

import keras.layers as L
import keras.models as M

import numpy

# The inputs to the model.
# We will create two data points, just for the example.
data_x = numpy.array([
    # Datapoint 1
    [
        # Input features at timestep 1
        [1, 2, 3],
        # Input features at timestep 2
        [4, 5, 6]
    ],
    # Datapoint 2
    [
        # Input features at timestep 1
        [7, 8, 9],
        # Input features at timestep 2
        [10, 11, 12]
    ]
])

# The desired model outputs.
# We will create two data points, just for the example.
data_y = numpy.array([
    # Datapoint 1
    [
        # Target features at timestep 1
        [101, 102, 103, 104],
        # Target features at timestep 2
        [105, 106, 107, 108]
    ],
    # Datapoint 2
    [
        # Target features at timestep 1
        [201, 202, 203, 204],
        # Target features at timestep 2
        [205, 206, 207, 208]
    ]
])

# Each input data point has 2 timesteps, each with 3 features.
# So the input shape (excluding batch_size) is (2, 3), which
# matches the shape of each data point in data_x above.
model_input = L.Input(shape=(2, 3))

# This RNN will return timesteps with 4 features each.
# Because return_sequences=True, it will output 2 timesteps, each
# with 4 features. So the output shape (excluding batch size) is
# (2, 4), which matches the shape of each data point in data_y above.
model_output = L.LSTM(4, return_sequences=True)(model_input)

# Create the model.
model = M.Model(inputs=model_input, outputs=model_output)

# You need to pick appropriate loss/optimizers for your problem.
# I'm just using these to make the example compile.
model.compile('sgd', 'mean_squared_error')

# Train
model.fit(data_x, data_y)


## Example 2: Learning the last timestep

If, on the other hand, you want to train an LSTM which only outputs the last timestep in the sequence, then you need to set return_sequences=False (or just remove it from the constructor entirely, since False is the default). Your output data (data_y in the example above) then needs to be rearranged, since you only need to supply the last timestep. So in this second example, each input data point still has 2 timesteps, each with 3 features. The output data, however, is just a single vector for each data point, because only the last timestep is kept. Each of these output vectors still has 4 features, though (because that is the size I pass to LSTM).

import keras.layers as L
import keras.models as M

import numpy

# The inputs to the model.
# We will create two data points, just for the example.
data_x = numpy.array([
    # Datapoint 1
    [
        # Input features at timestep 1
        [1, 2, 3],
        # Input features at timestep 2
        [4, 5, 6]
    ],
    # Datapoint 2
    [
        # Input features at timestep 1
        [7, 8, 9],
        # Input features at timestep 2
        [10, 11, 12]
    ]
])

# The desired model outputs.
# We will create two data points, just for the example.
data_y = numpy.array([
    # Datapoint 1
    # Target features at timestep 2
    [105, 106, 107, 108],
    # Datapoint 2
    # Target features at timestep 2
    [205, 206, 207, 208]
])

# Each input data point has 2 timesteps, each with 3 features.
# So the input shape (excluding batch_size) is (2, 3), which
# matches the shape of each data point in data_x above.
model_input = L.Input(shape=(2, 3))

# This RNN will return 4 features per data point.
# Because return_sequences=False, it will output only the last
# timestep. So the output shape (excluding batch size) is (4,),
# which matches the shape of each data point in data_y above.
model_output = L.LSTM(4, return_sequences=False)(model_input)

# Create the model.
model = M.Model(inputs=model_input, outputs=model_output)

# You need to pick appropriate loss/optimizers for your problem.
# I'm just using these to make the example compile.
model.compile('sgd', 'mean_squared_error')

# Train
model.fit(data_x, data_y)


Thank you for your great explanation. What is the relationship between datapoint #1 and datapoint #2? For example, in the first example, if you were to remove datapoint 2 and place it under datapoint 1, we would have 4 timesteps. How would that affect the model as a whole? – Rjay155 – 2017-02-17T22:46:41.523

There is no special relationship between datapoints. A good deep learning training set will have many tens of thousands or even millions of datapoints. One data point = one training sample, that's all. If you were to "merge" datapoints #1 and #2, then data_x would simply contain a single datapoint, and that datapoint would have four timesteps, each of 3 dimensions (and similarly, you would have to merge data_y in the same way). The number of timesteps you use simply depends on what you are trying to model (and how many timesteps are relevant for that process). – Adam Sypniewski – 2017-02-18T02:42:33.940
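As a rough illustration of the "merge" described above, here is a small NumPy sketch using the data_x values from the examples. The `merged_x` name is just for illustration; the reshape joins the two datapoints end to end along the time axis.

```python
import numpy

# Two datapoints, each with 2 timesteps of 3 features.
data_x = numpy.array([
    [[1, 2, 3], [4, 5, 6]],
    [[7, 8, 9], [10, 11, 12]],
])
print(data_x.shape)  # (2, 2, 3)

# One datapoint with 4 timesteps of 3 features.
merged_x = data_x.reshape(1, 4, 3)
print(merged_x.shape)  # (1, 4, 3)
```

With this layout the model's input shape would be (4, 3) instead of (2, 3), and data_y would need the same treatment.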

@Adam Sypniewski I have a question about y. What if one of my y features is categorical? For example: data_y = numpy.array([ # Datapoint 1 # Target features at timestep 2 [[105, 106, 107, 108],[0, 1]], # Datapoint 2 # Target features at timestep 2 [[205, 206, 207, 208],[1, 0]] ]). How would I structure this? Thx! – Hua Ye – 2017-05-25T15:40:14.593

In that case, you should probably feed the output of the RNN into a dense layer, so that each output timestep gets mapped to one-hot categories. – Adam Sypniewski – 2017-05-25T17:30:58.977
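One way this suggestion might look, as a sketch only: it covers just the categorical part of the target, assumes 2 classes (taken from the one-hot vectors in the question), and keeps only the last timestep. The softmax activation and the categorical_crossentropy loss are my choices for the illustration, not something stated in the thread.

```python
import keras.layers as L
import keras.models as M

import numpy

data_x = numpy.array([
    [[1, 2, 3], [4, 5, 6]],
    [[7, 8, 9], [10, 11, 12]],
], dtype="float32")

# One-hot class labels, one row per data point (2 classes assumed).
data_y = numpy.array([
    [0, 1],
    [1, 0],
], dtype="float32")

model_input = L.Input(shape=(2, 3))
# Keep only the last timestep of the LSTM output: shape (4,).
hidden = L.LSTM(4, return_sequences=False)(model_input)
# Map the 4 LSTM features to a probability for each of the 2 classes.
model_output = L.Dense(2, activation='softmax')(hidden)

model = M.Model(inputs=model_input, outputs=model_output)
model.compile('sgd', 'categorical_crossentropy')
model.fit(data_x, data_y, epochs=1, verbose=0)

probs = model.predict(data_x, verbose=0)
print(probs.shape)  # (2, 2)
```

With return_sequences=True, the same Dense layer would be applied to every timestep, and data_y would need one one-hot vector per timestep.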

How can you visualise the results here? Some plots would be useful. – hhh – 2017-08-17T12:47:59.073