I'm new to ML and TensorFlow (I started a few hours ago), and I'm trying to use it to predict the next few data points in a time series. I'm taking my input and doing this with it:

```
/----------- x ------------\
.-------------------------------.
| 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
'-------------------------------'
\----------- y ------------/
```

What I thought I was doing is using *x* as the input data and *y* as the desired output for that input, so that given 0-6 I could get 1-7 (the 7 in particular). However, when I run my graph with *x* as the input, what I get is a prediction that looks more like *x* than *y*.
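In code, the shift in the diagram would look something like this (a minimal sketch, assuming a single 8-point series; `series`, `x`, and `y` are illustrative names, not from the post's code):

```python
import numpy as np

# The diagram as code: input x and target y are the same series
# offset by one step, so each input value is paired with its successor.
series = np.arange(8, dtype=np.float32)   # [0, 1, ..., 7]
x = series[:-1].reshape(1, -1, 1)         # values 0-6, shape [batch, time, features]
y = series[1:].reshape(1, -1, 1)          # values 1-7, same shape
```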

Here's the code (based on this post and this post):

```
import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plot
import pandas as pd
import csv

def load_data_points(filename):
    print("Opening CSV file")
    with open(filename) as csvfile:
        print("Creating CSV reader")
        reader = csv.reader(csvfile)
        print("Reading CSV")
        return [[[float(p)] for p in row] for row in reader]

flatten = lambda l: [item for sublist in l for item in sublist]

data_points = load_data_points('dataset.csv')
print("Loaded")

prediction_size = 10
num_test_rows = 1
num_data_rows = len(data_points) - num_test_rows
row_size = len(data_points[0]) - prediction_size

# Training data
data_rows = data_points[:-num_test_rows]
x_data_points = np.array([row[:-prediction_size] for row in data_rows]).reshape([-1, row_size, 1])
y_data_points = np.array([row[prediction_size:] for row in data_rows]).reshape([-1, row_size, 1])

# Test data
test_rows = data_points[-num_test_rows:]
x_test_points = np.array([[data_points[0][:-prediction_size]]]).reshape([-1, row_size, 1])
y_test_points = np.array([[data_points[0][prediction_size:]]]).reshape([-1, row_size, 1])

tf.reset_default_graph()

num_hidden = 100

x = tf.placeholder(tf.float32, [None, row_size, 1])
y = tf.placeholder(tf.float32, [None, row_size, 1])

basic_cell = tf.contrib.rnn.BasicRNNCell(num_units=num_hidden, activation=tf.nn.relu)
rnn_outputs, _ = tf.nn.dynamic_rnn(basic_cell, x, dtype=tf.float32)

learning_rate = 0.001

stacked_rnn_outputs = tf.reshape(rnn_outputs, [-1, num_hidden])
stacked_outputs = tf.layers.dense(stacked_rnn_outputs, 1)
outputs = tf.reshape(stacked_outputs, [-1, row_size, 1])

loss = tf.reduce_sum(tf.square(outputs - y))
optimizer = tf.train.AdamOptimizer(learning_rate)
training_op = optimizer.minimize(loss)

init = tf.global_variables_initializer()

iterations = 1000

with tf.Session() as sess:
    init.run()
    for ep in range(iterations):
        sess.run(training_op, feed_dict={x: x_data_points, y: y_data_points})
        if ep % 100 == 0:
            mse = loss.eval(feed_dict={x: x_data_points, y: y_data_points})
            print(ep, "\tMSE:", mse)

    y_pred = sess.run(stacked_outputs, feed_dict={x: x_test_points})

    plot.rcParams["figure.figsize"] = (20, 10)
    plot.title("Actual vs Predicted")
    plot.plot(pd.Series(np.ravel(x_test_points)), 'g:', markersize=2, label="X")
    plot.plot(pd.Series(np.ravel(y_test_points)), 'b--', markersize=2, label="Y")
    plot.plot(pd.Series(np.ravel(y_pred)), 'r-', markersize=2, label="Predicted")
    plot.legend(loc='upper left')
    plot.xlabel("Time periods")
    plot.tick_params(
        axis='y',
        which='both',
        left='off',
        right='off',
        labelleft='off')
    plot.show()
```

The result shown in the graph below is a prediction that follows *x*, rather than being shifted to the left (and including the predicted points on the right) as it should be to resemble *y*. Obviously the desire is for the red line to be as close to the blue one as possible.

I have no idea what I'm doing with all this, so please ELI5.

Oh, also, my data points are fairly small numbers (on the order of 0.0001). If I don't multiply them by, say, 1000000, the results are so small that the red line is almost flat at the bottom of the chart. Why? I'm guessing it's because of the squaring in the loss function. Should data be normalized before use, and if so, to what? 0-1? If I use:

```
normalized_points = [(p - min_point) / (max_point - min_point) for p in data_points]
```

my prediction fluctuates more wildly as it progresses.

**Edit:** I'm being dumb and only giving it one example to learn from, not 500, aren't I? So I should be giving it multiple 500-point samples, right?
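If that's the issue, a sliding window over one long series is a common way to get many (input, target) pairs out of it (a sketch under that assumption; `sliding_windows` is an illustrative name, and the sine series is just stand-in data):

```python
import numpy as np

# Carve one long series into many overlapping (input, target) pairs,
# where each target window is the input window shifted one step ahead.
def sliding_windows(series, window):
    series = np.asarray(series, dtype=np.float32)
    n = len(series) - window   # number of windows that still have a shifted target
    x = np.stack([series[i:i + window] for i in range(n)])[..., None]
    y = np.stack([series[i + 1:i + window + 1] for i in range(n)])[..., None]
    return x, y

x, y = sliding_windows(np.sin(np.linspace(0, 10, 500)), window=50)
# x.shape == (450, 50, 1); y.shape == (450, 50, 1)
```

Each of the 450 samples is then an independent training example in the `[batch, time, features]` layout the RNN expects.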

I have the same problem - namely that the output of the RNN follows the input (X) and not the target (Y). Strangely when the input to the same RNN is a simple sine series it learns correctly, i.e. predicts the Y. – Ryszard Cetnarski – 2018-06-15T13:38:46.157

Please share your dataset.csv file – Ashwin Tomar – 2018-07-18T10:35:57.983