
UPDATE: The tables look messed up, so I put them on Pastebin for better visibility: https://pastebin.com/gDX28uVF

I am using a neural network with different learning algorithms (for example, standard backpropagation) to classify trends in time series. As stated in several papers, data normalization is an important factor for successful and efficient learning. I will try to describe the problem as clearly and precisely as possible.

**Problem / Learning Goal:**

The network is trained with a time series and 2 indicators to predict a specific class. Here is a very simple (made-up) example of raw data to understand the problem:

**Example RAW Data**
| Timestamp | DensityX | WaveLengthY | Temperature (K) |
|----------:|---------:|------------:|----------------:|
| 1 | 0.1 | 2 | 200 |
| 2 | 0.9 | 3 | 150 |
| 3 | -0.5 | 1 | 175 |
| 4 | 0 | 6 | 154 |
| 5 | 1 | 8 | 155 |
| 6 | 1.3 | 1.5 | 220 |
| 7 | -0.5 | 3.4 | 250 |
| 8 | 0.2 | 2 | 255 |
| 9 | 0.1 | 1 | 180 |

I use the following process to generate suitable sample data for training:

The neural network receives n time slices with the indicators and tries to detect whether a future trend in the temperature occurs over the next x time slices. For example, n = 2 and x = 3.

The input and output are defined as follows:

**Input vector:**

- In1 = Density_(t-2)
- In2 = Wavelength_(t-2)
- In3 = Density_(t-1)
- In4 = Wavelength_(t-1)

**Output vector:**

The output vector is a classification encoded by effects encoding or dummy encoding (details in “Neural Networks using C# Succinctly”).

Calculation:

- Classification “Down”: the temperature drops 3 times in a row (encoded as 0;1)
- Classification “Stable”: the temperature neither drops nor rises 3 times in a row (1;0)
- Classification “Up”: the temperature rises 3 times in a row (-1;-1)

So the “processed” training samples would look like this:

**Processed Data**

| Pattern | I1 | I2 | I3 | I4 | O1 | O2 | Class | Used TS |
|--------:|---:|---:|---:|---:|---:|---:|:------|:--------|
| 1 | 0.1 | 2 | 0.9 | 3 | 0 | 1 | Down | 1 to 5 |
| 2 | 0.9 | 3 | -0.5 | 1 | -1 | -1 | Up | 2 to 6 |
| 3 | -0.5 | 1 | 0 | 6 | -1 | -1 | Up | 3 to 7 |
| 4 | 0 | 6 | 1 | 8 | 1 | 0 | Stable | 4 to 8 |
| 5 | 1 | 8 | 1.3 | 1.5 | 1 | 0 | Stable | 5 to 9 |
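The windowing step described above can be sketched in Python (my own minimal sketch, not the author's code; `classify_trend` and `make_samples` are hypothetical names, while n, x, and the effects encoding are taken from the question). Note that the question's example values admittedly contain errors, so the sketch follows the textual rules rather than the table:

```python
# Effects encoding for the three classes, as given in the question.
ENCODING = {"Down": (0, 1), "Stable": (1, 0), "Up": (-1, -1)}

def classify_trend(temps):
    """Label a run of x+1 temperatures: 'Up'/'Down' if they rise/drop
    x times in a row, otherwise 'Stable'."""
    diffs = [b - a for a, b in zip(temps, temps[1:])]
    if all(d > 0 for d in diffs):
        return "Up"
    if all(d < 0 for d in diffs):
        return "Down"
    return "Stable"

def make_samples(density, wavelength, temperature, n=2, x=3):
    """Slide a window of n input slices + x future slices over the series."""
    samples = []
    for t in range(n, len(temperature) - x + 1):
        inputs = []
        for k in range(t - n, t):  # the n input time slices
            inputs += [density[k], wavelength[k]]
        # x consecutive temperature changes starting at the last input slice
        label = classify_trend(temperature[t - 1 : t + x])
        samples.append((inputs, ENCODING[label], label))
    return samples
```

Each pattern spans n + x timestamps, matching the “Used TS” column.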

As you can see, the indicators have different value ranges, which is why I want to normalize the data.

Basically, I found the following approaches in the literature and research:

**Min/Max Normalization**

Requires the following values to calculate:

- dataHigh: the highest unnormalized observation.
- dataLow: the lowest unnormalized observation.
- normalizedHigh: the high end of the range to which the data will be normalized.
- normalizedLow: the low end of the range to which the data will be normalized.
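A minimal sketch of the formula in Python (my own code, not from the cited literature; the default target range [-1, 1] and the `clip` flag are my assumptions — clipping live values to the training range is one possible answer to question a below):

```python
def minmax_normalize(value, data_low, data_high,
                     normalized_low=-1.0, normalized_high=1.0, clip=True):
    """Map value from [data_low, data_high] onto
    [normalized_low, normalized_high]."""
    scaled = (value - data_low) / (data_high - data_low)
    result = scaled * (normalized_high - normalized_low) + normalized_low
    if clip:
        # Bound live values that breach the training min/max instead of
        # letting them leave the target range.
        result = max(normalized_low, min(normalized_high, result))
    return result
```

With dataLow = -0.5 and dataHigh = 1.3 from the DensityX column, a live value of 2.0 would be clipped to 1.0 instead of mapping to about 1.78.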

**Reciprocal normalization**

Every value is replaced by its reciprocal (x = 1/x). The calculated values for DensityX would be:

| Timestamp | Reciprocal Density |
|----------:|-------------------:|
| 1 | 10 |
| 2 | 1.111111111 |
| 3 | -2 |
| 4 | #DIV/0! |
| 5 | 1 |
| 6 | 0.769230769 |
| 7 | -2 |
| 8 | 5 |
| 9 | 10 |
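The zero at timestamp 4 can be guarded with a small epsilon (my own sketch; eps = 1e-6 is an arbitrary assumption, not from the post). Note that 1/eps then yields a huge value, which also illustrates why the reciprocal is fragile for inputs near zero:

```python
def reciprocal_normalize(x, eps=1e-6):
    """Reciprocal normalization x -> 1/x; zeros are replaced by eps
    before inverting (one option for question b)."""
    if x == 0:
        x = eps  # assumption: substitute a small positive value for 0
    return 1.0 / x
```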

**Percentage normalization**

The percentage delta is calculated relative to the value from the previous timestamp.

The starting point is timestamp 1, where the delta equals 0. For each following timestamp, the delta percentage is calculated from the previous value. Calculating the delta percentages for the time series yields:

| Timestamp | Delta Density X |
|----------:|----------------:|
| 1 | 0 |
| 2 | 0.9 |
| 3 | -0.555555556 |
| 4 | 0 |
| 5 | #DIV/0! |
| 6 | 1.3 |
| 7 | -0.384615385 |
| 8 | -0.4 |
| 9 | 0.5 |
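The same epsilon guard works for the delta's division by the previous value (again my own sketch with an assumed eps; it follows the textual definition of the percentage delta, not the admittedly erroneous example numbers above):

```python
def pct_deltas(series, eps=1e-6):
    """Relative change between consecutive values; the first delta is 0.
    A zero previous value is replaced by eps to avoid division by zero."""
    deltas = [0.0]  # timestamp 1 is the starting point
    for prev, cur in zip(series, series[1:]):
        denom = prev if prev != 0 else eps  # assumption: epsilon guard
        deltas.append((cur - prev) / denom)
    return deltas
```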

**As you can see, both of the latter approaches have errors when handling zero values, and in my opinion the range is still a problem. The Min/Max approach generally leads to a good normalization, but I see a problem there as well: live data may breach the max and min values of the training set.**

My questions are:

- What are your thoughts about the general idea of how I process the raw data?
- How would you normalize the given data – if at all?

  a) Does it make sense for Min/Max normalization to propose min and max values wide enough to include live data (and throw an error in case a value still breaches them)?

  b) How should 0 values be handled (maybe convert them to a small positive or negative value)?

- Are there other ideas or concepts to tackle this problem?

I am looking forward to your input; everything is appreciated. Thanks in advance! I also apologize for any errors in the example values. Thanks for your time.

Cheers, hob.

Can you please correct the errors in the question body? – quintumnia – 2018-03-16T18:33:32.797

I updated to semicolon-separated CSV. – hobohak – 2018-03-16T19:46:49.140

Is this an astronomy or quantum physics problem? As far as I know, astronomy and quantum physics are not as flexible as engineering, and even minute errors are unacceptable, so I think you'll have to increase precision to a very, very high value. – DuttaA – 2018-03-17T09:01:20.240

Thank you for the input, DuttaA. It is none of the problems you have mentioned. What do you mean by increasing the precision in regards to the questions? – hobohak – 2018-03-17T11:13:29.690