How to scale data for an LSTM autoencoder?


I am working on an LSTM autoencoder in Keras. The aim is to obtain a latent-space representation of the time sequences, which I intend to use for clustering.

Within each feature, my input sequences have very low variance among them. Before normalization, the input looks something like this:

This is one of the sequences: it has 4 features (the columns) and variable length (in this case 11 rows).

The other sequences range from 11 to 200 in length; the number of features obviously remains constant. After normalization over the entire feature space (normalizing each feature individually), these subtle differences between input sequences become even smaller, and I think the autoencoder is treating them as noise and not learning them (or rather behaving like a denoising autoencoder).

Any thoughts on how I can scale the data better? Should I change how I am treating the problem?


  1. There is no problem with the code, as I was able to generate very good latent representations on a toy dataset whose features were more evenly spaced out.

  2. I have tried standardization (z-score: subtracting the mean and dividing by the standard deviation), but the problem still persists.
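For concreteness, the standardization in point 2 can be sketched as follows. This is a minimal sketch assuming the data is held as a list of NumPy arrays of shape `(length, 4)`; the variable names and the random toy data are illustrative, not from the actual dataset:

```python
import numpy as np

# Illustrative stand-ins for variable-length sequences with 4 features each.
sequences = [np.random.RandomState(i).rand(n, 4) for i, n in enumerate([11, 50, 200])]

# Per-feature mean and std computed over ALL time steps of ALL sequences.
stacked = np.concatenate(sequences, axis=0)  # shape (total_time_steps, 4)
mean = stacked.mean(axis=0)                  # shape (4,)
std = stacked.std(axis=0)                    # shape (4,)

# z-score each sequence using the global per-feature statistics.
scaled = [(seq - mean) / std for seq in sequences]
```

After this, each feature has zero mean and unit variance over the pooled data; since the map is affine and applied per feature, the relative differences between sequences are preserved, only rescaled.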

aditya ramesh

Posted 2018-01-12T02:33:31.240

Reputation: 21

Try to de-mean and set the variance to 1. – JahKnows – 2018-01-12T07:37:13.547

@JahKnows Do you mean making the mean 0 and setting the variance to 0? If that's the case (z-score), I have already done it. – aditya ramesh – 2018-01-12T17:42:21.067

Variance to 1, I mean. – aditya ramesh – 2018-01-13T04:12:09.843

Maybe, if the differences are too subtle, you can try exponentiating your features (raising to 10^x) to exaggerate them before de-meaning and setting the variance to 1. – JahKnows – 2018-01-15T04:12:42.170
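As I read this suggestion, it would look roughly like the sketch below (again with a hypothetical `sequences` list of `(length, 4)` NumPy arrays and illustrative toy data). Note that 10^x is nearly linear for values close to 0 (10^x ≈ 1 + x·ln 10), so how much it actually widens the gaps depends on the magnitude of the raw feature values:

```python
import numpy as np

# Illustrative small-variance sequences.
sequences = [np.random.RandomState(i).rand(n, 4) * 0.01 for i, n in enumerate([11, 30])]

# Exaggerate the subtle differences by exponentiating first ...
boosted = [np.power(10.0, seq) for seq in sequences]

# ... then de-mean and set the variance to 1 per feature, globally.
stacked = np.concatenate(boosted, axis=0)
mean, std = stacked.mean(axis=0), stacked.std(axis=0)
scaled = [(b - mean) / std for b in boosted]
```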

No answers