## Preprocessing and dropout in Autoencoders?


I am working with autoencoders and have a few points of confusion. I am trying different autoencoders, such as:

- fully-connected autoencoder
- convolutional autoencoder
- denoising autoencoder


I have two datasets. One is a numerical dataset with float and int values; the second is a text dataset with text and date values.

The numerical dataset looks like:

date          id              check_in   check_out   coke_per   permanent_values   temp
13/9/2017     142453390001    134.2      43.1        13         87                 21
14/9/2017     142453390005    132.2      46.1        19         32                 41
15/9/2017     142453390002    120.2      42.1        33         99                 54
16/9/2017     142453390004    100.2      41.1        17         39                 89

And my text dataset looks like:

date              text
13/9/2017         i totally understand this conversation about farmer market and the organic products, a nice conversation ’cause prices are cheaper than traditional
14/9/2017         The conversation was really great. But I think I need much more practice. I need to improve my listening a lot. Now I’m very worried because I thought that I’d understand more. Although, I understood but I had to repeat and repeat. See you!!!


So my questions are:

Should I normalize my numerical data values before feeding them to any type of autoencoder? If they are int and float values, do I still have to normalize?

Which activation function should I use in an autoencoder? Some articles and research papers say "sigmoid" and some say "relu".

Should I use dropout in each layer? For example, if my architecture for the autoencoder looks like

encoder (1000 --> 500 --> 256 --> 128) --> decoder (128 --> 256 --> 500 --> 784)


should it be something like this?

encoder (dropout(1000,500) --> dropout(500,256) --> dropout(256,128)) --> decoder (dropout(128,256) --> dropout(256,500) --> dropout(500,784))


For the text dataset: if I am using word2vec or any embedding to convert text into vectors, then I will have float values for each word. Should I normalize that data too?

text ("Hello how are you") --> word2vec(text) --> [1854.92002, 54112.89774, 5432.9923, 5323.98393]


Should I normalize these values or use them directly in the autoencoder?

Please ask only one question per post. Asking about normalizing, activation functions, and dropout is probably too much in a single post. You can always post multiple questions. – D.W. – 2018-06-10T20:03:52.550


1. You should always normalize your input data, because the network can learn faster with normalized data.
2. You cannot generalize this question, but in my experience, relu is better.
3. The use of dropout depends on your application for the model. E.g. image inpainting can be improved by dropout. What do you want to do with the model?

I want to find outliers with this model. – Aaditya ura – 2018-06-10T19:16:00.550

For outlier detection you should avoid too much generalisation, because you can generalise the outliers as well. I would start with no dropout, then use dropout in one layer and compare the results. You should also try a regularizer instead of dropout to prevent over-fitting; see the sketch below. – Lau – 2018-06-10T19:45:04.277
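A minimal sketch of that regularizer alternative, assuming tf.keras; the layer sizes and the 1e-4 penalty are illustrative, not tuned:

```python
# Sketch: L2 weight regularization instead of dropout for an autoencoder
# used for outlier detection (tf.keras assumed; sizes are illustrative).
from tensorflow import keras
from tensorflow.keras import layers, regularizers

inputs = keras.Input(shape=(1000,))
# kernel_regularizer penalizes large weights rather than dropping activations
x = layers.Dense(500, activation="relu",
                 kernel_regularizer=regularizers.l2(1e-4))(inputs)
x = layers.Dense(128, activation="relu",
                 kernel_regularizer=regularizers.l2(1e-4))(x)
x = layers.Dense(500, activation="relu",
                 kernel_regularizer=regularizers.l2(1e-4))(x)
outputs = layers.Dense(1000, activation="linear")(x)

autoencoder = keras.Model(inputs, outputs)
autoencoder.compile(optimizer="adam", loss="mse")
```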


I'll go through your questions one by one:

> Should I normalize my numerical data values before feeding them to any type of autoencoder? If they are int and float values, do I still have to normalize?

This is strongly suggested, for two reasons. First, if different variables are on different scales, the weight distributions will be unequal: larger scales dominate smaller ones during gradient descent, leaving many parameters undertrained and the results sub-optimal. Second, your layers have activation functions that are meant to "learn" non-linear patterns in your data, and all the commonly used activations (sigmoid, tanh, the whole ReLU family, you name it) are non-linear mainly around zero. Normalizing your data helps neural networks learn the most from it.
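As a minimal sketch of that scaling step (scikit-learn and pandas assumed; the file name numerical.csv is hypothetical, and the column list mirrors the table in the question):

```python
# Sketch: standardize int/float columns to zero mean and unit variance
# before feeding them to the autoencoder (scikit-learn assumed).
import pandas as pd
from sklearn.preprocessing import StandardScaler

df = pd.read_csv("numerical.csv")  # hypothetical file matching the question's table
numeric_cols = ["check_in", "check_out", "coke_per", "permanent_values", "temp"]

scaler = StandardScaler()
X = scaler.fit_transform(df[numeric_cols])  # int columns are scaled just like floats
```

MinMaxScaler is a reasonable alternative when the output layer is a sigmoid expecting targets in [0, 1].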

> Which activation function should I use in an autoencoder? Some articles and research papers say "sigmoid" and some say "relu".

This is more an art than a science; however, activations from the ReLU family have generally proved superior to the alternatives, so I'd suggest going with some kind of ReLU basically always. Some are fancier but more computationally expensive; a common ranking is: ELU > Leaky ReLU > ReLU.
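As a minimal sketch of those three options, each applied to a single encoder layer (tf.keras assumed; sizes are illustrative):

```python
# Sketch: the ReLU-family activations ranked above (tf.keras assumed).
from tensorflow import keras
from tensorflow.keras import layers

inputs = keras.Input(shape=(1000,))
h_relu = layers.Dense(500, activation="relu")(inputs)  # cheapest
h_elu = layers.Dense(500, activation="elu")(inputs)    # smoother, a bit costlier
h_leaky = layers.LeakyReLU(alpha=0.01)(layers.Dense(500)(inputs))  # keeps a small negative slope
```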

> Should I use dropout in each layer? For example, if my architecture for the autoencoder looks like [the one above]

Use some amount of dropout, but not too much. Dropout is a regularization technique that helps you prevent overfitting. It randomly "turns off" some neurons during training, forcing all neurons to specialize (it effectively turns your neural network into an ensemble of neural networks). However, keep in mind that dropout also represents an information loss: if you set a Dropout() layer with a dropout probability of 0.5, you lose half of the information at that layer on each iteration! I suggest using it, but not at every layer, and being parsimonious with it.
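As a minimal sketch of that advice (tf.keras assumed): modest dropout after one early encoder layer only, with the bottleneck and decoder left alone. The widths mirror the question, except that the output is set back to 1000 so the decoder reconstructs the input:

```python
# Sketch: dropout on a single encoder layer, not on every layer (tf.keras assumed).
from tensorflow import keras
from tensorflow.keras import layers

inputs = keras.Input(shape=(1000,))
x = layers.Dense(500, activation="relu")(inputs)
x = layers.Dropout(0.2)(x)                   # modest dropout, encoder only
x = layers.Dense(256, activation="relu")(x)
x = layers.Dense(128, activation="relu")(x)  # bottleneck, no dropout
x = layers.Dense(256, activation="relu")(x)
x = layers.Dense(500, activation="relu")(x)
# output width matches the input; the 784 in the question looks like a slip
outputs = layers.Dense(1000, activation="linear")(x)

autoencoder = keras.Model(inputs, outputs)
autoencoder.compile(optimizer="adam", loss="mse")
```

Keras disables dropout automatically at inference time, so reconstruction errors used for outlier scoring are computed without the information loss.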

> For the text dataset: if I am using word2vec or any embedding to convert text into vectors, then I will have float values for each word. Should I normalize that data too?

No. Those values are learned automatically by the model; you don't have to worry about their internal scale.
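For illustration, a minimal sketch assuming gensim 4.x; the toy corpus is hypothetical:

```python
# Sketch: word2vec vectors are learned floats at a modest scale and can be
# fed to the autoencoder as-is (gensim 4.x assumed; toy corpus).
from gensim.models import Word2Vec

sentences = [["hello", "how", "are", "you"]]            # toy corpus
w2v = Word2Vec(sentences, vector_size=100, min_count=1)
vec = w2v.wv["hello"]  # 100-dim float vector, no extra normalization needed
```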