
I am working on a problem with 4 input variables and 1 continuous output variable. For every sample, the 4 input values sum to 1:

a1+a2+a3+a4=1

So they are correlated: any one of them is fully determined by the other three.

My question is: should I use all 4 variables to train the neural network, or should I use any 3 of them to get rid of the correlation? Is there any problem if I use all 4?
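To make the redundancy concrete, here is a minimal NumPy sketch. The data is hypothetical (random samples drawn from a Dirichlet distribution so that each row sums to 1), not the actual dataset. It checks that the fourth input can be rebuilt from the other three, and that adding the fourth input to a design matrix with a bias column does not increase its rank.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: 200 samples of 4 non-negative inputs that sum to 1.
A = rng.dirichlet(alpha=np.ones(4), size=200)

# The fourth input is fully determined by the other three.
a4_reconstructed = 1.0 - A[:, :3].sum(axis=1)
max_reconstruction_error = np.abs(a4_reconstructed - A[:, 3]).max()

# Design matrix with a bias column: [1, a1, a2, a3, a4].
X_with_bias = np.hstack([np.ones((A.shape[0], 1)), A])

# Five columns, but the bias column equals a1 + a2 + a3 + a4,
# so the matrix is rank-deficient.
rank_all_four = np.linalg.matrix_rank(X_with_bias)

# Dropping a4 leaves [1, a1, a2, a3], which has the same rank.
rank_three = np.linalg.matrix_rank(X_with_bias[:, :4])

print(max_reconstruction_error, rank_all_four, rank_three)
```

Both design matrices should come out with the same rank, which is another way of saying that the fourth input carries no information beyond the other three plus a bias term.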

How about this answer at stats.stackexchange? https://stats.stackexchange.com/questions/232534/does-correlated-input-data-lead-to-overfitting-with-neural-networks – TwinPenguins – 2018-05-18T07:18:24.243

So, according to this answer, I can use all 4 inputs without any problem. That is ok. But do I get the same answer if I use only 3 of them? – Saptarshi Roy – 2018-05-18T07:51:14.073
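On the 3-versus-4 question, one thing can be checked directly. Assuming the network's first layer is an ordinary affine map (weights plus bias), any first layer over all 4 inputs can be rewritten as a first layer over 3 inputs by substituting a4 = 1 - a1 - a2 - a3. The sketch below uses hypothetical random weights and a hypothetical hidden size of 8. It only shows that the two networks can represent the same functions. It does not show that training will land on the same fit, since initialization and regularization such as weight decay act differently on the two parameterizations.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical inputs: 5 samples of 4 values that sum to 1.
A = rng.dirichlet(alpha=np.ones(4), size=5)

# Hypothetical first-layer parameters for a 4-input network with 8 hidden units.
W4 = rng.normal(size=(4, 8))
b4 = rng.normal(size=8)

# Equivalent parameters for a network that only sees a1, a2, a3.
# Substituting a4 = 1 - a1 - a2 - a3 into A @ W4 + b4 gives:
#   sum_i a_i * (W4[i] - W4[3]) + (b4 + W4[3])   for i = 0, 1, 2
W3 = W4[:3] - W4[3]
b3 = b4 + W4[3]

pre_activation_4 = A @ W4 + b4
pre_activation_3 = A[:, :3] @ W3 + b3

# The two pre-activations agree up to floating-point rounding.
max_diff = np.abs(pre_activation_4 - pre_activation_3).max()
print(max_diff)
```

Because everything after the first layer sees identical values, the rest of the network behaves the same for these particular weights. Whether the trained models match in practice is an empirical question, best checked on validation data.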

Well, how are you going to choose three of the four variables? Something like PCA? If so, do try it, but you need some way of selecting the subset of features, or maybe random subsampling of features. I would first use all four, watch the validation curve closely, and apply some regularization. – TwinPenguins – 2018-05-18T08:46:08.597