## Discarding correlation among inputs in a neural network


I am working on a problem with 4 input variables and 1 continuous output variable. The 4 inputs always sum to 1:

a1 + a2 + a3 + a4 = 1

So they are correlated: any one of them is fully determined by the other three.

My question is: should I use all 4 variables to train the neural network, or should I use only 3 of them to get rid of the correlation? Is there any problem with using all 4?

– TwinPenguins – 2018-05-18T07:18:24.243

So, according to this answer, I can use all 4 inputs without any problem. That is fine. But would I get the same result if I used only 3 of them? – Saptarshi Roy – 2018-05-18T07:51:14.073
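To make the "same answer with 3 inputs" question concrete: for a first dense layer, the 4-input and 3-input formulations can represent exactly the same pre-activations, because a4 = 1 - a1 - a2 - a3 can be folded into the remaining weights and the bias. Here is a minimal sketch in plain Python. The weights, the bias, and the helper name `preactivation` are made up for illustration and do not come from the thread.

```python
import random

def preactivation(weights, bias, inputs):
    """Weighted sum computed by one neuron before its activation function."""
    return sum(w * x for w, x in zip(weights, inputs)) + bias

random.seed(0)

# Hypothetical weights and bias of one first-layer neuron that sees all 4 inputs.
w4 = [0.7, -1.2, 0.4, 2.0]
b4 = 0.3

# Equivalent neuron on (a1, a2, a3) only: substitute a4 = 1 - a1 - a2 - a3,
# which shifts each remaining weight by -w4[3] and the bias by +w4[3].
w3 = [w - w4[3] for w in w4[:3]]
b3 = b4 + w4[3]

max_gap = 0.0
for _ in range(1000):
    raw = [random.random() + 1e-9 for _ in range(4)]
    total = sum(raw)
    a = [r / total for r in raw]  # four synthetic inputs that sum to 1
    gap = abs(preactivation(w4, b4, a) - preactivation(w3, b3, a[:3]))
    max_gap = max(max_gap, gap)

print(max_gap)  # only floating-point rounding error remains
```

This shows only that the two setups have the same representational capacity. Two networks trained from random initializations will generally not end up with identical weights or identical predictions, so "the same answer" should be read as "comparable validation error", not bit-for-bit equality.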

Well, how are you going to choose three of the four variables? With something like PCA? If so, do try it, but you need some principled way of selecting the subset of features, or perhaps random subsampling of features. I would first use all four, watch the validation curve closely, and apply some regularization first! – TwinPenguins – 2018-05-18T08:46:08.597
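On the PCA idea: when the inputs always sum to 1, the direction (1, 1, 1, 1) carries no variance at all, so the covariance matrix sends the all-ones vector to zero and PCA can find at most three components with nonzero variance. A rough sketch in plain Python, using synthetic data as a stand-in for the real inputs (the data generation is an assumption, not the asker's dataset):

```python
import random

random.seed(1)

# Synthetic stand-in data: each row holds 4 positive values rescaled to sum to 1.
rows = []
for _ in range(2000):
    raw = [random.random() + 1e-9 for _ in range(4)]
    total = sum(raw)
    rows.append([r / total for r in raw])

n = len(rows)
means = [sum(row[j] for row in rows) / n for j in range(4)]

# Sample covariance matrix of the 4 inputs.
cov = [[sum((row[i] - means[i]) * (row[j] - means[j]) for row in rows) / (n - 1)
        for j in range(4)] for i in range(4)]

# Covariance matrix times the all-ones vector is just the row sums.
cov_times_ones = [sum(cov[i]) for i in range(4)]
variances = [cov[i][i] for i in range(4)]

print(variances)       # each input varies on its own
print(cov_times_ones)  # but the sum direction has (numerically) zero variance
```

Because the redundant direction is known in advance, no data-driven selection is needed here: dropping any one of the four inputs removes exactly that redundancy without losing information.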

That said, have you tested the data for multicollinearity? Maybe you have, but I don't think that a1+a2+a3+a4=1 implies a high degree of correlation.
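Both points can be checked directly on the data. The sketch below assumes synthetic inputs in which all four variables play the same role. In that case the pairwise correlations come out only moderately negative (near -1/3), even though a4 is an exact linear function of the other three. The helper `pearson` and the data generation are illustrative assumptions, and real data may show different pairwise values.

```python
import random
from math import sqrt

def pearson(xs, ys):
    """Sample Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

random.seed(2)

# Synthetic stand-in data: rows of 4 positive values rescaled to sum to 1.
rows = []
for _ in range(5000):
    raw = [random.random() + 1e-9 for _ in range(4)]
    total = sum(raw)
    rows.append([r / total for r in raw])

cols = [[row[j] for row in rows] for j in range(4)]

# Pairwise correlations between the inputs.
pairwise = {(i, j): pearson(cols[i], cols[j])
            for i in range(4) for j in range(i + 1, 4)}

# Exact linear dependence: a4 is recovered from the other three inputs.
max_residual = max(abs(row[3] - (1 - row[0] - row[1] - row[2])) for row in rows)

for pair, r in pairwise.items():
    print(pair, round(r, 3))
print(max_residual)
```

So a correlation matrix alone can look harmless here, while a check for linear dependence (a residual like the one above, or a variance inflation factor) reveals that one input is entirely redundant.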