Feature selection for time series prediction



I'm working on an LSTM-based stock market forecasting problem and trying to figure out a way to select input variables.

  1. When calculating the correlation between variables (e.g. the Close price of Tesla vs. the Close price of Microsoft), would differencing the series give a more accurate (or more correct) correlation coefficient? I'm finding values in the range 0.7-0.9 for the non-differenced variables, and lower values after differencing.

  2. Once I have a correlation matrix of all my variables, is there a way to figure out which ones would add information to the neural net and which ones would just add noise?
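
To make point 1 concrete, here is a minimal sketch of the effect on synthetic data. The two series are hypothetical stand-ins for prices (a shared upward drift plus independent noise), not real Tesla or Microsoft data, and the helper names `pearson` and `diff` are made up for illustration. It only shows that a common trend can push the correlation of the raw levels toward 1 even when the period-to-period changes are unrelated.

```python
import random

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5

def diff(series):
    """First difference: series[t] - series[t-1]."""
    return [b - a for a, b in zip(series, series[1:])]

rng = random.Random(0)  # fixed seed so the sketch is reproducible
n = 500

# Assumption: synthetic "prices" with a common upward drift and
# independent Gaussian noise. These are not real market data.
a = [100 + 0.5 * t + rng.gauss(0, 1) for t in range(n)]
b = [50 + 0.3 * t + rng.gauss(0, 1) for t in range(n)]

corr_levels = pearson(a, b)              # dominated by the shared trend
corr_diffs = pearson(diff(a), diff(b))   # trend removed by differencing

print(f"levels: {corr_levels:.3f}, differences: {corr_diffs:.3f}")
```

On data like this the correlation of the levels is close to 1 while the correlation of the differences is close to 0, which is the pattern described above in a more extreme form.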

Leandro Ercoli

Posted 2018-08-12T14:07:14.120

Reputation: 21



You don't need to select variables before feeding them to the network; a deep neural network (DNN) does this implicitly during training. The DNN gives more importance to relevant variables through the weights it learns. After training, hidden nodes with sigmoid activations can saturate toward 0 or 1, and you can loosely think of these near-0 and near-1 outputs as a way of switching variables off or on.

By the way, a correlation matrix cannot be used directly to select relevant variables. If you want to reduce the number of variables fed to the DNN, you can use PCA. In fact, the principal components are obtained from the eigenvectors of the correlation matrix (or of the covariance matrix, if the variables are not standardized).
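
As a rough illustration of the PCA remark, the sketch below builds the correlation matrix of three synthetic variables and extracts its leading eigenvector with power iteration, which gives the first principal component of the standardized data. The data, the function names, and the choice of power iteration are all assumptions made to keep the example dependency-free; in practice you would more likely use a linear-algebra or machine-learning library.

```python
import random

def standardize(column):
    """Rescale a column to zero mean and unit (population) variance."""
    n = len(column)
    mean = sum(column) / n
    std = (sum((v - mean) ** 2 for v in column) / n) ** 0.5
    return [(v - mean) / std for v in column]

def correlation_matrix(columns):
    """Correlation matrix of a list of equal-length columns."""
    z = [standardize(c) for c in columns]
    n = len(columns[0])
    return [[sum(p * q for p, q in zip(zi, zj)) / n for zj in z] for zi in z]

def mat_vec(matrix, v):
    """Matrix-vector product for a square list-of-lists matrix."""
    return [sum(m * x for m, x in zip(row, v)) for row in matrix]

def leading_eigenpair(matrix, iterations=500):
    """Power iteration: largest eigenvalue and its unit eigenvector."""
    k = len(matrix)
    v = [1.0 / k ** 0.5] * k
    for _ in range(iterations):
        w = mat_vec(matrix, v)
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    eigenvalue = sum(x * y for x, y in zip(v, mat_vec(matrix, v)))
    return eigenvalue, v

rng = random.Random(1)  # fixed seed so the sketch is reproducible
n = 400

# Assumption: x1 and x2 share a hidden factor, x3 is unrelated noise.
factor = [rng.gauss(0, 1) for _ in range(n)]
x1 = [f + rng.gauss(0, 0.3) for f in factor]
x2 = [f + rng.gauss(0, 0.3) for f in factor]
x3 = [rng.gauss(0, 1) for _ in range(n)]

corr = correlation_matrix([x1, x2, x3])
eigenvalue, component = leading_eigenpair(corr)

# The trace of a correlation matrix equals the number of variables,
# so eigenvalue / 3 is the share of variance on this component.
explained = eigenvalue / len(corr)
print("first component loadings:", [round(c, 3) for c in component])
print("explained variance ratio:", round(explained, 3))
```

The first component loads mostly on the two variables that share a factor and only weakly on the unrelated one. Note that this compresses the inputs into combinations of variables; it does not by itself say which original variables help the forecast.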


Posted 2018-08-12T14:07:14.120

Reputation: 1 162