We need to solve a time series problem with an LSTM model.
While implementing it, the main challenge I am facing is feature selection. Our data set contains 2,300 observations and 600 features, and we already know that many of the features are not relevant at all. But no one has the domain expertise to confirm which features are relevant.
So far I have tried the following:
Using all 600 features [the shape of X becomes 2280 x 20 x 600 with a time step of 20]. The accuracy is very poor: 53-55%.
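For reference, the 2280 x 20 x 600 shape comes from sliding a 20-step window over the 2,300 observations. A minimal sketch of that windowing (assuming NumPy; the function name is my own):

```python
import numpy as np

def make_windows(data, time_steps=20):
    """Slide a window of length `time_steps` over the rows of `data`
    (shape: n_obs x n_features) to build the 3-D input an LSTM expects.
    Returns X with shape (n_obs - time_steps, time_steps, n_features)."""
    return np.stack([data[i:i + time_steps]
                     for i in range(len(data) - time_steps)])
```

With 2,300 observations and a time step of 20, this yields 2,280 windows, matching the shape above.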
Using a single column as the predictor (the feature we want to forecast) [the shape of X becomes 2280 x 20 x 1 with a time step of 20]. The accuracy is also poor.
Applying PCA so that the 600 features are reduced to 20 components [approximately 98% of the variance was retained], but I have not seen any improvement.
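To make the PCA step concrete, here is a minimal SVD-based sketch (assuming NumPy; the function name is my own) that projects onto the top components and reports the fraction of variance retained:

```python
import numpy as np

def pca_reduce(X, n_components=20):
    """Project X (n_samples x n_features) onto its top principal
    components and report the fraction of variance retained."""
    Xc = X - X.mean(axis=0)                       # centre each feature
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    var = S ** 2 / (len(X) - 1)                   # variance per component
    retained = var[:n_components].sum() / var.sum()
    return Xc @ Vt[:n_components].T, retained
```

One caveat worth noting: PCA keeps the directions of highest variance, which are not necessarily the directions most predictive of the target, so retaining 98% of the variance does not guarantee better accuracy.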
Finally, I went through a very lengthy process:
a. Each of the 600 features is tried individually with the LSTM. The feature that gives the best result is selected.
b. The selected feature is then paired with each of the remaining 599 features individually, and the best-performing pair is kept.
c. The best pair is then tried with each of the remaining 598 features individually, and the process continues...
So far this process has yielded 6 features, and the accuracy has improved to 65%.
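The steps a-c above are greedy (sequential) forward selection. A minimal sketch of the loop, where `score_fn(subset)` is a placeholder for the expensive step of training and evaluating the LSTM on that feature subset (the function names are my own):

```python
def forward_select(n_features, score_fn, max_features=6):
    """Greedy forward selection: start from the best single feature,
    then repeatedly add whichever remaining feature most improves
    `score_fn` (e.g. validation accuracy of an LSTM trained on the
    subset), until `max_features` are chosen."""
    selected, remaining = [], set(range(n_features))
    while remaining and len(selected) < max_features:
        best_f, best_score = None, -float("inf")
        for f in sorted(remaining):               # deterministic order
            s = score_fn(selected + [f])
            if s > best_score:
                best_f, best_score = f, s
        selected.append(best_f)
        remaining.remove(best_f)
    return selected
```

In practice you would also stop early once adding a feature no longer improves the validation score, rather than always running to `max_features`.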
My question is: is this the only way to do feature selection in time series problems?
For non-time-series problems there are many feature selection methods. What is the best way to do feature selection for a time series problem?
Let me give another example...
Suppose we develop an LSTM model with 10 predictors/features. The model works very well and we get 80% accuracy.
Now, somehow 100 more features get added that are not relevant for deriving the target y.
The accuracy will drop, because the model now has to work with 110 predictors/features, of which 100 are irrelevant.
How, then, do we eliminate those 100 unwanted features?