Query regarding neural network model


I used the Neural Network Toolbox in MATLAB to train my data. I used four training algorithms: Scaled Conjugate Gradient (SCG), Gradient Descent with momentum and adaptive learning rate back-propagation (GDX), Resilient Back-Propagation (RBP), and Broyden-Fletcher-Goldfarb-Shanno quasi-Newton back-propagation (BFG). I fixed the seeds at different values and obtained the accuracy. This is what I get: [image: table of train-test accuracies per algorithm and feature-set size]
The first column contains the size of the feature set. I have added features, increasing the size of the feature set, to analyse the performance.

Initially I ranked the features and then took the top 8 features as one set, the top 16 features as the next set, and so on. The first number, before the '-', is the performance of the algorithm on the training set; the second number, after the '-', is the accuracy on the test set. The data has been divided 60/20/20 into training, test, and validation sets. Each learning algorithm has been run with the same seed values to fix the accuracy.

By the way, I have fixed the seed to obtain each of the results: I used rng(1), rng(10), rng(158) and rng(250) and averaged the results to obtain one single train-test accuracy pair, and I have done this for each pair.
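For reference, my averaging protocol is roughly the following. This is only a sketch in Python with scikit-learn and synthetic data standing in for my real feature set; my actual runs use the MATLAB toolbox, and the network size here is an arbitrary assumption.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# Synthetic stand-in for the real feature set (assumption: 8 features).
X, y = make_classification(n_samples=300, n_features=8, random_state=0)

seeds = [1, 10, 158, 250]  # the rng() values used for each run
train_accs, test_accs = [], []
for seed in seeds:
    # 60% train / 20% test; the remaining 20% is held out for validation.
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, train_size=0.6, test_size=0.2, random_state=seed)
    net = MLPClassifier(hidden_layer_sizes=(10,), max_iter=1000, random_state=seed)
    net.fit(X_tr, y_tr)
    train_accs.append(net.score(X_tr, y_tr))
    test_accs.append(net.score(X_te, y_te))

# One averaged train-test accuracy pair, as reported in the table.
print(np.mean(train_accs), np.mean(test_accs))
```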

As you can see, I am getting the same accuracy for every feature-set size for each individual training algorithm. The same data shows perturbation with an SVM when I change the set size. What does this mean?


Posted 2016-08-24T07:41:53.840

Reputation: 1 093

It seems highly unlikely that it would come down to exactly the same performance for all 5 feature sets; I would think there was an implementation error – Jan van der Vegt – 2016-08-24T07:46:20.077

@JanvanderVegt How there can be an error is what I am unable to understand. And if there is one, how do I find it? I am using the toolbox; it is not my implementation .... – girl101 – 2016-08-24T07:49:15.710

It looks like it doesn't change the features used whatsoever, since there should be some perturbation in your performance due to the noise in your feature space and potential overfitting – Jan van der Vegt – 2016-08-24T07:50:52.427

@JanvanderVegt By the way, I have fixed the seed to obtain each of the results: I used rng(1), rng(10), rng(158) and rng(250) and averaged the results to obtain one single train-test accuracy pair. Is fixing the seed leading to this? – girl101 – 2016-08-24T07:53:33.563

I don't think the seeds are the problem; it looks like all the runs use feature size 40 (or 8, or whatever), because for different feature sizes you should get different accuracy (albeit maybe only slightly different) – Jan van der Vegt – 2016-08-24T08:00:11.783

@JanvanderVegt I will do it again to see if anything changes. If I do not get any change, how do I know where the error is? – girl101 – 2016-08-24T08:01:51.140

Do some diagnostics: print the feature dimensions between the runs to make sure it uses only the number of features you intended. Otherwise I don't know, but it seems off to me – Jan van der Vegt – 2016-08-24T08:03:50.073

@JanvanderVegt I did all of these, as I myself was surprised to see this; I was expecting some variation too. But I will still check. The features I added were chosen very carefully. – girl101 – 2016-08-24T08:04:56.467



To debug this case, I suggest you try the following steps:

  1. Reduce the features step by step until you are using just 1 feature, and see whether the accuracy changes.
  2. Add a sine wave and random noise to the feature set and see whether this affects any of these training algorithms.
  3. Re-evaluate how you selected or derived these features; check whether they are highly correlated.
  4. Are your classification targets highly imbalanced? If so, under/over-sample them to achieve a more balanced training set, then check the performance of your algorithm after training on this balanced dataset.
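For step 1 in particular, a quick sanity check is to confirm that the matrix the model actually receives changes width as you add features, and that the resulting accuracies move at all. A rough sketch of that diagnostic (in Python with scikit-learn on synthetic data rather than the MATLAB toolbox; the classifier and column ordering here are assumptions for illustration):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Synthetic stand-in: 40 features, only some of them informative.
X, y = make_classification(n_samples=400, n_features=40,
                           n_informative=10, random_state=0)

accs = {}
for k in (8, 16, 24, 32, 40):
    X_sub = X[:, :k]              # top-k features (assumes columns are pre-ranked)
    assert X_sub.shape[1] == k    # diagnostic: the width must actually change
    accs[k] = cross_val_score(LogisticRegression(max_iter=1000),
                              X_sub, y, cv=5).mean()
    print(k, X_sub.shape, round(accs[k], 3))
```

If every entry of `accs` comes out identical to several decimal places, the feature-selection step upstream is almost certainly feeding the model the same matrix each time.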

As already highlighted by Jan van der Vegt, it's extremely odd that changing the number of features from 8 to 40 has no impact on test-set accuracy.

Sandeep S. Sandhu

Posted 2016-08-24T07:41:53.840

Reputation: 2 087