Neural network accuracy in Torch depends on compute power?


I am new to machine learning and have a fairly good understanding of the basic concepts.

I was implementing a 3-layer neural network on the MNIST dataset, with 784, 100, and 10 neurons in the input, hidden, and output layers respectively. I did not use any regularization.

First I trained the network on an Intel i5 4th-generation quad-core CPU with 4 GB of RAM, which gave me 64% accuracy. Then I trained the exact same network, with the exact same code, on an Intel i5 7th-generation quad-core CPU with 8 GB of RAM, which gave an accuracy of about 89%. This is the link to the implementation.

My question is: in Torch, does compute power affect the accuracy of the network? Or is there something else I am missing that has resulted in this huge change?

I did not use any weight initialization method other than the default provided by the Torch libraries, so that is ruled out. I also did not use anything else that could affect the network's accuracy to this extent.

Anuj

Posted 2016-11-05T17:04:07.897

Reputation: 111

Did you train it for the same number of iterations with the same learning rate? – franciscojavierarceo – 2016-11-05T18:57:03.477

No. Literally every detail of the network is the same, except the machine I trained it on. – Anuj – 2016-11-06T03:07:34.303

Did you figure this out, Anuj? Were you able to repeat the experiment several times? – Leo Gallucci – 2019-03-04T12:39:04.067

Answers


Available compute power does not directly affect the accuracy of a neural network. If your different runs of the network have:

  • identical architecture and meta-params
  • identical code (including library code)
  • identical training data
  • identical random seeds and generators for all stochastic parts of training
  • identical numeric precision throughout (e.g. all vectors and matrices are 32-bit or 64-bit floats)

then the behaviour of neural network training in each run is fully deterministic and repeatable. A faster processor will just get you to the result faster*.
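To make the seeding point concrete, here is a minimal sketch in plain Python, using the stdlib `random` module as a stand-in for Torch's random number generator (the function name `init_weights` and the uniform range are illustrative, not Torch's actual defaults):

```python
import random

def init_weights(n, seed=None):
    # Draw n initial weights uniformly from [-0.5, 0.5]; a stand-in
    # for a framework's default layer initialisation.
    rng = random.Random(seed)
    return [rng.uniform(-0.5, 0.5) for _ in range(n)]

# With a fixed seed, every run starts from identical weights,
# so training is repeatable on any machine.
a = init_weights(5, seed=42)
b = init_weights(5, seed=42)
print(a == b)  # True

# Without a seed, each run starts from different weights, so the
# final accuracy can differ between runs and between machines.
c = init_weights(5)
d = init_weights(5)
```

The same principle applies to the train/test split and to epoch shuffling: seed them all, and two machines will produce identical training runs.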

The most likely cause of the difference between your tests is not seeding the random number generators used in the training process. For you this includes weight initialisation, possibly the train/test split, and possibly the shuffling of training data in each epoch. As you did not use any regularisation, the accuracy of the trained network can vary quite a bit due to over-fitting.

To verify this, you can train a second or third time on each CPU. I expect you will see a lot of variation in final accuracy, regardless of which machine you run it on.


* This does mean that a faster machine can, in practice, result in a more accurate final network when you are tuning parameters, because you can try more variations of meta-params across multiple training sessions.

Neil Slater

Posted 2016-11-05T17:04:07.897

Reputation: 24 613

Yes, this is right. But in my case I did not change anything: not the learning rate, not the regularization parameter, nor any such thing. All I did was use a different machine to train, and the accuracy skyrocketed by approximately 30%. So the point of my question was to find out whether it is something related to the framework I am using. – Anuj – 2016-11-08T09:04:46.973