Clarification about Octave data size limit

3

1

I'm just starting to work on a relatively large dataset after taking the ML course on Coursera. I'm trying to work on https://archive.ics.uci.edu/ml/datasets/YearPredictionMSD. I got an accuracy of 5.2 on both the training and test sets with linear regression using gradient descent in Octave.

I tried adding all possible quadratic features (515345 instances and 4275 features), but the code just won't finish executing on my HP Pavilion g6 2320tx, which has 4GB of RAM and runs Ubuntu 14.04.

Is this beyond the data size capacity of Octave?

abhivij

Posted 2014-11-07T04:13:44.583

Reputation: 69

How many columns did you get after including the quadratic features? Was it 180? Also, what was the maximum memory usage without including the quadratic term? – Nitesh – 2014-11-07T04:52:37.700

I'm trying to add all possible quadratic features. That should give about 4275 features, I think. Adding just the squared features works fine; that would be 180. By the way, how do we check the max memory usage? – abhivij – 2014-11-07T05:08:12.373

See http://ubuntuforums.org/showthread.php?t=1161120 – Nitesh – 2014-11-07T05:55:48.770

Answers

2

You have about 4GB of RAM on your machine, and Octave is an in-memory application, so the whole data matrix has to fit in RAM.

If you want to work with 515345 instances and 4275 features, assuming that you are using double precision (i.e. 8 bytes per value), the data matrix alone would need 515345*4275*8 bytes ~ 17.6 GB (about 16.4 GiB). Even if you were using 4 bytes for each data point, you would still require about 8.8 GB, more than twice your RAM, for the computation to go through.
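To make the arithmetic easy to re-check for other feature counts, here is a minimal sketch in Python (not Octave, just as a calculator). The function name `dense_matrix_bytes` is made up for illustration, and it only counts the dense data matrix itself; any working copies made during gradient descent would add to this.

```python
def dense_matrix_bytes(rows: int, cols: int, bytes_per_value: int = 8) -> int:
    """Bytes needed to hold a dense rows x cols matrix in memory.

    Hypothetical helper for illustration; counts only the raw values.
    """
    return rows * cols * bytes_per_value


ROWS = 515345  # instances, from the question
COLS = 4275    # quadratic features, from the question

# 8 bytes per value for double precision, 4 bytes for single precision
for label, width in (("double", 8), ("single", 4)):
    size = dense_matrix_bytes(ROWS, COLS, width)
    print(f"{label}: {size / 1e9:.1f} GB ({size / 2**30:.1f} GiB)")
```

With the numbers from the question this reports about 17.6 GB for doubles and about 8.8 GB for singles, both well above 4GB.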

So the limit you are hitting is probably your machine's RAM rather than an Octave-specific memory restriction. See here for further details on Octave's memory usage.

Nitesh

Posted 2014-11-07T04:13:44.583

Reputation: 1 515

Thank you. So adding all quadratic features is not possible. Guess I'll have to try some other methods – abhivij – 2014-11-07T09:36:33.640

Yes, that would make more sense. I think 4275 is too many features, given the number of rows you are dealing with. – Nitesh – 2014-11-08T00:43:08.737