SVM using scikit learn runs endlessly and never completes execution



I am trying to run SVR using scikit-learn (Python) on a training dataset that has 595,605 rows and 5 columns (features), while the test dataset has 397,070 rows. The data has been pre-processed and regularized.

I am able to successfully run the test examples, but on executing using my dataset and letting it run for over an hour, I could still not see any output or termination of the program. I tried executing using a different IDE and even from the terminal, but that does not seem to be the issue. I also tried changing the 'C' parameter value from 1 to 1e3.

I am facing similar issues with all SVM implementations using scikit.

Am I not waiting long enough for it to complete? How much time should this execution take?

From my experience, it should not require more than a few minutes.

Here is my system configuration: Ubuntu 14.04, 8GB RAM, lots of free memory, 4th gen i7 processor


Posted 2014-08-18T10:46:57.360

Reputation: 3 365

sklearn's SVM implementation involves at least 3 steps: 1) creating the SVR object, 2) fitting a model, 3) predicting values. The first step specifies the kernel in use, which helps to understand the inner processes much better. The second and third steps are quite different, and we need to know at least which of them takes that long. If it is training, then it may be OK, because learning is sometimes slow. If it is testing, then there is probably a bug, because testing in SVM is really fast. In addition, it may be the CSV reading that takes that long and not SVM at all. So all these details may be important. – ffriend – 2014-08-18T13:22:28.580

I am facing the same problem with SVM as well, but can anyone tell me how much time it will take after normalization? – kashyap kitchlu – 2018-07-06T15:19:58.397

@kashyapkitchlu Please try a small portion of your dataset; you should see the difference in processing time. One suggestion: your small sample should have a distribution similar to the original dataset, so that SVM training ends after a similar iteration count. – Cloud Cho – 2020-03-26T15:37:57.427



Kernelized SVMs require the computation of a distance (kernel) function between every pair of points in the dataset, which is the dominating cost of $\mathcal{O}(n_\text{features} \times n_\text{observations}^2)$. Storing all the distances is a burden on memory, so they are recomputed on the fly. Thankfully, only the points nearest the decision boundary are needed most of the time, and frequently computed distances are stored in a cache. If the cache is getting thrashed, the running time blows up to $\mathcal{O}(n_\text{features} \times n_\text{observations}^3)$.

You can increase this cache by invoking SVR as

model = SVR(cache_size=7000)

In general, increasing the cache alone is not going to solve the problem. But all is not lost. You can subsample the data and use the rest as a validation set, or you can pick a different model. Above the 200,000-observation range, it's wise to choose linear learners.
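A minimal sketch of the subsampling idea, using synthetic data as a stand-in for the OP's arrays (substitute your own `X` and `y`; the sizes here are illustrative, not a recommendation):

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import SVR

# Synthetic stand-in for the full training set; substitute your own X and y.
rng = np.random.RandomState(0)
X = rng.randn(5000, 5)
y = X @ rng.randn(5) + 0.1 * rng.randn(5000)

# Fit the kernel SVR on a small subsample and hold the rest out for validation.
X_sub, X_val, y_sub, y_val = train_test_split(X, y, train_size=1000, random_state=0)
model = SVR(cache_size=1000).fit(X_sub, y_sub)
score = model.score(X_val, y_val)
```

Because kernel SVR training is superlinear in the number of rows, fitting on 1,000 rows instead of 595,605 changes the runtime from hours to seconds, and the held-out score tells you how much accuracy the subsampling cost you.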

Kernel SVMs can be approximated by approximating the kernel matrix and feeding it to a linear SVM. This lets you trade off between accuracy and performance in linear time.

A popular means of achieving this is to use 100 or so cluster centers found by k-means/k-means++ as the basis of your kernel function. The new derived features are then fed into a linear model. This works very well in practice. Tools like sofia-ml and vowpal wabbit are how Google, Yahoo and Microsoft do this. Input/output becomes the dominating cost for simple linear learners.
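A sketch of the kernel-approximation route using scikit-learn's built-in `Nystroem` transformer — note it draws its 100 landmark points by random sampling rather than k-means, but the idea is the same: derive a small set of kernel features and hand them to a linear learner. Data here is synthetic:

```python
import numpy as np
from sklearn.kernel_approximation import Nystroem
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVR

# Synthetic nonlinear regression problem.
rng = np.random.RandomState(0)
X = rng.randn(2000, 5)
y = np.sin(X[:, 0]) + 0.1 * rng.randn(2000)

# Approximate an RBF kernel with 100 components, then fit a *linear* SVR on
# the derived features -- total cost is linear in the number of observations.
approx_model = make_pipeline(
    Nystroem(kernel='rbf', n_components=100, random_state=0),
    LinearSVR(random_state=0),
)
approx_model.fit(X, y)
r2 = approx_model.score(X, y)
```

A plain linear SVR cannot fit `sin(x)` at all, but with the 100 approximate kernel features it can, at a fraction of the cost of an exact kernel SVR.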

When data is abundant, nonparametric models perform roughly the same for most problems. The exceptions are structured inputs like text, images, time series and audio.

Further reading

Jessica Collins

Posted 2014-08-18T10:46:57.360

Reputation: 981


SVM solves an optimization problem of quadratic order.

I do not have anything to add that has not been said here. I just want to post a link to the sklearn page about SVC, which clarifies what is going on:

The implementation is based on libsvm. The fit time complexity is more than quadratic with the number of samples which makes it hard to scale to dataset with more than a couple of 10000 samples.

If you do not want to use kernels, and a linear SVM suffices, there is LinearSVR, which is much faster because it uses an optimization approach à la linear regression. You will have to normalize your data, though, in case you are not doing so already, because LinearSVR applies regularization to the intercept coefficient, which is probably not what you want. It means that if your data's average is far from zero, the solver will not handle it satisfactorily.
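The normalization caveat can be handled by standardizing inside a pipeline. A sketch on synthetic data whose mean is deliberately far from zero (stand-ins for the OP's arrays):

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVR

# Synthetic data with a mean far from zero, on purpose.
rng = np.random.RandomState(0)
X = rng.randn(3000, 5) + 10.0
y = X @ np.array([1.0, 2.0, 3.0, 4.0, 5.0]) + rng.randn(3000)

# Standardize first so the regularized intercept is not a problem.
pipe = make_pipeline(StandardScaler(), LinearSVR(random_state=0))
pipe.fit(X, y)
lin_score = pipe.score(X, y)
```

LinearSVR on a few thousand rows fits in well under a second, versus minutes-to-hours for kernel SVR at the same size.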

You can also use stochastic gradient descent to solve the optimization problem. Sklearn features SGDRegressor. You have to use loss='epsilon_insensitive' to get results similar to a linear SVM. See the documentation. I would only use gradient descent as a last resort, though, because it implies much tweaking of the hyperparameters to converge well. Use LinearSVR if you can.
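A sketch of the SGD route on synthetic data; `loss='epsilon_insensitive'` is the documented way to make SGDRegressor mimic a linear SVR, and scaling is again done in a pipeline since SGD is sensitive to feature scale:

```python
import numpy as np
from sklearn.linear_model import SGDRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic linear regression problem.
rng = np.random.RandomState(0)
X = rng.randn(3000, 5)
y = X @ np.array([1.0, -2.0, 0.5, 3.0, -1.0]) + 0.1 * rng.randn(3000)

# epsilon-insensitive loss makes SGDRegressor behave like a linear SVR.
sgd = make_pipeline(
    StandardScaler(),
    SGDRegressor(loss='epsilon_insensitive', random_state=0),
)
sgd.fit(X, y)
sgd_score = sgd.score(X, y)
```

SGD makes a fixed number of passes over the data, so its cost is linear in the number of rows — exactly what a 595k-row dataset needs.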

Ricardo Cruz

Posted 2014-08-18T10:46:57.360

Reputation: 3 052

I had a dataset with many lines. SVC started taking way too long for me at about 150K rows of data. I used your suggestion with LinearSVR and a million rows takes only a couple of minutes. P.S. I also found that the LogisticRegression classifier produces similar results to LinearSVR (in my case) and is even faster. – jeffery_the_wind – 2017-05-01T07:50:24.180


Did you include scaling in your pre-processing step? I had this issue when running my SVM. My dataset is ~780,000 samples (row) with 20 features (col). My training set is ~235k samples. It turns out that I just forgot to scale my data! If this is the case, try adding this bit to your code:

# Scale data to [-1, 1]; this increases SVM speed:

from sklearn.preprocessing import MinMaxScaler
scaling = MinMaxScaler(feature_range=(-1,1)).fit(X_train)
X_train = scaling.transform(X_train)
X_test = scaling.transform(X_test)

Shelby Matlock

Posted 2014-08-18T10:46:57.360

Reputation: 101

Can anyone explain why this speeds up the SVM fit? – lppier – 2018-05-10T00:25:52.837

Is there a reason why you picked MinMaxScaler instead of any other? StandardScaler, for instance? – raspi – 2018-11-28T15:43:39.940

@lppier: essentially you're shrinking the range of every feature, which reduces the space the optimizer has to search and makes the fit much less effort for your machine. – ike – 2019-09-20T21:00:19.930

In that case is there any reason not to shrink it even more? – Benitok – 2021-02-15T23:07:38.300


With such a huge dataset I think you'd be better off using a neural network, deep learning, random forest (they are surprisingly good), etc.

As mentioned in earlier replies, the time taken can grow with the third power of the number of training samples. Even the prediction time is polynomial in the number of test vectors.

If you really must use SVM, then I'd recommend using GPU acceleration or reducing the training dataset size. Try a sample (10,000 rows, maybe) of the data first to rule out an issue with the data format or distribution.
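A sketch of that sanity check on synthetic stand-in arrays; the sample size here is kept small so the snippet runs quickly, but the OP could use 10,000 rows as suggested:

```python
import numpy as np
from sklearn.svm import SVR

# Stand-in for the full training data; substitute your own arrays.
rng = np.random.RandomState(0)
X_train = rng.randn(20000, 5)
y_train = X_train[:, 0] + 0.1 * rng.randn(20000)

# Fit on a random sample first to sanity-check the setup (data format,
# scaling, hyperparameters) before committing to the full dataset.
idx = rng.choice(len(X_train), size=2000, replace=False)
sample_model = SVR().fit(X_train[idx], y_train[idx])
```

If even the small sample takes far longer than expected, the problem is with the data (e.g. unscaled features) rather than with the dataset size.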

As mentioned in other replies, linear kernels are faster.

Leela Prabhu

Posted 2014-08-18T10:46:57.360

Reputation: 163


I recently encountered a similar problem because I forgot to scale the features in my dataset, which I had earlier used to train an ensemble model. Failure to scale the data is the likely culprit, as pointed out by Shelby Matlock. You may try the different scalers available in sklearn, such as RobustScaler:

from sklearn.preprocessing import RobustScaler
scaler = RobustScaler()
X = scaler.fit_transform(X)

X is now transformed/scaled and ready to be fed to your desired model.

Dutse I

Posted 2014-08-18T10:46:57.360

Reputation: 31


Try normalising the data to [-1,1]. I faced a similar problem and upon normalisation everything worked fine. You can normalise data easily using:

from sklearn import preprocessing
X_train = preprocessing.scale(X_train)
X_test = preprocessing.scale(X_test)


Posted 2014-08-18T10:46:57.360

Reputation: 111

@Archie This is an answer to a question, not a question. – timleathart – 2017-11-12T12:02:18.940


This makes sense. IIUC, the speed of support vector operations is bound by the number of samples, not the dimensionality. In other words, it is capped by CPU time, not RAM. I'm not sure exactly how much time this should take, but I'm running some benchmarks to find out.

Jaidev Deshpande

Posted 2014-08-18T10:46:57.360

Reputation: 139


Leave it to run overnight, or better, for 24 hours. What is your CPU utilization? If none of the cores is running at 100%, then you have a problem, probably with memory. Have you checked whether your dataset fits into 8GB at all? Have you tried the SGDClassifier? It is one of the fastest there. It is worth giving it a try first, in the hope that it completes in an hour or so.
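Since the OP's task is regression, the analogue of the suggested SGDClassifier is SGDRegressor; a sketch on synthetic data at a scale comparable to the OP's row counts:

```python
import numpy as np
from sklearn.linear_model import SGDRegressor  # SGDClassifier for classification

# Synthetic data at roughly the OP's scale.
rng = np.random.RandomState(0)
X = rng.randn(100000, 5)
y = X @ rng.randn(5) + 0.1 * rng.randn(100000)

# SGD over 100k rows finishes in seconds rather than hours.
fast_model = SGDRegressor(random_state=0).fit(X, y)
fast_score = fast_model.score(X, y)
```

If this completes in seconds but the kernel SVR never does, the bottleneck is the quadratic-plus kernel fit, not the data loading.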


Posted 2014-08-18T10:46:57.360

Reputation: 540

SGDClassifier does not support kernels. If the OP wants a linear SVM, then I would recommend first trying LinearSVR. It is much faster than SVR because it solves the problem using a linear regression library, and a global minimum is guaranteed (unlike with gradient descent). – Ricardo Cruz – 2016-07-08T10:03:30.390

Appreciate your comment. Could you elaborate on why kernel support is an issue? – Diego – 2016-07-09T10:25:00.390

From the documentation: "The loss function to be used. Defaults to 'hinge', which gives a linear SVM." The same applies to SGDRegressor. SGDRegressor is equivalent to using SVR(kernel='linear'). If that is what the OP wants, that's great. I was under the impression he wanted to use SVM with a kernel. If that is not the case, I would recommend he first try LinearSVR.

– Ricardo Cruz – 2016-07-09T13:37:09.213


I just had a similar issue with a dataset that contains only 115 elements and a single feature (international airline data). The solution was to scale the data. What I missed in the answers so far was the usage of a Pipeline:

from sklearn.svm import SVR
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

model = Pipeline([('scaler', StandardScaler()),
                  ('svr', SVR(kernel='linear'))])

You can train the model like a usual classification / regression model and evaluate it the same way. Nothing changes except the definition of the model.

Martin Thoma

Posted 2014-08-18T10:46:57.360

Reputation: 15 590


I have encountered this issue, and increasing cache_size, as others suggest, does not help at all. You can see this post and this one, where the main contributor suggests that you should change the code manually.

As you know, SVC and SVR solve optimization problems, and they stop when the error margin is so small that further optimization is futile. So there is another parameter, max_iter, where you can set how many iterations the solver should do.

I have used sklearn in Python and e1071 in R, and R gets to the result much faster without setting max_iter, while sklearn takes 2–4 times longer. The only way I could bring down the computation time in Python was by using max_iter. It is relative to the complexity of your model, the number of features, the kernel and the hyperparameters, but for a small dataset of around 4,000 data points with max_iter set to 10,000, the results were not different at all and the runtime was acceptable.
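A sketch of capping the iteration count, on synthetic data of about the size the answer mentions (note libsvm emits a ConvergenceWarning when the cap is reached before the tolerance is met — the model is still usable):

```python
import numpy as np
from sklearn.svm import SVR

# Synthetic dataset of roughly 4,000 points, as in the answer.
rng = np.random.RandomState(0)
X = rng.randn(4000, 5)
y = np.sin(X[:, 0]) + 0.1 * rng.randn(4000)

# Cap the libsvm optimizer at 10,000 iterations instead of running to tolerance.
capped = SVR(max_iter=10000).fit(X, y)
```

This trades a bounded, predictable runtime for a possibly slightly suboptimal fit, which is often an acceptable deal on large datasets.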

Habib Karbasian

Posted 2014-08-18T10:46:57.360

Reputation: 111


You need to scale your data. Scaling with preprocessing.scale standardizes your features to zero mean and unit variance, which helps in faster convergence.

Try using following code:

# X is your numpy data array.
from sklearn import preprocessing

X = preprocessing.scale(X)

Rishabh Gupta

Posted 2014-08-18T10:46:57.360

Reputation: 11

Welcome to Data Science SE! Could you explain how your suggestion will help the OP? What you are suggesting is the scaling of an array. It is not clear how that may or may not affect the SVR algorithm in scikit-learn. – Stereo – 2017-01-04T14:20:16.720


I also faced a similar problem, with SVM training taking seemingly infinite time. The problem was resolved by preprocessing the data. Please add the following lines to your code before training:

from sklearn import preprocessing

X_train = preprocessing.scale(X_train)
X_test = preprocessing.scale(X_test)


Posted 2014-08-18T10:46:57.360

Reputation: 1