In answering this question one significant distinction to make is whether we are talking about **linear** Support Vector Machines or **non-linear**, that is, kernelized Support Vector Machines.

## Linear SVMs

Linear SVMs are, both in theory and in practice, very good models when your data can be explained by linear relations among your features. They are superior to classic methods such as linear (a.k.a. least-squares) regression because they are **robust**, in the sense that small perturbations in the input data do not produce significant changes in the model. This is achieved by finding the line (hyperplane) that maximizes the margin between your data points. This maximum-margin hyperplane comes with guarantees on the generalization ability of the model on unseen data points, a theoretical property many other machine learning methods lack.

Linear SVMs are also as interpretable as any other linear model: each input feature has a weight that directly influences the model output.
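As a minimal sketch of both points (the toy data and all parameter values here are illustrative, not from the answer): scikit-learn's `LinearSVC` fits a maximum-margin linear model, and `coef_` exposes one weight per feature, so you can read off which features drive the decision.

```python
# Toy example: the class depends only on the first feature;
# the second feature is noise, so its learned weight should be small.
from sklearn.svm import LinearSVC

X = [[-2.0, 0.3], [-1.5, -0.4], [-1.0, 0.1],
     [1.0, 0.2], [1.5, -0.3], [2.0, 0.4]]
y = [0, 0, 0, 1, 1, 1]

clf = LinearSVC(C=1.0).fit(X, y)

w = clf.coef_[0]  # one weight per input feature
print(w)          # |w[0]| dominates: the model relies on feature 0
print(clf.predict([[-1.2, 0.0], [1.2, 0.0]]))
```

Inspecting `coef_` like this is the linear-model interpretability the paragraph refers to: the sign and magnitude of each weight tell you how that feature pushes the prediction.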

Linear SVMs are also **very fast** to train, showing sublinear training times on very large datasets. This is achieved by using stochastic gradient descent techniques, much in the fashion of current deep learning methods.
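In scikit-learn this SGD-based training is available through `SGDClassifier` with the hinge loss, which optimizes a linear SVM objective one sample (or mini-batch) at a time instead of solving the full quadratic program. A brief sketch (toy data and hyperparameters are illustrative):

```python
# SGDClassifier(loss="hinge") trains a linear SVM by stochastic
# gradient descent: cost per epoch scales with the number of samples,
# which is what makes it practical on very large datasets.
from sklearn.linear_model import SGDClassifier

X = [[0.0], [0.2], [0.4], [2.0], [2.2], [2.4]]
y = [0, 0, 0, 1, 1, 1]

clf = SGDClassifier(loss="hinge", max_iter=1000, tol=1e-3,
                    random_state=0).fit(X, y)
print(clf.predict([[0.1], [2.3]]))
```

For data that does not fit in memory, the same estimator also supports incremental training via `partial_fit`, which is the streaming counterpart of this idea.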

## Non-linear SVMs

Non-linear SVMs are still linear models, and boast the same theoretical benefits, but they employ the so-called **kernel trick** to build this linear model over an enlarged feature space. The visible result is that the resulting model can make non-linear decisions on your data. Since you can provide a **custom kernel encoding similarities** between data points, you can use problem knowledge to focus the kernel on the relevant parts of your problem. Doing this effectively, however, can be difficult, so in general almost everybody uses the plug-and-play Gaussian (RBF) kernel.
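Both options can be sketched on the classic XOR problem, which no straight line separates (the data, `gamma`, `C`, and the `my_kernel` function below are illustrative assumptions, not from the answer). In scikit-learn a custom kernel is simply a callable returning pairwise similarities:

```python
import numpy as np
from sklearn.svm import SVC

# XOR-like data: not linearly separable in the original 2-D space.
X = np.array([[0, 0], [1, 1], [0, 1], [1, 0]], dtype=float)
y = [0, 0, 1, 1]

# Plug-and-play Gaussian (RBF) kernel:
rbf = SVC(kernel="rbf", gamma=2.0, C=10.0).fit(X, y)
print(list(rbf.predict(X)))

# A custom kernel is just a function of two sample matrices that
# returns their pairwise similarities (here, a quadratic kernel,
# whose implicit feature space contains the x1*x2 term XOR needs):
def my_kernel(A, B):
    return (A @ B.T + 1.0) ** 2

custom = SVC(kernel=my_kernel, C=10.0).fit(X, y)
print(list(custom.predict(X)))
```

Both classifiers fit the XOR labels perfectly even though the decision they make is non-linear in the original features, which is exactly the point of the kernel trick.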

Non-linear SVMs are partially interpretable, as they tell you which training points (the support vectors) are relevant for prediction and which are not. This is not possible for other methods such as Random Forests or Deep Networks.
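A fitted scikit-learn `SVC` exposes this directly: `support_` holds the indices of the training points the model actually relies on, and `support_vectors_` holds the points themselves. A small sketch (toy data and parameters are illustrative):

```python
# After fitting, only the support vectors matter for prediction;
# training points far from the decision boundary can be discarded.
from sklearn.svm import SVC

X = [[-2.0], [-1.5], [-0.5], [0.5], [1.5], [2.0]]
y = [0, 0, 0, 1, 1, 1]

clf = SVC(kernel="rbf", gamma=0.5, C=10.0).fit(X, y)

print(clf.support_)          # indices of the relevant training points
print(clf.support_vectors_)  # the points themselves
print(clf.predict([[-1.0], [1.0]]))
```

Inspecting which examples end up as support vectors is the partial interpretability the paragraph describes: it tells you which training cases the decision actually hinges on.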

Unfortunately, non-linear SVMs are slow to train. The state-of-the-art algorithm is **Sequential Minimal Optimization**, whose training time scales roughly quadratically with the number of samples. It is widely available through the LIBSVM library, which backs the SVM implementations in a number of machine learning libraries, scikit-learn included.

## Popularity of these methods

It is true that SVMs are not as popular as they used to be: this can be checked by googling for research papers or implementations of SVMs vs Random Forests or deep learning methods. Still, they are useful in some practical settings, especially in the linear case.

Also, bear in mind that due to the **no-free-lunch theorem**, no machine learning method can be shown to be superior to all others over all problems. While some methods do work better in general, you will always find datasets where a less common method achieves better results.


See also http://stats.stackexchange.com/questions/tagged/svm – StasK – 2014-07-10T11:55:01.303

I don't get it - isn't this a question that should be posted on CrossValidated? I continue to be confused about what goes where between DataScience and CrossValidated. – fnl – 2015-01-21T13:31:03.130

@fnl: svms have some competition as classifiers from less mathematically "pure" engineered solutions, so I think DataScience is in a better position to make the comparison here. Although I share your confusion! – Neil Slater – 2015-01-21T13:59:40.567