How does C have effects on bias and variance of a Support Vector Machine?


The minimization problem for an SVM can be written as: $$\min_{\theta}\; C\sum_{i = 1}^{m}\left[y^{(i)}\,\text{cost}_1(\theta^T x^{(i)}) + (1-y^{(i)})\,\text{cost}_0(\theta^T x^{(i)})\right] + \frac{1}{2}\sum_{j = 1}^{n}\theta_j^2$$

Now, how can the choice of $C$ lead to underfitting or overfitting?

As I understand it, when $C$ is large the parameters are chosen to drive the first term, $C\sum_{i = 1}^{m}\left[y^{(i)}\,\text{cost}_1(\theta^T x^{(i)}) + (1-y^{(i)})\,\text{cost}_0(\theta^T x^{(i)})\right]$, to $0$, and we then concern ourselves with the second term.

And Andrew Ng says that a large $C$ leads to lower bias and higher variance.

How does this happen? What is the intuition behind this?


Posted 2020-08-16T06:58:09.233

Reputation: 130



$C$ is a regularization parameter: it controls how heavily the model is punished for each misclassified point for a given decision boundary.

If you set $C$ to a large value, the optimizer will try hard to reduce training errors, but the resulting boundary may not perform better on the test dataset, which is overfitting (low bias, high variance). Conversely, a small $C$ tolerates some misclassifications in exchange for a simpler, wider-margin boundary, which can cause underfitting (high bias, low variance).

To learn more about the effect of $C$ in SVMs, refer to this.
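The tradeoff can be made concrete with a tiny numerical sketch. The snippet below (my own illustration, not from the course) uses the equivalent $y \in \{-1, +1\}$ hinge-loss form of the objective above, with an assumed 1D toy dataset containing one outlier. It evaluates the objective at two hypothetical candidate solutions: a wide-margin boundary that ignores the outlier and a narrow-margin boundary that fits it. A small $C$ prefers the wide margin (higher bias), while a large $C$ prefers the narrow one (higher variance):

```python
import numpy as np

def svm_objective(w, b, X, y, C):
    """Hinge-loss form of the SVM objective: C * sum(hinge) + ||w||^2 / 2."""
    margins = y * (w * X + b)          # functional margin of each point
    hinge = np.maximum(0.0, 1.0 - margins)
    return C * hinge.sum() + 0.5 * w**2

# Toy 1D data: well-separated clusters plus one "-1" outlier at x = 1.5
X = np.array([-4.0, -3.0, -2.0, 1.5, 2.0, 3.0, 4.0])
y = np.array([-1, -1, -1, -1, 1, 1, 1])

# Candidate A: wide margin, boundary at x = 0, ignores the outlier (small |w|)
# Candidate B: narrow margin, boundary at x = 1.75, fits the outlier (large |w|)
wide = (0.5, 0.0)
narrow = (4.0, -7.0)

for C in (0.01, 100.0):
    J_wide = svm_objective(*wide, X, y, C)
    J_narrow = svm_objective(*narrow, X, y, C)
    winner = "wide margin (higher bias)" if J_wide < J_narrow else "narrow margin (higher variance)"
    print(f"C={C}: J_wide={J_wide:.4f}, J_narrow={J_narrow:.4f} -> prefers {winner}")
```

With $C = 0.01$ the wide boundary wins because the outlier's hinge penalty is cheap; with $C = 100$ the same penalty dominates, so the optimizer pays for a larger $\|\theta\|$ to classify every training point, mirroring the low-bias/high-variance behavior described above.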


Posted 2020-08-16T06:58:09.233

Reputation: 947