Free parameters in logistic regression


When applying logistic regression, one is essentially applying the function $1/(1 + e^{-\beta x})$ to provide a decision boundary, where $\beta$ is a set of parameters learned by the algorithm and $x$ is an input feature vector. This appears to be the general framework provided by widely available packages such as Python's sklearn.

This is a very basic question, and the same effect can be achieved manually by normalizing the features, but shouldn't a more accurate decision boundary be given by $1/(1 + e^{-\beta (x - \alpha)})$, where $\alpha$ is an offset? Of course one can subtract a pre-specified $\alpha$ from the features ahead of time and achieve the same result, but wouldn't it be better for the logistic regression algorithm simply to treat $\alpha$ as a free parameter trained alongside $\beta$? Is there a reason this is not routinely done?


Posted 2018-10-13T20:29:31.540

Reputation: 175



You get the same effect from including a bias (intercept) term, i.e. $\frac{1}{1+\exp(-(\beta x + b))}$, and most software includes such a bias by default.

To see why, do the math: expanding $\beta(x - \alpha) = \beta x - \beta\alpha$, the constant $-\beta\alpha$ plays exactly the role of a bias term, so the two parametrizations describe the same family of models.
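To illustrate this equivalence, here is a minimal sketch using sklearn and synthetic data (the data, variable names, and the offset value are all made up for the demo): shifting the features by an offset $\alpha$ leaves the fitted slopes and the predictions unchanged, and only the intercept moves to absorb $\beta\alpha$.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

alpha = np.array([3.0, -1.5])  # an arbitrary offset applied to the features

# fit_intercept=True is sklearn's default, i.e. the bias term is included
model = LogisticRegression(tol=1e-8, max_iter=5000).fit(X, y)
shifted = LogisticRegression(tol=1e-8, max_iter=5000).fit(X - alpha, y)

# Slopes agree; the intercept absorbs beta @ alpha, so predictions match
assert np.allclose(model.coef_, shifted.coef_, atol=1e-3)
assert np.allclose(shifted.intercept_, model.intercept_ + model.coef_ @ alpha, atol=1e-3)
assert np.allclose(model.predict_proba(X), shifted.predict_proba(X - alpha), atol=1e-3)
```

Note that this works even with sklearn's default L2 penalty, because the intercept is not regularized: the penalized optimum for the slopes is identical in both parametrizations.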


Posted 2018-10-13T20:29:31.540

Reputation: 754

Fair enough, but these parameters have physical significance that is then lost, no? When plotting the decision boundary, if there is more than one feature, the shifts corresponding to a particular bias term are not unique. For example, $0.5x_1 + 0.5x_2 + 10$ is equivalent to $0.5(x_1 - 10) + 0.5(x_2 + 30)$ or $0.5(x_1 + 30) + 0.5(x_2 - 10)$. But what if one essentially wants the marginalized decision boundary for a single feature, i.e. to recover the shift for $x_1$ alone? – Mathews24 – 2018-10-13T22:53:15.480

Well, you’ve just highlighted another problem with adding an offset: the maximum likelihood estimate (or the minimizer of the loss function, if you prefer) is no longer unique. The problem also would not be convex anymore, and is therefore hard to optimize.

You also don’t get marginalized decision boundaries from logistic regression (you’d have to know the distributions of all the variables); what you can get are conditional decision boundaries, and those are the same whether you use a bias or offsets. – anymous.asker – 2018-10-14T09:42:02.297

If your goal is to obtain sensitivity statistics, you could either do a simulation (that takes into account the distribution of your data), or perhaps center the variables beforehand and take those coefficients as greedy estimates. – anymous.asker – 2018-10-14T09:43:07.107
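A quick sketch of the centering idea above (again sklearn with made-up data; the centering values are just the sample means): centering leaves the slopes essentially unchanged, but the intercept then reports the log-odds at the average observation rather than at $x = 0$.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
X = rng.normal(loc=[10.0, -5.0], scale=2.0, size=(500, 2))
y = (X[:, 0] - X[:, 1] > 15).astype(int)

raw = LogisticRegression(tol=1e-8, max_iter=5000).fit(X, y)
centered = LogisticRegression(tol=1e-8, max_iter=5000).fit(X - X.mean(axis=0), y)

# Slopes are unaffected by centering; only the intercept's meaning changes:
# it is now the log-odds at the mean of each feature, not at the origin.
assert np.allclose(raw.coef_, centered.coef_, atol=1e-3)
assert abs(raw.intercept_[0] - centered.intercept_[0]) > 1.0
```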