Should one-hot vectors be scaled along with numerical attributes?



When I have a combination of categorical and numerical attributes, I usually convert the categorical attributes to one-hot vectors. My question is: do I leave those vectors as they are and scale only the numerical attributes through standardization/normalization, or should I scale the one-hot vectors along with the numerical attributes?

Suresh Kasipandy

Posted 2018-05-14T17:54:58.557

Reputation: 488



Once converted to numerical form, models don't respond differently to one-hot-encoded columns than they do to any other numerical data. So if you are normalising other columns for any reason, there is a clear precedent for normalising the {0,1} values too.

The effect of doing so will depend on the model class and the type of normalisation you apply, but I have noticed some (small) improvements from scaling one-hot-encoded categorical data to mean 0, std 1 when training neural networks.
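
For reference, this is what standardizing a single one-hot column does, assuming a fraction p of the rows contain 1 and the population standard deviation is used (the symbols p, x and z are introduced here for illustration):

```latex
\mu = p, \qquad \sigma = \sqrt{p(1-p)}, \qquad
z = \frac{x - p}{\sqrt{p(1-p)}} =
\begin{cases}
\sqrt{\dfrac{1-p}{p}} & \text{if } x = 1,\\[6pt]
-\sqrt{\dfrac{p}{1-p}} & \text{if } x = 0.
\end{cases}
```

For example, with p = 0.1 a 1 becomes 3 and a 0 becomes -1/3, so rare categories end up with larger magnitudes than common ones. The formula is undefined for p = 0 or p = 1 (a constant column), which implementations usually handle by leaving the column unscaled.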

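It may also make a difference for model classes based on distance metrics (for example k-nearest neighbours or k-means), since scaling changes how much each column contributes to the distance.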
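
A minimal pure-Python sketch of the distance effect, assuming hypothetical toy data (an `ages` column and a `colours` column) and helper names invented for this illustration; it needs Python 3.8+ for `math.dist`. With the numeric column standardized in both cases, it compares the Euclidean distance between a "red" row and the single "blue" row when the one-hot columns are left as {0,1} versus standardized too.

```python
import math
from statistics import mean, pstdev


def standardize(column):
    """Scale to mean 0, population std 1 (a constant column is only centred)."""
    mu, sigma = mean(column), pstdev(column) or 1.0
    return [(x - mu) / sigma for x in column]


def one_hot(values, categories):
    """Return one {0,1} column per category, in the given order."""
    return [[1.0 if v == c else 0.0 for v in values] for c in categories]


def to_rows(columns):
    """Transpose a list of columns into a list of rows."""
    return [list(row) for row in zip(*columns)]


# Hypothetical toy data: one numeric and one categorical attribute.
ages = [20.0, 30.0, 40.0, 50.0]
colours = ["red", "red", "red", "blue"]

age_z = standardize(ages)
onehot_raw = one_hot(colours, ["red", "blue"])
onehot_z = [standardize(col) for col in onehot_raw]

X_raw = to_rows([age_z] + onehot_raw)    # numeric scaled, one-hot left as {0,1}
X_scaled = to_rows([age_z] + onehot_z)   # every column standardized

# Distance between the first "red" row and the only "blue" row.
d_raw = math.dist(X_raw[0], X_raw[3])
d_scaled = math.dist(X_scaled[0], X_scaled[3])
print(f"one-hot left as 0/1: {d_raw:.3f}, one-hot standardized: {d_scaled:.3f}")
```

Here the squared distance is 7.2 + 2 = 9.2 with raw one-hot columns but 7.2 + 32/3 (about 17.87) once they are standardized, because the rare "blue" category is stretched. Whether that re-weighting helps depends on the problem.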

Unfortunately, as with most choices of this kind, you often have to try both approaches and keep the one with the best metric.
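
One way to "try both approaches" is to score each preprocessing option with the same model and metric. The sketch below is a minimal pure-Python illustration under stated assumptions: the data are hypothetical, the model is a leave-one-out 1-nearest-neighbour classifier chosen only because it is distance-based, accuracy stands in for "the best metric", and the function names are invented for the example. In a real project you would substitute your own model, a proper cross-validation split, and a scaler fitted on the training folds only.

```python
import math
from statistics import mean, pstdev


def standardize(column):
    """Scale to mean 0, population std 1 (a constant column is only centred)."""
    mu, sigma = mean(column), pstdev(column) or 1.0
    return [(x - mu) / sigma for x in column]


def build_features(numeric, categorical, scale_one_hot):
    """Standardize the numeric column and one-hot encode the categorical one,
    optionally standardizing the one-hot columns as well."""
    categories = sorted(set(categorical))
    onehot = [[1.0 if v == c else 0.0 for v in categorical] for c in categories]
    if scale_one_hot:
        onehot = [standardize(col) for col in onehot]
    columns = [standardize(numeric)] + onehot
    return [list(row) for row in zip(*columns)]


def loo_1nn_accuracy(X, y):
    """Leave-one-out accuracy of a 1-nearest-neighbour classifier (Euclidean)."""
    correct = 0
    for i, xi in enumerate(X):
        nearest = min((k for k in range(len(X)) if k != i),
                      key=lambda k: math.dist(xi, X[k]))
        correct += y[nearest] == y[i]
    return correct / len(X)


# Hypothetical toy data.
ages = [22.0, 25.0, 47.0, 52.0, 46.0, 56.0, 23.0, 30.0]
colours = ["red", "blue", "red", "blue", "red", "red", "blue", "red"]
labels = [0, 0, 1, 1, 1, 1, 0, 0]

# NOTE: scaling is fitted on all rows here for brevity, which leaks
# information into each held-out row; fit it on training data only in practice.
results = {
    scale_one_hot: loo_1nn_accuracy(build_features(ages, colours, scale_one_hot), labels)
    for scale_one_hot in (False, True)
}
best = max(results, key=results.get)
print(results, "-> scale one-hot columns:", best)
```

The point of the design is that only the `scale_one_hot` flag changes between runs, so any difference in the score can be attributed to the scaling choice.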

Neil Slater

Posted 2018-05-14T17:54:58.557

Reputation: 24 613

The wording was a bit unclear. Are you saying you only normalize one-hot-encoded columns if you've normalized any non-o.h.e. columns? – Info5ek – 2019-02-14T02:21:56.823

@Info5ek: I am saying that it might be better to normalise one-hot-encoded columns, and if you are already doing it for other columns then you may as well give it a try. There are no fixed rules to this, too much depends on the problem at hand. – Neil Slater – 2019-02-14T07:50:34.873