Handling categorical features in Factorization Machines algorithm - Feature Hashing vs. One-Hot encoding



To solve a prediction problem I want to use Factorization Machines, a model that, in addition to learning a linear weight for each feature, learns a latent vector for each feature and models pairwise interactions between features in that latent space.

I was told that performing the hashing trick to convert categorical features to 1-of-k binary features (using sklearn’s DictVectorizer, which returns sparse matrix) can destroy feature interaction and I should try regular one-hot encoding instead.

Can anyone explain why?


Posted 2015-12-15T10:12:37.867


Probably because it can map different features to the same bucket. But usually it's not a big deal – Alexey Grigorev – 2015-12-17T09:21:29.210



I decided to expand a bit on my comment and make a full answer.

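The reason someone might say that the hashing trick destroys interactions is that it can map different features to the same bucket. Colliding features then share a single weight and a single latent vector, so the model cannot tell them apart. In practice this is usually not a big deal: with enough buckets, collisions are rare.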
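To get a feel for how often collisions happen, here is a minimal sketch of the hashing trick in plain Python. The helper names (`bucket`, `count_collisions`), the use of MD5, and the `user=<id>` feature strings are all assumptions made for illustration; real libraries use a different hash function, so exact counts will differ.

```python
import hashlib


def bucket(feature: str, n_buckets: int) -> int:
    """Map a feature string such as 'user=42' to a column index (hypothetical helper)."""
    digest = hashlib.md5(feature.encode("utf-8")).hexdigest()
    return int(digest, 16) % n_buckets


def count_collisions(features, n_buckets: int) -> int:
    """Count features that land in a bucket already used by another feature."""
    used = {bucket(f, n_buckets) for f in features}
    return len(features) - len(used)


# 1000 distinct categorical values, made up for the example
features = [f"user={i}" for i in range(1000)]

few_buckets = count_collisions(features, 2 ** 8)    # far fewer buckets than features
many_buckets = count_collisions(features, 2 ** 20)  # far more buckets than features

print("collisions with 2**8 buckets: ", few_buckets)
print("collisions with 2**20 buckets:", many_buckets)
```

With only 256 buckets most of the 1000 features must share a column, and in a Factorization Machine that also means sharing a latent vector. With about a million buckets collisions become rare, which is why the problem is usually minor in practice.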

Note that DictVectorizer doesn't perform the hashing trick. From its documentation:

When feature values are strings, this transformer will do a binary one-hot (aka one-of-K) coding: one boolean-valued feature is constructed for each of the possible string values that the feature can take on. For instance, a feature “f” that can take on the values “ham” and “spam” will become two features in the output, one signifying “f=ham”, the other “f=spam”.

To apply the hashing trick, you need a different transformer: HashingVectorizer for raw text, or FeatureHasher for dicts of categorical features.
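The difference can be sketched side by side. The example below assumes scikit-learn is installed and uses `FeatureHasher`, scikit-learn's hashing transformer for dict input (`HashingVectorizer` is aimed at raw text). The toy rows and the choice of `n_features=2 ** 10` are made up for illustration.

```python
from sklearn.feature_extraction import DictVectorizer, FeatureHasher

# Toy categorical data, invented for this example
rows = [
    {"f": "ham", "user": "alice"},
    {"f": "spam", "user": "bob"},
    {"f": "ham", "user": "carol"},
]

# One-hot encoding: one column per distinct "feature=value" pair, no collisions
one_hot = DictVectorizer(sparse=True)
X_one_hot = one_hot.fit_transform(rows)
print(sorted(one_hot.vocabulary_))  # the learned "feature=value" column names
print(X_one_hot.shape)

# Hashing trick: a fixed number of columns chosen up front, collisions possible
hasher = FeatureHasher(n_features=2 ** 10, input_type="dict")
X_hashed = hasher.transform(rows)
print(X_hashed.shape)
```

The one-hot matrix grows with the number of distinct values and keeps a vocabulary, while the hashed matrix has a fixed width and needs no fitting. That fixed width is the trade-off: it saves memory on high-cardinality features at the cost of occasional collisions.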

Alexey Grigorev

Posted 2015-12-15T10:12:37.867
