For a prediction problem I'm planning to use **Factorization Machines**, a model that, in addition to learning linear weights on features, learns a latent vector for each feature so that pairwise interactions between features can be modeled in that latent space.

I was told that applying the hashing trick to convert categorical features to 1-of-k binary features (I'm using sklearn's **DictVectorizer**, which returns a sparse matrix) can destroy feature interactions, and that I should try regular **one-hot encoding** instead.

Can anyone explain why?

Probably because it can map different features to the same bucket. But usually it's not a big deal – Alexey Grigorev – 2015-12-17T09:21:29.210
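To illustrate the collision point from the comment, here is a minimal pure-Python sketch (the function names and feature strings are made up for illustration, not from sklearn). One-hot encoding assigns every distinct categorical value its own column, so a Factorization Machine learns a separate weight and latent vector per value. The hashing trick instead maps values into a fixed number of buckets, so by the pigeonhole principle different values can collide and are then forced to share the same parameters, which blurs their learned interactions:

```python
import zlib

def one_hot_index(value, vocabulary):
    """One-hot encoding: each distinct value gets its own, unique column."""
    if value not in vocabulary:
        vocabulary[value] = len(vocabulary)
    return vocabulary[value]

def hashed_index(value, n_buckets):
    """Hashing trick: column = hash(value) mod n_buckets (crc32 for determinism)."""
    return zlib.crc32(value.encode("utf-8")) % n_buckets

# 10 distinct values squeezed into 8 buckets: a collision is guaranteed.
values = [f"user={i}" for i in range(10)]

vocab = {}
one_hot = [one_hot_index(v, vocab) for v in values]  # 10 distinct columns
hashed = [hashed_index(v, 8) for v in values]        # at most 8 columns

print("one-hot:", one_hot)
print("hashed :", hashed)
print("distinct one-hot columns:", len(set(one_hot)))  # 10
print("distinct hashed columns :", len(set(hashed)))   # < 10, so some values share parameters
```

In practice the collision rate depends on how large you make the hash space relative to the number of distinct values; with enough buckets the effect is usually small, which matches the comment above.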