What is the difference between one-hot encoding and leave-one-out encoding?

I am reading a presentation that recommends not using leave-one-out encoding, but is okay with one-hot encoding. I thought they were the same. Can anyone describe the differences between them?

icm

Posted 2016-03-23T03:25:53.170

It's not clear (from just your question) what leave-one-out even is. You should edit this to give a pointer and briefly explain your understanding of the two, and why you think they are the same. – Sean Owen – 2016-03-23T13:31:06.357

Answers

They are probably using "leave-one-out encoding" to refer to Owen Zhang's strategy.

From here:

The encoded column is not a conventional dummy variable, but instead is the mean response over all rows for this categorical level, excluding the row itself. This gives you the advantage of having a one-column representation of the categorical while avoiding direct response leakage.

This picture expresses the idea well: [image: worked example of leave-one-out encoding]
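
To make the contrast concrete, here is a minimal sketch (my own illustration, not code from the answer or from Owen Zhang; the column names and toy data are made up) of both encodings on a small binary-target table, using pandas:

    import pandas as pd

    # Toy data: one categorical feature and a binary target.
    df = pd.DataFrame({
        "user": ["A1", "A1", "A1", "A2", "A2"],
        "y":    [0,    1,    1,    0,    1],
    })

    # One-hot encoding: one 0/1 indicator column per category level.
    one_hot = pd.get_dummies(df["user"], prefix="user")

    # Leave-one-out encoding: for each row, the mean of y over all *other*
    # rows sharing that row's category level (a single numeric column).
    grp = df.groupby("user")["y"]
    df["user_loo"] = (grp.transform("sum") - df["y"]) / (grp.transform("count") - 1)

    print(pd.concat([df, one_hot], axis=1))

Note that a level appearing only once gives a division by zero (NaN) and needs a fallback such as the global mean, and for test rows the usual choice is the plain per-level mean computed on the training data, since there is no target value to exclude.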

Dex Groves

Posted 2016-03-23T03:25:53.170

Your explanation is better than wacax's in the linked discussion, thank you – Allan Ruin – 2016-08-12T15:00:14.803

Hi @Dex Groves, so the leave-one-out encoding for the test set is always 0.5? – user7117436 – 2017-03-24T20:29:59.033

Hi! As seen from the picture, this particular example relates to a classification problem. Does anybody have experience with LOO encoding in a regression problem? The main question is how to aggregate the target variable. I am currently running experiments and getting huge overfitting with mean(y). – Alexey Trofimov – 2017-06-19T12:49:08.367

For a clustering (unsupervised) problem, is it possible to use this kind of encoding? – enneppi – 2018-09-13T10:26:40.643

@AlexeyTrofimov - try an aggregation with lower variance. I'd start with different binning (e.g. 1K, 2K, ..., 2M for large integer y values, or rounding to a decimal place for float y values) => mean(bin_f(y)) – mork – 2019-03-18T08:40:46.327
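
A rough sketch of that binning idea (the DataFrame, the "cat" and "y" column names, and the bin edges are all illustrative assumptions, not from the comment):

    import numpy as np
    import pandas as pd

    # Assumed data: a categorical column and a continuous target.
    df = pd.DataFrame({
        "cat": ["A", "A", "A", "B", "B", "B"],
        "y":   [900.0, 1500.0, 50000.0, 1200.0, 3000.0, 2500000.0],
    })

    # Illustrative bin edges only; pick edges that match your y distribution.
    edges = [0, 1000, 2000, 10000, 2000000, np.inf]
    df["y_bin"] = pd.cut(df["y"], bins=edges, labels=False)

    # Leave-one-out mean of the *binned* target instead of the raw target,
    # which lowers the variance of the encoded values.
    grp = df.groupby("cat")["y_bin"]
    df["cat_loo"] = (grp.transform("sum") - df["y_bin"]) / (grp.transform("count") - 1)

    print(df)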

@enneppi - the whole idea is to "tie" your categorical feature to the target "y", which is missing in your unsupervised setting. You could instead try "tying" the categorical feature to other X features (a kind of feature engineering) – mork – 2019-03-18T08:46:32.020
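
One possible reading of that suggestion (my interpretation; the column names are made up): without a target, encode each category level by a per-level statistic of some other numeric feature instead:

    import pandas as pd

    # Assumed data: a categorical column and a numeric feature, no target y.
    df = pd.DataFrame({
        "cat":   ["A", "A", "B", "B", "B"],
        "price": [10.0, 14.0, 3.0, 5.0, 4.0],
    })

    # Encode the categorical as the per-level mean of another feature.
    # Without a target there is no response leakage to worry about, so the
    # leave-one-out exclusion is not strictly needed here.
    df["cat_mean_price"] = df.groupby("cat")["price"].transform("mean")

    print(df)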