Is there an encoder which can automatically detect the intrinsic order of an ordinal variable and assign values accordingly?



Given data with an ordinal variable, say "house quality" with values ex (excellent), gd (good), fa (fair) and bd (bad), we obviously cannot just throw the data into sklearn's LabelEncoder, as the resulting labels can be in the wrong order, e.g. {bd: 3, gd: 2, fa: 1, ex: 0}. Instead, we need to specify the order manually, right? However, if we do not have domain knowledge, how can we specify the order? A manual approach is also error-prone. So I am curious whether there is any encoder that can automatically detect the correct order of an ordinal variable.
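To make the problem concrete, here is a minimal sketch in plain Python. It assumes an encoder that assigns integer codes following the sorted (alphabetical) order of the labels, which is how sklearn's LabelEncoder orders its classes, and compares that with a manually specified order. The sample values and variable names are made up for illustration.

```python
# Hypothetical sample of the "house quality" column.
quality = ["gd", "ex", "bd", "fa", "gd", "ex"]

# Alphabetical coding: imitates an encoder that sorts its classes before numbering them.
alphabetical = {label: code for code, label in enumerate(sorted(set(quality)))}

# Manually specified order, from worst to best (this is the domain knowledge).
manual_order = ["bd", "fa", "gd", "ex"]
manual = {label: code for code, label in enumerate(manual_order)}

encoded_alpha = [alphabetical[q] for q in quality]
encoded_manual = [manual[q] for q in quality]

print(alphabetical)  # ex lands between bd and fa, which is not the quality order
print(manual)
```

With the alphabetical coding, "ex" gets a lower code than "fa" and "gd", so the integers do not reflect the quality ranking; only the manual mapping does.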

Victor Luu

Posted 2020-07-15T22:51:05.527

Reputation: 223

For me it would beg the question of how, without domain knowledge, one would even know whether a categorical variable is ordinal or nominal. – Fnguyen – 2020-07-16T07:46:45.447

Point taken @Fnguyen. Let me reframe my question: i) we know the variable is ordinal, e.g. based on its name; ii) we may have partial or full domain knowledge, but we want to avoid the manual process of providing the order, especially when we have lots of categories, say 10-100. – Victor Luu – 2020-07-16T18:09:57.460



Yup, it should be target encoding. By calculating the target mean for each category, you order the categories based on the target; I can't think of a better way to order them. Of course, this is only valid in a supervised learning setting. Otherwise, I can't think of a way of ordering categories automatically.
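As a rough illustration of the idea, here is a minimal sketch in plain Python. It assumes a toy dataset of (quality, sale price) pairs; the data and all variable names are hypothetical. It computes the mean target per category and then ranks the categories by that mean to get an inferred order.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical toy data: (house quality, sale price in thousands).
rows = [
    ("gd", 210), ("ex", 320), ("bd", 90), ("fa", 150),
    ("gd", 230), ("ex", 300), ("bd", 110), ("fa", 130),
]

# Collect the target values observed for each category.
targets_by_category = defaultdict(list)
for category, price in rows:
    targets_by_category[category].append(price)

# Target encoding: each category is represented by its mean target.
target_mean = {c: mean(v) for c, v in targets_by_category.items()}

# Rank the categories by mean target to obtain an inferred ordinal code.
inferred_order = sorted(target_mean, key=target_mean.get)
ordinal_code = {c: i for i, c in enumerate(inferred_order)}

print(target_mean)
print(inferred_order)
```

Note that the inferred order reflects the relationship with this particular target, not necessarily the intrinsic meaning of the labels, and the means should be computed on training data only to avoid leaking the target into validation data.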

See this question and this blogpost to dig deeper in target encoding.

David Masip

Posted 2020-07-15T22:51:05.527

Reputation: 5 101

Thanks @David for the detailed answer, and nice blog post. Actually, I kind of knew that target encoding is a solution. I just wonder if there is any other encoding scheme. – Victor Luu – 2020-07-16T18:14:31.450