Is there an asymmetric version of nominal correlation?

4

1

I use Cramer's V to calculate correlation of features in a dataset made of only nominal features.

Let's consider the following dataset:

a  |  b
--------
0  |  0
0  |  1
0  |  0
1  |  2
1  |  2
1  |  3

Calculating Cramer's V for features a and b yields 0.707. Since the measure is symmetric, information is lost in this case: as we can see, knowing the value of b tells us the value of a for sure, but this is not the case if we are given the value of a. Then the number of possible values of b decreases, but b is still not known for sure.
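For reference, the 0.707 figure matches the bias-corrected form of Cramer's V; the uncorrected statistic comes out as 1.0 for this table. The following is a minimal sketch in plain Python, assuming the bias-corrected variant is the one used here. The function names (`chi2_stat`, `cramers_v`) and the `bias_correction` flag are illustrative, not taken from any particular library.

```python
from collections import Counter
from math import sqrt

# The example dataset from the table above.
a = [0, 0, 0, 1, 1, 1]
b = [0, 1, 0, 2, 2, 3]

def chi2_stat(x, y):
    """Pearson chi-square statistic of the contingency table of x and y."""
    n = len(x)
    joint = Counter(zip(x, y))
    cx, cy = Counter(x), Counter(y)
    chi2 = 0.0
    for xv, nx in cx.items():
        for yv, ny in cy.items():
            expected = nx * ny / n
            observed = joint.get((xv, yv), 0)
            chi2 += (observed - expected) ** 2 / expected
    return chi2, len(cx), len(cy)

def cramers_v(x, y, bias_correction=True):
    """Cramer's V; the bias correction is an assumption (see lead-in)."""
    n = len(x)
    chi2, r, k = chi2_stat(x, y)
    phi2 = chi2 / n
    if bias_correction:
        phi2 = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
        r = r - (r - 1) ** 2 / (n - 1)
        k = k - (k - 1) ** 2 / (n - 1)
    denom = min(r - 1, k - 1)
    return sqrt(phi2 / denom) if denom > 0 else 0.0

print(round(cramers_v(a, b), 3))  # 0.707
print(round(cramers_v(b, a), 3))  # 0.707, same value in both directions
```

Swapping the arguments leaves the result unchanged, which is the symmetry the question is about.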

I'd like to find an asymmetric metric that will provide this information for nominal values, meaning it will give a different value when calculated for a -> b and for b -> a. Is there anything like this?

shakedzy

Posted 2018-01-09T15:06:46.763

Reputation: 639

Answers

8

I found what I was looking for - it's called Theil's U, or the Uncertainty Coefficient.

I've used it in this Kaggle kernel; you can check it out for an example and a code implementation in Python.

EDIT: I also have a blogpost about it.
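As a rough illustration of why this measure is asymmetric, here is a minimal sketch of Theil's U in plain Python, applied to the dataset from the question. It uses the standard definition U(x|y) = (H(x) - H(x|y)) / H(x), where H is the Shannon entropy. The function names are illustrative and are not necessarily those used in the kernel; returning 1.0 when x is constant is a convention assumed here.

```python
from collections import Counter
from math import log

# The example dataset from the question.
a = [0, 0, 0, 1, 1, 1]
b = [0, 1, 0, 2, 2, 3]

def entropy(x):
    """Shannon entropy H(x), in nats."""
    n = len(x)
    return -sum((c / n) * log(c / n) for c in Counter(x).values())

def conditional_entropy(x, y):
    """Conditional entropy H(x | y), in nats."""
    n = len(x)
    joint = Counter(zip(x, y))
    cy = Counter(y)
    # p(x, y) * log(p(y) / p(x, y)), written in terms of raw counts.
    return sum((c / n) * log(cy[yv] / c) for (_, yv), c in joint.items())

def theils_u(x, y):
    """U(x | y): fraction of the uncertainty in x explained by y."""
    hx = entropy(x)
    if hx == 0:
        # Assumed convention: a constant x has nothing left to explain.
        return 1.0
    return (hx - conditional_entropy(x, y)) / hx

print(round(theils_u(a, b), 3))  # 1.0: b fully determines a
print(round(theils_u(b, a), 3))  # 0.521: a only partly determines b
```

The two directions give different values, which is the information Cramer's V loses.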

shakedzy

Posted 2018-01-09T15:06:46.763

Reputation: 639