Using Kendall's Tau for association between dichotomous nominal and ordinal features


I'm working on the Titanic data set and I've split my variables into three groups:

# nominal variables
nom_vars = ['Survived', 'Title', 'Embarked', 'Sex', 'Alone']

# ordinal variables
ord_vars = ['Survived', 'Pclass', 'FamilySize']

# continuous variables
cont_vars = ['Survived', 'Fare', 'Age']

To measure association with "Survived" within each group, I used Cramér's V, Kendall's tau, and Pearson's r, respectively. From these scores, I want to choose which features to keep or discard.
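For reference, a minimal sketch of the Cramér's V step, following the plain chi-squared formula without any bias correction. The helper name `cramers_v` and the toy rows are made up for illustration and are not real Titanic records.

```python
import numpy as np
import pandas as pd

def cramers_v(x, y):
    """Cramér's V from the plain chi-squared formula (no bias correction)."""
    observed = pd.crosstab(x, y).to_numpy(dtype=float)
    n = observed.sum()
    # Expected counts under independence: (row total * column total) / n
    expected = np.outer(observed.sum(axis=1), observed.sum(axis=0)) / n
    chi2 = ((observed - expected) ** 2 / expected).sum()
    # Assumes both variables take at least two distinct values
    min_dim = min(observed.shape) - 1
    return np.sqrt(chi2 / (n * min_dim))

# Made-up toy rows standing in for the real data
toy = pd.DataFrame({
    "Survived": [0, 0, 1, 1, 1, 0],
    "Sex": ["male", "male", "female", "female", "female", "male"],
    "Embarked": ["S", "C", "S", "C", "S", "C"],
})

v_sex = cramers_v(toy["Survived"], toy["Sex"])
v_embarked = cramers_v(toy["Survived"], toy["Embarked"])
print(v_sex, v_embarked)
```

In this toy frame "Sex" determines "Survived" exactly, so its score sits at the top of the 0 to 1 range, while "Embarked" is only weakly related.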

Now I'm having second thoughts. Each set contains the "Survived" variable, which I interpret as nominal and dichotomous. For the last set, I know that Pearson's r can be applied to a dichotomous variable paired with a continuous one; in that case it is called the point-biserial correlation, so it shouldn't introduce any problems.
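A quick way to check the point-biserial claim is to compute Pearson's r directly and compare it with the point-biserial formula, r_pb = (M1 - M0) / s_n * sqrt(n1 * n0 / n^2), where s_n is the population standard deviation. This is only a sketch; the values below are made up.

```python
import numpy as np

# Made-up toy values, not real Titanic records
survived = np.array([0, 0, 1, 1, 1, 0, 1, 0])
fare = np.array([7.25, 8.05, 71.28, 53.10, 26.55, 13.00, 30.07, 21.08])

# Ordinary Pearson correlation with the 0/1 variable treated as numeric
r_pearson = np.corrcoef(survived, fare)[0, 1]

# Point-biserial formula: difference of group means, scaled by the
# population standard deviation (ddof=0) and the group proportions
group1 = fare[survived == 1]
group0 = fare[survived == 0]
n1, n0, n = len(group1), len(group0), len(fare)
r_pb = (group1.mean() - group0.mean()) / fare.std(ddof=0) * np.sqrt(n1 * n0 / n**2)

print(r_pearson, r_pb)  # the two values agree up to floating-point error
```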

However, I'm worried about including it in the second set, where Kendall's tau is then computed between a nominal variable and ordinal ones.
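To make the concern concrete, here is a hand-rolled sketch of Kendall's tau-b (the tie-adjusted variant) applied to a 0/1 variable and an ordinal one. The function name `kendall_tau_b` and the toy values are illustrative assumptions. The one thing it demonstrates is that swapping the 0/1 coding of the dichotomous variable only flips the sign of the statistic, not its magnitude.

```python
from itertools import combinations
from math import sqrt

def kendall_tau_b(x, y):
    """Kendall's tau-b via an O(n^2) pair count (fine for small samples)."""
    concordant = discordant = ties_x_only = ties_y_only = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        dx, dy = x1 - x2, y1 - y2
        if dx == 0 and dy == 0:
            continue  # tied on both variables: drops out of tau-b entirely
        elif dx == 0:
            ties_x_only += 1
        elif dy == 0:
            ties_y_only += 1
        elif dx * dy > 0:
            concordant += 1
        else:
            discordant += 1
    # Pairs not tied in x, times pairs not tied in y
    denom = sqrt((concordant + discordant + ties_y_only)
                 * (concordant + discordant + ties_x_only))
    return (concordant - discordant) / denom if denom else float("nan")

# Made-up toy values, not real Titanic records
survived = [1, 1, 1, 0, 1, 0, 0, 0]
pclass = [1, 1, 2, 2, 3, 3, 3, 3]

tau = kendall_tau_b(survived, pclass)
tau_flipped = kendall_tau_b([1 - s for s in survived], pclass)
print(tau, tau_flipped)  # same magnitude, opposite sign
```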

Was this choice inappropriate? If so, what is an easily implemented alternative association metric? I found Cramér's V easy to code from the formula on Wikipedia, and Kendall's tau is included in the pandas library, so that was convenient.


Posted 2020-04-22T20:27:39.637

No answers