I'm working on imputing null values in the Titanic dataset.
The 'Embarked' column has some. I do NOT want to just set them all to the most common value;
I want to impute 'Embarked' based on its correlation with the other columns.
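I have tried applying this function to the column:

    def embark(e):
        # Map each port of embarkation to an arbitrary integer code.
        if e == 'S':
            return 1
        if e == 'Q':
            return 2
        if e == 'C':
            return 3
        else:
            # Anything else (including NaN) becomes 4.
            return 4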
This lets me inspect data.corr(), but I think it's trickier than that, since different value assignments will give me different correlations (right?). I also thought about using a four-dimensional one-hot vector (for S, Q, C, NaN), but I doubt that would work.
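To make the concern concrete, here is a minimal sketch of what I mean. The small DataFrame and its 'Fare' values are made up for illustration (they are not the real Titanic rows), and the names coding_a and coding_b are hypothetical. It compares two arbitrary integer codings of the same categories, then shows the one-hot variant using pandas' get_dummies:

```python
import pandas as pd

# Toy stand-in for two Titanic columns; the values are invented for illustration.
df = pd.DataFrame({
    "Embarked": ["S", "C", "Q", "S", "C", "S", "Q", "C"],
    "Fare": [7.25, 71.28, 8.46, 8.05, 30.07, 13.00, 7.75, 27.72],
})

# Two arbitrary integer codings of the same three categories.
coding_a = {"S": 1, "Q": 2, "C": 3}
coding_b = {"S": 3, "Q": 1, "C": 2}

# Pearson correlation of each integer-coded column with Fare.
corr_a = df["Embarked"].map(coding_a).corr(df["Fare"])
corr_b = df["Embarked"].map(coding_b).corr(df["Fare"])
print(corr_a, corr_b)  # the two values differ, so the coding drives the result

# One-hot alternative: one 0/1 column per port, so no ordering is implied.
dummies = pd.get_dummies(df["Embarked"], prefix="Embarked", dtype=float)
print(dummies.corrwith(df["Fare"]))
```

With the integer codings, the correlation depends entirely on which number each port happens to get. The one-hot columns at least give one correlation per port that does not change with the labelling, though I am not sure that is enough to impute from.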
Is there a scikit-learn method that does this in some way? Any further insights on the matter?