I was playing around with some data to practice my Python and machine learning skills, and I wanted to create polynomial features from two features that I think are related and strongly influence the predicted output.
Unfortunately, my data has missing values (np.nan), and sklearn's PolynomialFeatures() cannot handle them. What is the best way to impute these values before generating the polynomial features?
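To make the setup concrete, here is a minimal sketch of imputing before the polynomial expansion. The toy array, the median strategy, and degree=2 are assumptions for illustration only, not the real data or a recommended configuration.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Toy data (made up): two features, each with one missing entry.
X = np.array([
    [1.0, 2.0],
    [np.nan, 3.0],
    [4.0, np.nan],
    [5.0, 6.0],
])

# Impute first, then expand, so PolynomialFeatures never sees NaN.
pipe = make_pipeline(
    SimpleImputer(strategy="median"),
    PolynomialFeatures(degree=2, include_bias=False),
)
X_poly = pipe.fit_transform(X)

# Columns are x0, x1, x0^2, x0*x1, x1^2.
print(X_poly.shape)
```

Keeping both steps in one pipeline means the imputation statistics are learned only from the data passed to fit, which matters once a train/test split is involved.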
I've tried replacing them with 0, 1, the mean, and the median; for my dataset, the median seems to work best. Can this be generalized, and what is the intuition behind it?
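One way to check whether a strategy holds up beyond a single lucky fit is to compare the candidates under cross-validation. The sketch below assumes synthetic data, a linear model, and R^2 as the score; all of these are placeholders, and the ranking it produces says nothing about any other dataset.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Synthetic data (assumption): the target depends on the product of two features.
rng = np.random.default_rng(0)
X = rng.normal(loc=10.0, scale=2.0, size=(200, 2))
y = X[:, 0] * X[:, 1] + rng.normal(scale=1.0, size=200)

# Knock out roughly 10% of the entries at random.
X_missing = X.copy()
X_missing[rng.random(X.shape) < 0.1] = np.nan

candidates = {
    "zero": SimpleImputer(strategy="constant", fill_value=0.0),
    "one": SimpleImputer(strategy="constant", fill_value=1.0),
    "mean": SimpleImputer(strategy="mean"),
    "median": SimpleImputer(strategy="median"),
}

scores = {}
for name, imputer in candidates.items():
    model = make_pipeline(
        imputer,
        PolynomialFeatures(degree=2, include_bias=False),
        LinearRegression(),
    )
    # Mean R^2 over 5 folds; the imputer is refit inside each fold.
    scores[name] = cross_val_score(model, X_missing, y, cv=5, scoring="r2").mean()

for name, score in scores.items():
    print(f"{name}: {score:.3f}")
```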
I was also wondering whether fill methods like ffill and bfill, or even KNN-based imputation, could be useful in this context.
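For comparison, a minimal sketch of those two alternatives, assuming a small made-up DataFrame and n_neighbors=2. Note that ffill and bfill copy values from neighbouring rows, so they only make sense when the row order carries meaning (for example, a time series), while KNNImputer fills each gap from the rows that are most similar on the observed features.

```python
import numpy as np
import pandas as pd
from sklearn.impute import KNNImputer

# Toy data (made up): column names "a" and "b" are hypothetical.
df = pd.DataFrame({
    "a": [1.0, np.nan, 3.0, 4.0, 5.0],
    "b": [10.0, 20.0, np.nan, 40.0, 50.0],
})

# Forward fill, then backward fill to catch any leading gaps.
filled_ffill = df.ffill().bfill()

# KNN imputation: average the feature over the 2 nearest rows that have it.
knn = KNNImputer(n_neighbors=2)
filled_knn = pd.DataFrame(knn.fit_transform(df), columns=df.columns)

print(filled_ffill)
print(filled_knn)
```

Either result can then be passed to PolynomialFeatures(), since neither contains missing values anymore.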
Thanks a lot!