I was playing around with some data to practice my Python and machine learning skills and wanted to create polynomial features from two features that I think are related and have a strong influence on the predicted output.

Unfortunately, my data has missing values (np.NaN), and sklearn's PolynomialFeatures() cannot handle them. **What is the best way to impute these values?**

I've tried replacing them with 0, 1, the mean, and the median; for my dataset, the median seems to work best. **But can this be generalized, and what is the intuition behind it?**
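
For concreteness, here is a minimal sketch of the kind of comparison I mean. The toy array `X` and the helper `polynomial_features_with_imputation` are made up for illustration and are not my real data or code; the only library pieces assumed are sklearn's `SimpleImputer`, `PolynomialFeatures` and `make_pipeline`.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Toy data (hypothetical): two related features, each with one missing value.
X = np.array([
    [1.0, 2.0],
    [np.nan, 3.0],
    [4.0, np.nan],
    [5.0, 6.0],
])


def polynomial_features_with_imputation(X, strategy="median", fill_value=None):
    """Impute missing values first, then expand to degree-2 polynomial features."""
    imputer = SimpleImputer(strategy=strategy, fill_value=fill_value)
    poly = PolynomialFeatures(degree=2, include_bias=False)
    # The pipeline guarantees PolynomialFeatures never sees a NaN.
    return make_pipeline(imputer, poly).fit_transform(X)


# The strategies I compared: median, mean, and constant fills of 0 and 1.
X_median = polynomial_features_with_imputation(X, strategy="median")
X_mean = polynomial_features_with_imputation(X, strategy="mean")
X_zero = polynomial_features_with_imputation(X, strategy="constant", fill_value=0)
X_one = polynomial_features_with_imputation(X, strategy="constant", fill_value=1)
```

Each result has the columns x0, x1, x0^2, x0*x1, x1^2. Note that a constant fill of 0 also zeroes the squared and interaction terms for that row, while a fill of 1 leaves them equal to the other feature.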

I was also wondering whether fill methods like ffill and bfill, or even KNN-based imputation, can be useful in this context.
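
To make these alternatives concrete, here is a rough sketch on a made-up DataFrame (`df` and its column names are hypothetical). It assumes pandas' `DataFrame.ffill`/`DataFrame.bfill` and sklearn's `KNNImputer`; as far as I understand, ffill/bfill only make sense when the row order carries meaning (e.g. a time series), which may not hold for my data.

```python
import numpy as np
import pandas as pd
from sklearn.impute import KNNImputer

# Toy frame (hypothetical): two related features with gaps.
df = pd.DataFrame({
    "a": [1.0, np.nan, 3.0, 4.0],
    "b": [2.0, 4.0, np.nan, 8.0],
})

# Forward fill copies the previous row's value; the trailing bfill
# covers any gap at the very start of a column.
filled_ffill = df.ffill().bfill()

# KNN imputation fills each gap with the average of that feature over the
# nearest rows, measured on the features that are present.
knn_imputer = KNNImputer(n_neighbors=2)
filled_knn = pd.DataFrame(knn_imputer.fit_transform(df), columns=df.columns)
```

Either result could then be passed to PolynomialFeatures() in place of the raw data.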

Thanks a lot!