I have seen researchers use Pearson's correlation coefficient to select relevant features -- keeping the features that have a high correlation with the target. The implication is that correlated features contribute more information for predicting the target in classification problems, whereas features with negligible correlation are considered redundant and removed.
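The filtering step described above can be sketched roughly as follows, on synthetic data; the 0.3 threshold and the feature/target setup are made up for illustration, not taken from any particular study:

```python
# Hedged sketch: keep features whose absolute Pearson correlation with
# the target exceeds an arbitrary threshold. Data are synthetic.
import numpy as np

rng = np.random.default_rng(0)
n = 200
X = rng.normal(size=(n, 3))             # three candidate features
y = 2.0 * X[:, 0] + rng.normal(size=n)  # target depends mainly on feature 0

# |Pearson r| of each feature with the target
corrs = np.array([abs(np.corrcoef(X[:, j], y)[0, 1])
                  for j in range(X.shape[1])])
keep = corrs > 0.3                      # arbitrary cut-off for illustration
print(corrs.round(2), keep)
```

Here feature 0, which actually drives the target, should come out with a much larger correlation than the two noise features.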

Q1) Should features that are highly correlated with the target variable be included in or removed from classification problems? Is there a better/more elegant explanation for this step?

Q2) How do we know that a dataset is linear when multiple variables are involved? What does it mean for a dataset to be linear?

Q3) How do we check feature importance in the non-linear case?

Thank you very much for your answer, it really helped. Two last questions as a follow-up -- (1) is feature reduction done after standardization or normalization, or on the raw data set? (2) In practice I have often noticed that for regression problems, if the target response is transformed to logarithm base 10, the fit is better. Why is that? – Srishti M – 2019-11-22T01:47:26.850

To be honest, I'm not really sure about these two questions: (1) my intuition would be to standardize first, because that way the feature selection takes into account the features exactly as they would be used. However, I suspect it doesn't matter too much, since standardization shouldn't change the correlation with the response variable much. (2) I'm not sure this is really common; it probably depends on the task, but I assume the reason would be to convert a non-linear relation into a (more) linear one: many problems are not linear but can be transformed into linear ones. – Erwan – 2019-11-22T01:59:38.917
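Both points in this comment can be checked numerically on synthetic data; the setup below (a multiplicative relation for point 2) is an illustrative assumption, not taken from the original discussion:

```python
# Hedged sketch of the two follow-up points, on synthetic data.
import numpy as np

rng = np.random.default_rng(1)

# (1) Pearson correlation is invariant under standardization
# (a positive affine transform), so selecting features by correlation
# before or after standardizing gives the same result.
x = rng.normal(loc=5.0, scale=3.0, size=500)
y = 0.5 * x + rng.normal(size=500)
x_std = (x - x.mean()) / x.std()        # standardized copy of x
r_raw = np.corrcoef(x, y)[0, 1]
r_std = np.corrcoef(x_std, y)[0, 1]
print(np.isclose(r_raw, r_std))

# (2) An exponential relation becomes linear after a log transform of
# the response, which is one reason a log10 target can fit better.
t = rng.uniform(0.1, 5.0, size=500)
resp = 10 ** (2.0 * t) * rng.lognormal(sigma=0.1, size=500)
r_nonlin = np.corrcoef(t, resp)[0, 1]
r_log = np.corrcoef(t, np.log10(resp))[0, 1]
print(r_log > r_nonlin)
```

For point (1) the two correlations agree up to floating-point error; for point (2) the log-transformed response is almost perfectly linear in `t`, while the raw response is not.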