I have seen researchers use Pearson's correlation coefficient to select relevant features: they keep the features that have a high correlation with the target. The implication is that highly correlated features contribute more information for predicting the target in classification problems. Features that are redundant and have a negligible correlation with the target, on the other hand, are removed.
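For concreteness, the filtering step I am describing looks roughly like the sketch below. It is my own minimal illustration and is not taken from any particular paper: the helper names `pearson` and `select_by_correlation`, the 0.3 threshold, and the toy data are all assumptions made for the example.

```python
import math
import random


def pearson(x, y):
    """Pearson's correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # convention chosen here: a constant column gets r = 0
    return cov / math.sqrt(var_x * var_y)


def select_by_correlation(features, target, threshold=0.3):
    """Keep features whose |r| with the target is at least `threshold`.

    `features` maps a feature name to its column of values.
    The threshold is an arbitrary, hypothetical cut-off.
    """
    scores = {name: pearson(col, target) for name, col in features.items()}
    kept = [name for name, r in scores.items() if abs(r) >= threshold]
    return kept, scores


# Toy data (made up): a binary target, one feature that tracks it, one pure-noise feature.
rng = random.Random(0)
target = [i % 2 for i in range(200)]
features = {
    "informative": [t + rng.gauss(0, 0.5) for t in target],
    "noise": [rng.gauss(0, 1) for _ in target],
}

kept, scores = select_by_correlation(features, target)
print("kept:", kept)
print("scores:", scores)
```

The selection uses the absolute value of the coefficient, so a strongly negative correlation counts as relevant too. Note that this only scores each feature against the target; it does not check whether two kept features are redundant with each other.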
Q1) Should features that are highly correlated with the target variable be included in or removed from classification problems? Is there a better or more elegant explanation for this step?
Q2) How do we know that a dataset is linear when multiple variables are involved? What does it mean for a dataset to be linear?
Q3) How can feature importance be checked in the non-linear case?