I hope we can remove the highly correlated variables based on feature importance, maybe with PCA, etc.

Is there anything we can do with highly correlated variables?

Thanks in advance!

**Answer** (score: 4)

An alternative to the one provided by @Kasra is **dimensionality reduction**. It's another way of solving your multicollinearity problems, while avoiding deleting variables more or less arbitrarily.

You can use simpler, linear techniques such as **PCA**, or more complex non-linear techniques such as **autoencoders**. **t-SNE** is a non-linear technique that is typically used for visualization; I do not recommend using it on a training set.
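As a minimal sketch of the PCA route (using scikit-learn on synthetic data made up for illustration): two nearly identical features collapse into a single principal component, so keeping enough components to explain 95% of the variance shrinks the feature space without hand-picking which column to drop.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)

# Synthetic data: columns 0 and 1 are highly correlated, column 2 is independent
x = rng.normal(size=200)
X = np.column_stack([
    x,
    x + rng.normal(scale=0.05, size=200),  # near-duplicate of x
    rng.normal(size=200),
])

# Keep the smallest number of components explaining >= 95% of the variance
pca = PCA(n_components=0.95)
X_reduced = pca.fit_transform(X)

print(X_reduced.shape)  # (200, 2): the correlated pair collapsed into one component
```

Note that PCA is scale-sensitive, so in practice you would usually standardize the features (e.g. with `StandardScaler`) before fitting.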

**Answer** (score: 3)

I think merging such correlated features to create a new one would also be a good idea. That way we do not lose any information.

For example, summing the values of the correlated features and taking their average would be the most basic option.
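A minimal sketch of that basic option, on made-up data: replace a highly correlated pair of columns with their element-wise average, keeping the rest of the features unchanged.

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative data: a and b are highly correlated, c is independent
a = rng.normal(size=100)
b = a + rng.normal(scale=0.1, size=100)
c = rng.normal(size=100)
X = np.column_stack([a, b, c])

# Merge the correlated pair into one feature by averaging
merged = X[:, [0, 1]].mean(axis=1)
X_new = np.column_stack([merged, X[:, 2]])

print(X_new.shape)  # (100, 2)
```

Averaging only makes sense if the features are on comparable scales; otherwise standardize them first, or use a weighted combination.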

**Answer** (score: 2)

You need to remove them. Redundant features only increase computation time and model complexity with no benefit, which makes interpreting the model/analysis more complicated. And if there are many of them, removing them prunes your vector space and improves the density of information across its dimensions (this helps, e.g., in finding nearest neighbors).
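A common way to do this removal, sketched here with pandas on synthetic data (the 0.95 threshold matches the figure mentioned in the question): scan the upper triangle of the absolute correlation matrix and drop every feature whose correlation with an earlier feature exceeds the threshold.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(2)

# Illustrative frame: f2 is a near-copy of f1, f3 is independent
df = pd.DataFrame({"f1": rng.normal(size=300)})
df["f2"] = df["f1"] + rng.normal(scale=0.01, size=300)
df["f3"] = rng.normal(size=300)

# Upper triangle of the absolute correlation matrix (diagonal excluded)
corr = df.corr().abs()
upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))

# Drop any feature correlated above 0.95 with an earlier feature
to_drop = [col for col in upper.columns if (upper[col] > 0.95).any()]
df_reduced = df.drop(columns=to_drop)

print(to_drop)  # ['f2']
```

Using only the upper triangle ensures that from each correlated pair exactly one feature (the later column) is dropped, rather than both.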

In the situation of two highly correlated features, actually choosing one does the same job. The question mentions 0.95, which means they are practically the same. Plus, a merging strategy needs to be chosen. What is your idea for merging? – Kasra Manshaei – 2020-02-07T17:44:43.727