How do you determine cut-off values for correlation when choosing features to keep?


As a beginner reading through the literature available on Google, I've found that white papers publish WILDLY different scales for what counts as a weak, moderate or strong association. I assume this is because the papers come from different fields, such as medicine or marketing. I have no subject-matter expertise in any particular area, so this is a challenge for me.

My data is the Titanic dataset, and I'm using Kendall's tau, Pearson's r and Cramér's V, one for each of the three types of variables present.
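To make the three measures concrete, here is a minimal pure-Python sketch of each, run on made-up toy values rather than the actual Titanic columns. It assumes the tau-b variant of Kendall's coefficient and Cramér's V without bias correction. In practice, `scipy.stats.pearsonr`, `scipy.stats.kendalltau` (tau-b by default) and, in recent SciPy versions, `scipy.stats.contingency.association(..., method="cramer")` should compute the same quantities.

```python
import math
from collections import Counter


def pearson_r(x, y):
    """Pearson's r for two equal-length numeric sequences (undefined if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def kendall_tau_b(x, y):
    """Kendall's tau-b: (concordant - discordant) pairs, with the denominator adjusted for ties."""
    n = len(x)
    concordant = discordant = ties_x = ties_y = 0
    for i in range(n):
        for j in range(i + 1, n):
            dx = (x[i] > x[j]) - (x[i] < x[j])  # sign of the x difference
            dy = (y[i] > y[j]) - (y[i] < y[j])  # sign of the y difference
            if dx == 0:
                ties_x += 1
            if dy == 0:
                ties_y += 1
            if dx * dy > 0:
                concordant += 1
            elif dx * dy < 0:
                discordant += 1
    n0 = n * (n - 1) // 2  # total number of pairs
    return (concordant - discordant) / math.sqrt((n0 - ties_x) * (n0 - ties_y))


def cramers_v(a, b):
    """Cramér's V (no bias correction) for two categorical sequences."""
    n = len(a)
    observed = Counter(zip(a, b))  # contingency-table counts; missing cells count as 0
    row_totals, col_totals = Counter(a), Counter(b)
    chi2 = 0.0
    for r in row_totals:
        for c in col_totals:
            expected = row_totals[r] * col_totals[c] / n
            chi2 += (observed[(r, c)] - expected) ** 2 / expected
    k = min(len(row_totals), len(col_totals))
    if k < 2:
        raise ValueError("each variable needs at least two categories")
    return math.sqrt(chi2 / (n * (k - 1)))


# Toy values for illustration only, not columns from the Titanic data.
print(pearson_r([1, 2, 3, 4, 5], [2, 4, 6, 8, 10]))
print(kendall_tau_b([1, 2, 3], [1, 3, 2]))
print(cramers_v(["a", "a", "b", "b"], ["x", "x", "y", "y"]))
```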

How do I decide when to drop a feature for this kind of data? And what are the rules of thumb for other types of data?
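One common form such a rule takes is a pairwise cut-off: whenever two features are associated more strongly than some threshold, keep one and drop the other. The sketch below illustrates only that mechanism. The 0.8 threshold, the feature names `f1`/`f2`/`f3` and their coefficients are hypothetical placeholders, not values from the Titanic data or from any published scale. It compares absolute values because Pearson's r and Kendall's tau range from -1 to 1 while Cramér's V ranges from 0 to 1; treating the three as one common scale is itself an assumption.

```python
def features_to_drop(pairwise, threshold=0.8):
    """Greedily choose features to drop from strongly associated pairs.

    pairwise: dict mapping (feature_a, feature_b) -> association coefficient
    threshold: hypothetical cut-off on the absolute value (0.8 is a placeholder)
    """
    dropped = set()
    # Visit the strongest associations first so they are resolved before weaker ones.
    for (a, b), value in sorted(pairwise.items(), key=lambda kv: -abs(kv[1])):
        if abs(value) < threshold:
            break  # every remaining pair is weaker than the cut-off
        if a in dropped or b in dropped:
            continue  # this pair is already broken up
        dropped.add(b)  # arbitrary choice: keep the first feature of the pair
    return dropped


# Made-up coefficients for three hypothetical features, not Titanic results.
example = {("f1", "f2"): 0.91, ("f1", "f3"): -0.35, ("f2", "f3"): 0.82}
print(sorted(features_to_drop(example)))
```

Which member of a pair to keep is arbitrary here (the first one listed); in practice it might depend on which feature is easier to interpret or more strongly related to the target.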


Posted 2020-04-22T21:07:29.667

Reputation: 279
