5

Given a dataset that has a binary (0/1) dependent variable and a large collection of continuous and categorical independent variables, is there a process and ideally a R package that can find combinations/subsets/segments of the IVs that are highly correlated with the DV?

Simple example: DV: college education (0/1), and IVs: age (20 to 120), income (0 to 1 million), race (white, black, hispanic etc), gender (0/1), state, etc.

Then finding correlations combining IVs and subsets of IVs (e.g. women between 30 and 50, with incomes over 100k are highly positively correlated with the DV), and then being able to compare the combinations (e.g. to find out women between 30 and 40, with incomes over 100k have a higher correlation than women between 40 and 50, with incomes over 100k)

Maybe I tend to always want to keep it simple, but why not just use Logistic Regression with some interaction terms? – None – 2015-03-01T03:58:26.763