I have a dataset of approximately 48,000 rows, each one a click on an article. Some of these clicks were also comments. For each article I have the country and subject of the article and the name of the person who clicked on it. These are all categorical variables, each with its own set of levels. I want to determine which of these categorical variables has the strongest association or correlation with comments, i.e. based on the dataset, which variable (country, subject, or person) best predicts whether or not there will be a comment on an article.
Which statistical method would you use to determine correlation and how would you implement it in R?
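
To make the setup concrete, here is a minimal sketch using simulated data in place of my real dataset. The column names (`country`, `subject`, `person`, `commented`), the level labels, and the row count are placeholders I made up for illustration. Cramér's V, computed from a chi-squared test on each contingency table, is one candidate measure I have come across, but I am not sure it is the right choice here, particularly since `person` has many more levels than the other two variables.

```r
# Simulated stand-in for the real data (~48,000 rows).
# All column names and level labels below are hypothetical.
set.seed(1)
n <- 1000
clicks <- data.frame(
  country   = factor(sample(c("US", "UK", "DE"), n, replace = TRUE)),
  subject   = factor(sample(c("politics", "sport", "tech", "arts"), n, replace = TRUE)),
  person    = factor(sample(paste0("user", 1:20), n, replace = TRUE)),
  commented = factor(sample(c("no", "yes"), n, replace = TRUE, prob = c(0.9, 0.1)))
)

# Cramér's V: chi-squared statistic scaled by sample size and table
# dimensions, giving a value between 0 (no association) and 1.
cramers_v <- function(x, y) {
  tab  <- table(x, y)
  # Warnings about small expected counts are suppressed in this sketch.
  chi2 <- suppressWarnings(chisq.test(tab, correct = FALSE))$statistic
  k    <- min(nrow(tab), ncol(tab)) - 1
  sqrt(as.numeric(chi2) / (sum(tab) * k))
}

# One value per candidate predictor, measured against the comment flag.
v <- sapply(clicks[c("country", "subject", "person")],
            cramers_v, y = clicks$commented)
sort(v, decreasing = TRUE)
```

Because the simulated columns are drawn independently, the values here should all be small; on the real data I would compare the three values against each other.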