Correlation between specific columns of a data set



I have a CSV file with 150 columns belonging to 7 categories, but I only want the correlation between 2 of those categories: movies and music, with 12 and 19 columns respectively.

Is there a way to plot a correlation matrix or a correlation graph between two of the categories and selected columns?

For example, the 19 music columns on x and the 12 movie columns on y. Or combining the 12 and 19 columns and computing the correlation among only those 31 columns instead of all 150.

I'm using Python. Which packages could help me?

Maha Kamal

Posted 2017-11-30T19:57:04.113

Reputation: 101



I recommend starting from the following example and adjusting its arguments to fit your data:

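import matplotlib.pyplot as plt
from pandas.plotting import scatter_matrix  # pd.scatter_matrix is not available in newer pandas versions
cmap = plt.get_cmap('gnuplot')
# YOUR_TRAINING_DATA: DataFrame of the selected columns; YOUR_LABELS_OF_TRAINING: one numeric label per row, used for colouring
scatter = scatter_matrix(YOUR_TRAINING_DATA, c=YOUR_LABELS_OF_TRAINING, marker='o', s=40, hist_kwds={'bins': 15}, figsize=(12, 12), cmap=cmap)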

The code and image are taken from a Coursera data science course.
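A scatter matrix shows pairwise scatter plots rather than correlation coefficients. If you want the numbers themselves, here is a minimal sketch using pandas' `DataFrame.corr` (Pearson by default). The column names (`movie_0`, `music_0`, ...) and the random data are placeholders I made up for illustration; with the real file you would load it with `pd.read_csv` and list your actual movie and music column names.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the real CSV (assumption); in practice use pd.read_csv(...)
rng = np.random.default_rng(0)
movie_cols = [f"movie_{i}" for i in range(12)]  # hypothetical column names
music_cols = [f"music_{i}" for i in range(19)]  # hypothetical column names
other_cols = [f"other_{i}" for i in range(5)]   # stands in for the remaining columns
df = pd.DataFrame(
    rng.normal(size=(200, 36)),
    columns=movie_cols + music_cols + other_cols,
)

# Option 1: 31 x 31 correlation matrix restricted to the two categories
sub_corr = df[movie_cols + music_cols].corr()

# Option 2: 12 x 19 block, movies on the rows and music on the columns
cross_corr = sub_corr.loc[movie_cols, music_cols]

print(sub_corr.shape)
print(cross_corr.shape)
```

Either frame can then be drawn as a heatmap, for example with matplotlib's `imshow` or seaborn's `heatmap`, which gives the "19 columns on x and 12 columns on y" picture described in the question.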


Posted 2017-11-30T19:57:04.113

Reputation: 12 077