I have a dataset with election results and crime rates per city. For each variable I have an absolute value (e.g. total votes, total crimes) and a relative value (e.g. percentage share of votes).
I want to calculate the correlation coefficient for some of these variables, but I am not sure which values I should use: the relative ones or the absolute ones.
First I calculated the z-scores of the absolute values and then computed the correlation in Excel. I also used scipy.stats.stats in Python to corroborate the results.
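As a side check on the z-score step: Pearson's r should not change under a positive linear rescaling of either variable, so standardizing first should give the same coefficient as the raw values. Below is a minimal sketch with made-up numbers (not my real data). The helpers `pearson` and `zscore` are hypothetical stand-ins written with the standard library only, not the SciPy or Excel functions.

```python
import random
from statistics import mean, pstdev


def pearson(x, y):
    """Plain Pearson correlation coefficient (stand-in for scipy's pearsonr)."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def zscore(values):
    """Standardize values to mean 0 and (population) standard deviation 1."""
    m, s = mean(values), pstdev(values)
    return [(v - m) / s for v in values]


# Synthetic data, purely illustrative.
rng = random.Random(0)
x = [rng.uniform(0, 1000) for _ in range(50)]
y = [2 * v + rng.gauss(0, 100) for v in x]

r_raw = pearson(x, y)                  # correlation of the raw values
r_z = pearson(zscore(x), zscore(y))    # correlation of the z-scores
print(r_raw, r_z)                      # the two should agree up to rounding
```

If that holds, the z-score step is harmless but also unnecessary for the correlation itself.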
For example, if I use the absolute values I get a positive correlation between candidate 1 and candidate 2:
    from scipy.stats import pearsonr

    x = df['Abs Cand 1'].tolist()
    y = df['Abs Cand 2'].tolist()
    print(pearsonr(x, y))
    (0.95209664861187004, 0.0)
However, if I use the relative values I get a negative correlation:
    x = df['Rel Cand 1'].tolist()
    y = df['Rel Cand 2'].tolist()
    print(pearsonr(x, y))
    (-0.99704737036262991, 0.0)
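For what it's worth, the same sign flip seems to show up with made-up data, so it may not be specific to my dataset. The sketch below is only an illustration under assumptions I chose myself: cities differ a lot in size, two candidates split almost all of the vote, and a small remainder goes to others. All names and numbers here are hypothetical, and `pearson` is a standard-library stand-in for `pearsonr`.

```python
import random
from statistics import mean


def pearson(x, y):
    """Plain Pearson correlation coefficient (stand-in for scipy's pearsonr)."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


rng = random.Random(42)
abs1, abs2, rel1, rel2 = [], [], [], []

for _ in range(200):
    total = rng.randint(1_000, 1_000_000)  # assumed: city size varies widely
    share1 = rng.uniform(0.3, 0.6)         # assumed: candidate 1 vote share
    other = rng.uniform(0.0, 0.05)         # assumed: small share for everyone else
    share2 = 1.0 - share1 - other          # candidate 2 gets the remainder

    abs1.append(total * share1)            # absolute votes scale with city size
    abs2.append(total * share2)
    rel1.append(share1)                    # relative shares do not
    rel2.append(share2)

r_abs = pearson(abs1, abs2)  # driven mostly by the shared city-size factor
r_rel = pearson(rel1, rel2)  # driven by the shares having to sum to about 1
print(r_abs, r_rel)
```

In this toy setup the absolute totals rise together because both depend on city size, while the shares move in opposite directions because they have to add up to roughly 100%.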
I was confused when I saw both results, and I would appreciate some guidance on why they differ and which one I should report.
Thanks in advance!