## Pearson correlation method using absolute values and relative values

I have a dataset with election results and crime rates per city. For each variable I have an absolute value (e.g. total votes, total crimes) and a relative value (e.g. percentage share of votes).

I want to calculate the correlation coefficient for some of these variables, but I am not sure which values to use: relative or absolute.

First I calculated z-scores for the absolute values and then computed the correlation in Excel. To corroborate the results, I also used pandas.DataFrame.corr() and pearsonr from scipy.stats in Python.
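As a side note, z-scoring each column should not change Pearson's r, because standardizing is a linear rescaling and r is invariant to that. A minimal sketch, assuming five made-up vote totals (these numbers are hypothetical and not taken from my dataset):

```python
import pandas as pd
from scipy.stats import pearsonr

# Hypothetical vote totals for five cities (illustrative only).
df = pd.DataFrame({
    "Abs Cand 1": [1200, 5400, 300, 9800, 2500],
    "Abs Cand 2": [1000, 6100, 450, 9100, 2200],
})

# Pearson's r on the raw totals.
r_raw, _ = pearsonr(df["Abs Cand 1"], df["Abs Cand 2"])

# Standardize each column to z-scores, then recompute r.
z = (df - df.mean()) / df.std()
r_z, _ = pearsonr(z["Abs Cand 1"], z["Abs Cand 2"])

# The pandas method should agree with scipy.
r_pandas = df["Abs Cand 1"].corr(df["Abs Cand 2"])

print(r_raw, r_z, r_pandas)
```

All three values should match up to floating-point error, so the z-score step is optional when the only goal is the correlation coefficient.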

For example, if I use absolute values, I get a positive correlation between candidate 1 and candidate 2:

    from scipy.stats import pearsonr

    x = df['Abs Cand 1'].tolist()
    y = df['Abs Cand 2'].tolist()

    print(pearsonr(x, y))
    # (0.95209664861187004, 0.0)


However, if I use the relative values, I get a negative correlation:

    x = df['Rel Cand 1'].tolist()
    y = df['Rel Cand 2'].tolist()

    print(pearsonr(x, y))
    # (-0.99704737036262991, 0.0)


I was confused by these two results and would appreciate some guidance on why they differ.

Are candidate1 and candidate2 the votes for each candidate? In that case the absolute values are probably positively correlated because the votes for both candidates simply increase with the size of the city, and the relative values are negatively correlated because candidate1 = 100% - candidate2. Wouldn't you want to know the correlation between crime rates and votes for one of the candidates? – oW_ – 2016-10-05T19:08:32.730
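The city-size effect described in this comment can be reproduced with simulated data. This is only a sketch under stated assumptions: a hypothetical two-candidate race in 200 cities, with turnout and vote shares drawn at random rather than taken from the real dataset.

```python
import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(0)

# Assumption: 200 cities whose turnout varies over three orders of magnitude.
total_votes = rng.integers(1_000, 1_000_000, size=200)

# Assumption: candidate 1's share is random and unrelated to city size.
share_1 = rng.uniform(0.35, 0.65, size=200)
share_2 = 1.0 - share_1  # two-candidate race, so shares sum to 100%

# Absolute votes are the share multiplied by the city's turnout.
abs_1 = total_votes * share_1
abs_2 = total_votes * share_2

r_abs, _ = pearsonr(abs_1, abs_2)      # driven by city size
r_rel, _ = pearsonr(share_1, share_2)  # driven by the sum-to-100% constraint

print(r_abs, r_rel)
```

Under these assumptions the absolute counts come out positively correlated and the shares almost perfectly negatively correlated. With more than two candidates the shares of the top two no longer sum to exactly 100%, which would be consistent with a value close to, but not exactly, -1.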

@oW_ actually that's the main idea: crime rates vs. votes, but I got stuck when I saw those differences. For example, when using crime rates and votes, which values would be best? – estebanpdl – 2016-10-05T19:33:51.413

Turning values into percentages is simply multiplying the values by a scalar. That should not change the correlation coefficient. – oW_ – 2016-10-05T19:44:48.500
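One caveat that may be worth checking here: multiplying a whole column by a single constant leaves Pearson's r unchanged, but a per-city vote share divides each row by that city's own total, which is a different denominator per row. A small sketch with four made-up cities (all numbers are hypothetical):

```python
import numpy as np

# Hypothetical absolute votes in four cities of very different sizes.
abs_1 = np.array([600.0, 4500.0, 200.0, 30000.0])
abs_2 = np.array([400.0, 5500.0, 300.0, 20000.0])
totals = abs_1 + abs_2  # per-city totals (two-candidate assumption)


def pearson(x, y):
    """Pearson's r via the off-diagonal entry of the correlation matrix."""
    return np.corrcoef(x, y)[0, 1]


# Raw counts.
r_abs = pearson(abs_1, abs_2)

# Same constant applied to every city: percentage of the national total.
national = totals.sum()
r_scaled = pearson(100 * abs_1 / national, 100 * abs_2 / national)

# Different denominator per city: share of that city's own vote.
r_share = pearson(abs_1 / totals, abs_2 / totals)

print(r_abs, r_scaled, r_share)
```

In this toy example `r_scaled` equals `r_abs`, while `r_share` has the opposite sign, which matches the pattern in the question.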

That makes sense. Thanks, @oW_. So, to get a correlation coefficient for different elections, relative values would be the better choice, since absolute values depend on the size of the city. Is that right? – estebanpdl – 2016-10-05T20:22:35.863
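For the crime-vs-votes comparison, one possible approach is to normalize both variables by city size before correlating them. The sketch below assumes hypothetical column names (`population`, `total_crimes`, `total_votes`, `votes_cand_1`) and made-up numbers, so it says nothing about real elections:

```python
import pandas as pd

# Hypothetical per-city data; column names and values are illustrative only.
df = pd.DataFrame({
    "population":   [50_000, 200_000, 10_000, 800_000, 120_000],
    "total_crimes": [900, 5_200, 150, 30_400, 2_400],
    "total_votes":  [20_000, 90_000, 4_000, 310_000, 50_000],
    "votes_cand_1": [11_000, 40_500, 2_400, 124_000, 26_000],
})

# Crimes per 1,000 residents removes the direct effect of city size.
df["crime_rate"] = df["total_crimes"] / df["population"] * 1000

# Vote share of candidate 1 within each city.
df["share_cand_1"] = df["votes_cand_1"] / df["total_votes"]

# Pearson correlation between the two size-normalized variables.
r = df["crime_rate"].corr(df["share_cand_1"])
print(r)
```

Both variables are then on a per-city relative scale, so a large city no longer dominates simply because all of its counts are large.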