I have two variables, $X$ and $Y$, observed as pairs $(x, y)$, and I want to see whether there is a relationship between them. I can do so by computing the correlation coefficient.

However, I found that by selecting an arbitrary subset of the data (e.g. $\{(x, y) \mid x > k\}$ for some cut-off $k$), I can get a higher correlation coefficient and an apparently stronger result. Is doing so mathematically sound? To put it simply, I have no a priori reason to believe that certain data points are "more important" than others.
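
To make the procedure concrete, here is a minimal Python sketch of the kind of selection I mean. It is only an illustration: the data are synthetic (independent normal draws, not my real data), and the helper names `pearson_r` and `best_subset_correlation`, the quantile grid of cut-offs, and the `min_points` guard are all assumptions made for the example.

```python
import numpy as np


def pearson_r(x, y):
    """Pearson correlation coefficient of two 1-D arrays."""
    return float(np.corrcoef(x, y)[0, 1])


def best_subset_correlation(x, y, thresholds, min_points=10):
    """Scan cut-offs k and return (k, r) for the subset x > k with the largest |r|.

    The full data set (k = -inf) is always one of the candidates.
    """
    best_k, best_r = -np.inf, pearson_r(x, y)
    for k in thresholds:
        mask = x > k
        if mask.sum() < min_points:
            continue  # skip subsets too small to give a meaningful r
        r = pearson_r(x[mask], y[mask])
        if abs(r) > abs(best_r):
            best_k, best_r = k, r
    return best_k, best_r


# Synthetic data: x and y are independent by construction.
rng = np.random.default_rng(0)
x = rng.normal(size=200)
y = rng.normal(size=200)

full_r = pearson_r(x, y)
# Hypothetical grid of cut-offs: quantiles of x from 0% to 90%.
k, r = best_subset_correlation(x, y, np.quantile(x, np.linspace(0.0, 0.9, 19)))

print(f"full data:            r = {full_r:.3f}")
print(f"best subset x > {k:.3f}: r = {r:.3f}")
```

Because the full data set is itself one of the candidates, the selected $|r|$ can never be smaller than the full-data $|r|$, which is part of why I am unsure whether the improvement means anything.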