Spatially constrained geospatial similarity



What's the current methodology for clustering geospatial data by features?

Example: I have some demographic dataset. Let's say this contains average home price and population density.

So, an example correlation here would be home price vs. population density. The tricky part is how the clusters get pulled. For example, an affluent area with high population density isn't the same as an affluent area with low population density. I suspect a basic distance metric wouldn't take this into account, since lows and highs could offset each other and give similar distances. This leads me to think some form of weighted clustering might be needed to pull the centroids.

Not sure what methodology takes this into account.


Posted 2020-05-13T21:28:54.407

Reputation: 41

I’m not sure I understand your example of differences in one feature offsetting distances in another feature. Assuming your features are relatively orthogonal, this shouldn’t be possible. Could you please let us know what type of clustering you have tried and why you don’t think it’s working? – Nicholas James Bailey – 2020-06-27T21:57:22.017



I assume you are trying to find a suitable distance metric based on features of different areas (although spatial distances might also easily be plugged in). In that case, I would first try to make sure the different features are correctly scaled, for example, to zero mean and unit variance.
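As a rough illustration of the scaling step, here is a minimal sketch in plain Python that rescales each feature to zero mean and unit variance. The `standardize` helper and the sample `areas` values are made up for illustration and are not from the question's dataset.

```python
from statistics import mean, pstdev


def standardize(rows):
    """Scale each column to zero mean and unit variance (z-scores)."""
    columns = list(zip(*rows))
    means = [mean(col) for col in columns]
    stds = [pstdev(col) for col in columns]  # population standard deviation
    return [
        # A constant column (std == 0) carries no information, so map it to 0.0.
        [(value - m) / s if s > 0 else 0.0 for value, m, s in zip(row, means, stds)]
        for row in rows
    ]


# Hypothetical areas: (average home price in USD, population density per km^2)
areas = [
    (850_000, 9_000),
    (820_000, 400),
    (210_000, 8_500),
    (190_000, 350),
]

scaled = standardize(areas)
for original, z in zip(areas, scaled):
    print(original, [round(v, 2) for v in z])
```

Without this step, the home price column (hundreds of thousands) would dominate any distance computed together with the density column (hundreds to thousands), so the clustering would effectively ignore density.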

If the result does not seem right, I would also try looking at different distance metrics. A simple alternative is the L1 norm (Manhattan distance), which sums the absolute differences over the features x:

L1(a, b) = sum_x |x_a - x_b|
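To make the formula concrete, here is a small sketch comparing the L1 distance with the usual Euclidean (L2) distance. The three feature vectors are hypothetical and assumed to be already standardized; the function names are mine, not from any library.

```python
import math


def l1_distance(a, b):
    """L1 (Manhattan) distance: sum of absolute per-feature differences."""
    return sum(abs(x_a - x_b) for x_a, x_b in zip(a, b))


def l2_distance(a, b):
    """L2 (Euclidean) distance, shown for comparison."""
    return math.sqrt(sum((x_a - x_b) ** 2 for x_a, x_b in zip(a, b)))


# Hypothetical standardized vectors: (home price, population density)
affluent_dense = (1.0, 1.0)
affluent_sparse = (1.0, -1.0)
modest_dense = (-1.0, 1.0)

# Differs in one feature only.
print(l1_distance(affluent_dense, affluent_sparse))
# Differs in both features, in opposite directions.
print(l1_distance(affluent_sparse, modest_dense))
print(l2_distance(affluent_sparse, modest_dense))
```

Because both metrics take absolute or squared differences per feature, a high value in one feature cannot cancel a low value in another: the pair that differs in both features ends up farther apart than the pair that differs in only one.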

Jan Šimbera

Posted 2020-05-13T21:28:54.407

Reputation: 261