What is the right approach and clustering algorithm for geolocation clustering?

I'm using the following code to cluster geolocation coordinates:

```
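import numpy as np
import matplotlib.pyplot as plt
from scipy.cluster.vq import kmeans2, whiten

# One row per point: [latitude, longitude]
coordinates = np.array([
    [lat, long],
    [lat, long],
    ...
    [lat, long]
])

# kmeans2 returns (centroids, labels); whiten rescales each column to unit variance
centroids, labels = kmeans2(whiten(coordinates), 3, iter=20)
plt.scatter(coordinates[:, 0], coordinates[:, 1], c=labels)
plt.show()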
```

Is K-means appropriate for geolocation clustering, given that it uses Euclidean distance rather than the Haversine formula as its distance function?
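
To make the concern concrete, here is a minimal sketch comparing the two distances. The helper names `haversine_km` and `naive_euclidean_deg` and the Earth radius of 6371 km are my own assumptions for illustration; they are not part of SciPy.

```python
import numpy as np

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    a = np.sin(dlat / 2) ** 2 + np.cos(lat1) * np.cos(lat2) * np.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(a))


def naive_euclidean_deg(lat1, lon1, lat2, lon2):
    """Euclidean distance computed directly on raw degrees (hypothetical helper)."""
    return np.hypot(lat2 - lat1, lon2 - lon1)


# One degree of longitude at the equator and at 60 degrees north
equator_km = haversine_km(0.0, 0.0, 0.0, 1.0)
north_km = haversine_km(60.0, 0.0, 60.0, 1.0)
equator_deg = naive_euclidean_deg(0.0, 0.0, 0.0, 1.0)
north_deg = naive_euclidean_deg(60.0, 0.0, 60.0, 1.0)
```

The Euclidean distance on raw degrees is the same for both pairs, while the Haversine distance at 60 degrees north is roughly half of the one at the equator, which is the mismatch I am worried about.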

You can also take a look at this similar question: https://datascience.stackexchange.com/questions/10063/for-which-real-world-data-sets-does-dbscan-surpass-k-means – VividD – 2017-05-11T11:41:06.187

I think the feasibility of k-means depends on where your data are. If your data are spread all over the world, it won't work, because the distance is not Euclidean, as other users have already said. But if your data are more local, k-means would be good enough, as the geometry is locally Euclidean. – Juan Ignacio Gil – 2018-05-31T08:34:00.893
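
A rough way to check the "locally Euclidean" point is to project latitude/longitude onto a flat plane and compare the planar distance with the great-circle distance. This is only a sketch under assumptions: it uses an equirectangular projection around the midpoint of the two points, an Earth radius of 6371 km, and hypothetical helper names (`haversine_km`, `planar_km`); the example coordinates are arbitrary.

```python
import numpy as np

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(a))


def planar_km(lat1, lon1, lat2, lon2):
    """Euclidean distance after an equirectangular projection at the mid-latitude."""
    lat0 = np.radians((lat1 + lat2) / 2)
    dx = EARTH_RADIUS_KM * np.radians(lon2 - lon1) * np.cos(lat0)
    dy = EARTH_RADIUS_KM * np.radians(lat2 - lat1)
    return np.hypot(dx, dy)


def relative_error(p, q):
    """Relative gap between the planar and great-circle distances."""
    exact = haversine_km(*p, *q)
    return abs(planar_km(*p, *q) - exact) / exact


# Two points a few kilometres apart versus two points on different continents
local_err = relative_error((48.85, 2.35), (48.90, 2.40))
global_err = relative_error((48.85, 2.35), (-33.87, 151.21))
```

For the nearby pair the planar distance is almost identical to the great-circle one, while for the intercontinental pair the gap is large. If the data are local, feeding projected x/y coordinates in kilometres to `kmeans2` instead of raw degrees may therefore be a reasonable approximation.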