What is the right approach and clustering algorithm for geolocation clustering?

I'm using the following code to cluster geolocation coordinates:

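```
import numpy as np
import matplotlib.pyplot as plt
from scipy.cluster.vq import kmeans2, whiten

# One row per point: [latitude, longitude]
coordinates = np.array([
    [lat, long],
    [lat, long],
    ...
    [lat, long]
])

# whiten() rescales each column to unit variance;
# kmeans2 then clusters into 3 groups using Euclidean distance
centroids, labels = kmeans2(whiten(coordinates), 3, iter=20)

# Color each point by its assigned cluster
plt.scatter(coordinates[:, 0], coordinates[:, 1], c=labels)
plt.show()
```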

Is it right to use K-means for geolocation clustering, given that it uses Euclidean distance rather than the Haversine formula as its distance function?
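To make the concern concrete, here is a minimal sketch comparing the two distance measures. The function name `haversine_km`, the mean Earth radius of 6371 km, and the sample points are my own assumptions for illustration, not part of the code above.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def euclidean_deg(lat1, lon1, lat2, lon2):
    """Plain Euclidean distance on raw degree values."""
    return math.hypot(lat2 - lat1, lon2 - lon1)


# Two pairs of points, each one degree of longitude apart:
# one pair on the equator, one pair at 60 degrees north.
at_equator = haversine_km(0.0, 0.0, 0.0, 1.0)
at_60_north = haversine_km(60.0, 0.0, 60.0, 1.0)

print(at_equator, at_60_north)
print(euclidean_deg(0.0, 0.0, 0.0, 1.0), euclidean_deg(60.0, 0.0, 60.0, 1.0))
```

Both pairs are exactly 1.0 apart in raw degrees, but the great-circle distance is roughly 111 km on the equator and roughly 56 km at 60 degrees north, which is why I doubt that Euclidean distance on latitude/longitude is appropriate.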

You can also take a look at this similar question: https://datascience.stackexchange.com/questions/10063/for-which-real-world-data-sets-does-dbscan-surpass-k-means

– VividD – 2017-05-11T11:41:06.187