Clustering based on geolocation pairs

I am trying to process a large set of location data given as a list of start and end coordinates. For example,

[
  [(start_lat1, start_lon1), (end_lat1, end_lon1)],
  [(start_lat2, start_lon2), (end_lat2, end_lon2)],
  [(start_lat3, start_lon3), (end_lat3, end_lon3)],
  [(start_lat4, start_lon4), (end_lat4, end_lon4)],
  [(start_lat5, start_lon5), (end_lat5, end_lon5)],
  ...
]

My goal is to cluster these pairs so that trips whose start locations are close to each other, and whose end locations are also close to each other, fall into the same cluster. Each cluster would then be summarized by an average start location and an average end location, something like this:

[
  [(start_lat_C1, start_lon_C1), (end_lat_C1, end_lon_C1)],
  [(start_lat_C2, start_lon_C2), (end_lat_C2, end_lon_C2)],
  ...
]
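A minimal sketch of the averaging step, assuming the clustering step has already produced one label per pair (for example from DBSCAN, which marks noise as -1). `cluster_averages` is a hypothetical helper, and the coordinates are made up. Taking the arithmetic mean of latitudes and longitudes is an approximation that is reasonable for small clusters, but it breaks down for clusters that straddle the ±180° meridian.

```python
from collections import defaultdict


def cluster_averages(pairs, labels):
    """Average the start and end points of the pairs in each cluster.

    pairs:  list of [(start_lat, start_lon), (end_lat, end_lon)]
    labels: one cluster id per pair; -1 marks noise and is skipped
    Returns {cluster_id: [(start_lat_C, start_lon_C), (end_lat_C, end_lon_C)]}.
    """
    groups = defaultdict(list)
    for pair, label in zip(pairs, labels):
        if label != -1:
            groups[label].append(pair)

    averages = {}
    for label, members in groups.items():
        n = len(members)
        # Plain arithmetic mean of the coordinates (assumes small clusters).
        start = (sum(p[0][0] for p in members) / n, sum(p[0][1] for p in members) / n)
        end = (sum(p[1][0] for p in members) / n, sum(p[1][1] for p in members) / n)
        averages[label] = [start, end]
    return averages


# Made-up example: two similar trips in cluster 0 and one noise trip.
pairs = [
    [(40.0, -74.0), (41.0, -73.0)],
    [(40.5, -74.5), (41.5, -73.5)],
    [(10.0, 10.0), (11.0, 11.0)],
]
labels = [0, 0, -1]
print(cluster_averages(pairs, labels))
# {0: [(40.25, -74.25), (41.25, -73.25)]}
```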

I was following this tutorial, https://geoffboeing.com/2014/08/clustering-to-reduce-spatial-data-set-size/, but it only clusters single coordinate points. Any help on how to approach clustering of coordinate pairs would be much appreciated.

Irteza

Posted 2019-05-16T21:29:26.937

Adding the end coordinates only increases the number of dimensions: each record now holds two coordinate pairs (four values) instead of one. This does not affect the algorithm, since the Euclidean distance formula works for any number of dimensions, so it applies to multidimensional data as well. – Shubham Panchal – 2019-05-17T01:44:27.147

@ShubhamPanchal ... Except that using Euclidean distance on latitude/longitude is a bad idea. – Has QUIT--Anony-Mousse – 2019-05-19T07:43:46.180
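A minimal sketch of both comments, using made-up coordinates. `to_vector` (a hypothetical helper) flattens each pair into the four-value vector the first comment describes, and `km_per_degree_lon` (also hypothetical) illustrates the objection in the second comment. Assuming a spherical Earth with a 6371 km radius, a degree of latitude is always about 111.2 km, but a degree of longitude shrinks by cos(latitude), so Euclidean distance on raw degrees overstates east-west separation relative to north-south separation away from the equator. `math.dist` requires Python 3.8 or later.

```python
import math

# Assumption: spherical Earth with mean radius 6371 km.
KM_PER_DEGREE_LAT = 6371.0 * math.pi / 180  # about 111.2 km


def to_vector(pair):
    """Flatten [(start_lat, start_lon), (end_lat, end_lon)] into a 4-D tuple."""
    (s_lat, s_lon), (e_lat, e_lon) = pair
    return (s_lat, s_lon, e_lat, e_lon)


def km_per_degree_lon(lat_deg):
    """Ground length of one degree of longitude at the given latitude."""
    return KM_PER_DEGREE_LAT * math.cos(math.radians(lat_deg))


# Euclidean distance on the flattened pairs works mechanically in 4-D...
a = [(40.0, -74.0), (41.0, -73.0)]
b = [(43.0, -70.0), (41.0, -73.0)]
print(math.dist(to_vector(a), to_vector(b)))  # 5.0, but in degrees, not km

# ...yet one degree covers different ground distances depending on direction.
for lat in (0, 45, 60, 80):
    print(lat, round(km_per_degree_lon(lat), 1), "km per degree of longitude")
```

At 60° N a degree of longitude covers only about half the ground distance of a degree of latitude, which is why a great-circle distance such as haversine is the safer choice for lat/lon data.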

Answers

Define your own distance function.

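I suggest you simply use

dist(x, y) = haversine(x[0], y[0]) + haversine(x[1], y[1])

where x[0] and y[0] are the start points of the two pairs, and x[1] and y[1] are their end points.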
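A minimal, dependency-free sketch of the suggested distance plugged into DBSCAN, the algorithm used in the linked tutorial. `haversine_km`, `pair_distance_km` and `dbscan` are hypothetical names; `dbscan` is a small hand-written O(n²) version for illustration only, and the sample coordinates, `eps` (in km, applied to the summed start and end distances) and `min_samples` are made-up assumptions. With scikit-learn available, the usual route would instead be to build the pairwise distance matrix and pass it to `sklearn.cluster.DBSCAN(metric="precomputed")`.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumption: spherical Earth


def haversine_km(p, q):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1 = map(math.radians, p)
    lat2, lon2 = map(math.radians, q)
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def pair_distance_km(x, y):
    """dist(x, y) = haversine(x[0], y[0]) + haversine(x[1], y[1])."""
    return haversine_km(x[0], y[0]) + haversine_km(x[1], y[1])


def dbscan(items, eps, min_samples, dist):
    """Minimal O(n^2) DBSCAN. Returns one label per item; -1 marks noise."""
    n = len(items)
    # Neighbourhood of each item, including the item itself.
    neighbors = [[j for j in range(n) if dist(items[i], items[j]) <= eps]
                 for i in range(n)]
    labels = [None] * n
    cluster = 0
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(neighbors[i]) < min_samples:
            labels[i] = -1  # noise for now; may later become a border point
            continue
        # i is a core point: grow a new cluster from it.
        labels[i] = cluster
        queue = list(neighbors[i])
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # border point: join but do not expand
            if labels[j] is not None:
                continue
            labels[j] = cluster
            if len(neighbors[j]) >= min_samples:
                queue.extend(neighbors[j])  # j is also core: keep expanding
        cluster += 1
    return labels


# Made-up sample trips: three near-identical trips in New York, one in London.
pairs = [
    [(40.7128, -74.0060), (40.7306, -73.9352)],
    [(40.7130, -74.0050), (40.7310, -73.9360)],
    [(40.7125, -74.0065), (40.7300, -73.9350)],
    [(51.5074, -0.1278), (51.5155, -0.0922)],
]
print(dbscan(pairs, eps=1.0, min_samples=2, dist=pair_distance_km))  # [0, 0, 0, -1]
```

Summing the two haversine distances means `eps` bounds the combined start and end displacement; taking the maximum of the two instead would require both endpoints to be individually within `eps`, which may match "close" more strictly.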

Has QUIT--Anony-Mousse

Posted 2019-05-16T21:29:26.937

Thanks for the answer. Although I solved the problem using another approach, I will definitely try your suggestion. – Irteza – 2019-05-21T18:10:37.553