## Clustering based on geolocation pair


I am trying to process a large set of location data in which each record is a pair of start and end coordinates. For example,

[
[(start_lat1, start_lon1), (end_lat1, end_lon1)],
[(start_lat2, start_lon2), (end_lat2, end_lon2)],
[(start_lat3, start_lon3), (end_lat3, end_lon3)],
[(start_lat4, start_lon4), (end_lat4, end_lon4)],
[(start_lat5, start_lon5), (end_lat5, end_lon5)],
...
]


My goal is to create clusters of pairs: two pairs should fall into the same cluster when their start locations are close to each other and their end locations are close to each other. For example, the averaged start and end locations of each cluster would look something like this,

[
[(start_lat_C1, start_lon_C1), (end_lat_C1, end_lon_C1)],
[(start_lat_C2, start_lon_C2), (end_lat_C2, end_lon_C2)],
...
]


I was following this tutorial, https://geoffboeing.com/2014/08/clustering-to-reduce-spatial-data-set-size/, but it only works for single coordinate points. Any help on how to approach clustering of pairs would be much appreciated.

Adding the end coordinates only increases the number of dimensions: each pair becomes a single four-value point (start lat, start lon, end lat, end lon). This will not affect the algorithm, as the Euclidean distance formula applies regardless of the number of dimensions. It works on multidimensional data as well. – Shubham Panchal – 2019-05-17T01:44:27.147

@ShubhamPanchal ... Except that using Euclidean distance on Latitude Longitude is a bad idea. – Has QUIT--Anony-Mousse – 2019-05-19T07:43:46.180
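To illustrate the objection above, here is a minimal sketch comparing the plain Euclidean distance in degrees with the great-circle (haversine) distance for the same one-degree step in longitude at two latitudes. The function name `haversine_km` and the sample points are assumptions made up for this illustration.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius (assumed constant)

def haversine_km(p, q):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1 = math.radians(p[0]), math.radians(p[1])
    lat2, lon2 = math.radians(q[0]), math.radians(q[1])
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

# Two hypothetical steps of exactly one degree of longitude.
equator_step = ((0.0, 10.0), (0.0, 11.0))
north_step = ((60.0, 10.0), (60.0, 11.0))

# Euclidean distance on raw degrees treats both steps as identical.
euclid_equator = math.dist(*equator_step)
euclid_north = math.dist(*north_step)

# On the ground, the step at 60 degrees north is only about half as long.
km_equator = haversine_km(*equator_step)
km_north = haversine_km(*north_step)

print(euclid_equator, euclid_north)
print(round(km_equator, 1), round(km_north, 1))
```

Both Euclidean values are the same, while the haversine distances differ by roughly a factor of two, so a single Euclidean `eps` in degrees would mean different ground distances at different latitudes.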

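Define the distance between two pairs x and y as the haversine distance between their start points plus the haversine distance between their end points, and cluster with that distance:

dist(x, y) = haversine(x[0], y[0]) + haversine(x[1], y[1])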
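A minimal sketch of how this distance could be used, in plain Python with no external libraries. Everything beyond `dist` and `haversine` is an assumption for illustration: the helper names `cluster_pairs` and `cluster_means`, the threshold `eps_km`, and the sample coordinates. The grouping links any two pairs whose `dist` is at most `eps_km` and treats each chain of linked pairs as one cluster, which is what DBSCAN does when `min_samples=1`. It is not necessarily the method the answer had in mind.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius (assumed constant)

def haversine(p, q):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1 = math.radians(p[0]), math.radians(p[1])
    lat2, lon2 = math.radians(q[0]), math.radians(q[1])
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def dist(x, y):
    """Distance between two pairs: start-to-start plus end-to-end, in km."""
    return haversine(x[0], y[0]) + haversine(x[1], y[1])

def cluster_pairs(pairs, eps_km):
    """Give the same label to pairs connected by a chain of dist <= eps_km."""
    labels = [-1] * len(pairs)  # -1 means "not yet assigned"
    current = 0
    for i in range(len(pairs)):
        if labels[i] != -1:
            continue
        labels[i] = current
        stack = [i]
        while stack:  # flood-fill over the "within eps_km" neighbours
            k = stack.pop()
            for j in range(len(pairs)):
                if labels[j] == -1 and dist(pairs[k], pairs[j]) <= eps_km:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels

def cluster_means(pairs, labels):
    """Average start and end point per cluster (plain mean of lat and lon,
    reasonable only for compact clusters away from the +/-180 meridian)."""
    groups = {}
    for pair, label in zip(pairs, labels):
        groups.setdefault(label, []).append(pair)
    means = []
    for label in sorted(groups):
        members = groups[label]
        n = len(members)
        start = (sum(m[0][0] for m in members) / n, sum(m[0][1] for m in members) / n)
        end = (sum(m[1][0] for m in members) / n, sum(m[1][1] for m in members) / n)
        means.append([start, end])
    return means

# Hypothetical sample data: two nearly identical pairs and one far away.
trips = [
    [(40.7128, -74.0060), (40.7580, -73.9855)],
    [(40.7130, -74.0055), (40.7585, -73.9850)],
    [(34.0522, -118.2437), (34.1015, -118.3269)],
]
labels = cluster_pairs(trips, eps_km=0.5)
means = cluster_means(trips, labels)
print(labels)
print(means)
```

The nested loops are quadratic in the number of pairs, so for a large data set it may be preferable to compute the pairwise `dist` matrix once and pass it to scikit-learn's `DBSCAN` with `metric="precomputed"`, with `eps` in the same units as `dist`.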