Find points on a map close to given points



I have a locus L of points (lat, long). And I would like to find N=10 points (let's call them warehouses) such that:

$$loss = \sum_{l \in L} maximum_{w \in W}(distance(l, w))^2 $$

is minimized.

Is there a documented algorithm or approach that solves this problem? Right now I am thinking Excel may be able to handle this task. However I have too much data for Excel and will need to implement this in Python / Pandas.

William Entriken

Posted 2017-08-14T03:30:46.653

Reputation: 333

You haven't really defined what you mean by distance. If the points are close together then you may be alright using Euclidean distance but technically distance on a sphere (as with lat,lon) should use Haversine Distance.

– Eumenedies – 2017-08-14T08:18:41.240



I can tell you how I would do it, but there is almost certainly a faster implementation.

Assuming you start with, for each point in $L$, the distances to each warehouse, $w \in W$. These distances should be calculated by the haversine formula.

You can find the distance to the $N$th closest point in $w$ by using the quickselect algorithm. This is very similar to the quicksort algorithm but only sorting the parts that you care about. The average case for quickselect is $O(N)$ but you'll need to repeat for each $l \in L$.

Note that, since the square is monotonic for positive distances, you only need to minimise

$$\sum_{l \in L} maximum_{w \in W}(distance(l, w)) $$

I found a handy implementation of the quickselect algorithm on KoderDojo


Posted 2017-08-14T03:30:46.653

Reputation: 401

"you only need to minimise..." - but this is the main problem here - how to minimize that expression? – VividD – 2017-08-14T13:53:57.350

Just pointing out that it is not necessary to square the haversine distances which might save a bit of time. If you were using Euclidean distance then it would be less complex to leave them as squared distance but I believe haversine gives you the distance rather than squared distance. – Eumenedies – 2017-08-14T14:23:27.717


Scipy has tools most most of this already.

locations = train[['latitude', 'longitude']].values
center = locations.mean(axis=0)
warehouses = np.repeat(np.expand_dims(center, 0), 20, 0)
warehouses = warehouses.flatten()

def Distances(warehouses):
    warehouses = np.reshape(warehouses, [20, 2])
    distances = scipy.spatial.distance.cdist(locations, warehouses)
    closests = distances.min(axis=1)
    other_way = distances.min(axis=0)
    return np.append(closests, other_way)

x = scipy.optimize.least_squares(Distances, warehouses, verbose=2)

William Entriken

Posted 2017-08-14T03:30:46.653

Reputation: 333

If quickselect can be used here it will be a lot faster. – William Entriken – 2017-08-14T12:52:46.457

Nice find. However, I think minimax and least square optimization do not yield the same result. – VividD – 2017-08-14T13:51:16.790