## Efficient way to generate random points with a predefined lower bound on their pairwise Euclidean distance

Using Mathematica, what is an efficient way to generate a list of $n$ random two-dimensional points $\{x_i,y_i\}$, $i=1,\ldots,n$, so that no two points $p_1$ and $p_2$ in the list have a Euclidean distance lower than some bound $d$, i.e. $\|p_1-p_2\|\geq d$? I came up with the following solution. Though it works, I wanted to know if a better method exists.

Code

NodeGenetrator[LowerBound_, UpperBound_, DistanceBound_, SampleLength_] :=
 Module[{list, NewVal, dist},
  (* start from a single random point *)
  list = RandomReal[{LowerBound, UpperBound}, {1, 2}];
  While[Length[list] < SampleLength,
   NewVal = RandomReal[{LowerBound, UpperBound}, 2];
   (*NewVal=RandomVariate[NormalDistribution[
     Mean@{LowerBound,UpperBound},DistanceBound],2];*)
   dist = EuclideanDistance[NewVal, #] & /@ list;
   (* keep the candidate only if it respects the distance bound *)
   If[Min[dist] >= DistanceBound, AppendTo[list, NewVal]]];
  list]
(* Define function parameters *)
LowerBound = 0;
UpperBound = 100;
DistanceBound = 5;
SampleLength = 60;
sample = NodeGenetrator[LowerBound, UpperBound, DistanceBound,SampleLength];
(ListPlot[#, Frame -> True, Axes -> False,PlotStyle -> PointSize[Large],
AspectRatio -> 1] &)@sample


Output

1. As one can see, this code uses a For loop and keeps iterating until the requested number of points is reached. This makes the execution time of the function unpredictable!
2. As expected, the parameter DistanceBound has an important effect on the function's behavior. With DistanceBound = 12.5 the evaluation becomes very time-consuming.
3. Here we check only one mutual-distance condition, but is it possible to use a more general test function that checks more than one mutual characteristic of the requested points?

BR

Perhaps I am getting this backwards, but why not generate random points within a given envelope (say, a circle) and scale the random coordinates with the largest possible distance within that envelope (e.g. the diameter) to your parameter d? Or are p1 and p2 sequential elements of the list? – Yves Klett – 2012-03-05T17:12:08.767

@YvesKlett The elements $p_1$ and $p_2$ are not sequential in the list that we want to generate. They are just an arbitrary pair. – PlatoManiac – 2012-03-05T17:18:16.073

You probably meant "... so that NO two points ... has a Euclidean distance lower than say d" in the second line? – kglr – 2012-03-05T17:18:32.673

@kguler Thx for the comment! I corrected my post. – PlatoManiac – 2012-03-05T17:20:42.543

"Random" does not mean merely arbitrary; nor, in practice, does it mean (merely) that random numbers were involved. To use "random" points, you need to know their probability distribution. (There exist efficient solutions to this problem having very different distributions.) So: precisely what distribution do you want these points to have? – whuber – 2012-03-05T18:22:56.260

@whuber I don't have any special requirement on the distribution; any probability distribution should do. You can have a look at the commented part of the code, where I use NormalDistribution: DistanceBound is used as the standard deviation, and the mean of the lower and upper bounds is chosen as the mean of the distribution. One could use a multiple of DistanceBound as the standard deviation to guarantee that not too many iterations are needed to achieve the requested sample. If you know an efficient solution, please share. – PlatoManiac – 2012-03-05T19:26:50.640

I would like to suggest you do have a requirement on the distribution. Otherwise, find two children and ask them to draw pictures of solutions. Check the solutions and digitize them. Ask Mathematica to choose randomly between the two solutions. After the initial computation of the two solutions, this is extremely efficient and is obviously random. I'm not being facetious: this is a nice example of a spatial stochastic process that meets every one of your stated requirements. It might help you see why specifying a distribution is important. – whuber – 2012-03-05T20:11:29.767

For practical solutions which may meet your needs, start with the Wikipedia article on low-discrepancy sequences and then look closely at the Halton sequence.

– whuber – 2012-03-05T21:02:35.740

34

This is not an efficient answer, but it is fun to play with, so I thought I'd post it. For efficiency, the use of Nearest might provide a good starting point.

g[n_, {low_, high_}, minDist_, step_: 1] :=
 Block[{data = RandomReal[{low, high}, {n, 2}], temp, happy, sdata, hdata},
  While[True,
   (* nearest neighbor of each point *)
   temp = Nearest[data][#, 2][[-1]] & /@ data;
   (* 1 if a point's nearest neighbor is at least minDist away, else 0 *)
   happy = UnitStep[MapThread[EuclideanDistance, {data, temp}] - minDist];
   hdata = Pick[data, happy, 1];
   sdata = Pick[data, happy, 0];
   If[sdata === {}, Break[]];
   (* let the unhappy points take a random step *)
   sdata += RandomReal[{-step, step}, {Length[sdata], 2}];
   data = Join[sdata, hdata];
  ];
  data]


The idea is to do an initial sampling and then allow the points that are too close to "walk" somewhere else. The function takes a desired number of points n, a low and high value for the data range, the minimum acceptable distance between points minDist and a step argument which allows points to "walk" up to a certain distance in the x and y directions.

It's especially fun to watch dynamically.

g[150, {0, 30}, 1.5, 1]


Edit: Per suggestion of Yves Klett the points are colored by relative happiness (green being more happy, red being less happy).

Edit 2:

Now for a more serious attempt at something efficient.

findPoints = Compile[
   {{n, _Integer}, {low, _Real}, {high, _Real}, {minD, _Real}},
   Block[{data = RandomReal[{low, high}, {1, 2}], k = 1, rv, temp},
    While[k < n,
     rv = RandomReal[{low, high}, 2];
     temp = Transpose[Transpose[data] - rv];
     If[Min[Sqrt[#.#] & /@ temp] > minD,
      data = Join[data, {rv}];
      k++]];
    data]];


And to test it for benchmarking...

npts = 350;
minD = 3;
low = 0;
high = 100;

AbsoluteTiming[pts = findPoints[npts, low, high, minD];]

==> {0.0312004, Null}


Check that the minimum pairwise distance is not below the threshold.

Min[MapThread[
  EuclideanDistance, {pts, Nearest[pts][#, 2][[-1]] & /@ pts}]] > minD

==> True


Check that I generated the correct number of points.

Length[pts] == npts

==> True


In answer to my own question that was similar to this problem, there's a 3D version of this solution at http://mathematica.stackexchange.com/questions/48427/generate-a-set-of-3d-coordinates-subject-to-constraints

– dr.blochwave – 2014-05-23T20:04:51.120

Improvement: instead of calculating a square root for each point, calculate the square of the distance. (Also you could iterate over the points and break at the first one that is found to be too close, instead of calculating all their distances and checking the minimum, but I'm not sure if that'd be faster). – Gerardo Marset – 2013-02-14T23:52:27.847

@GerardoMarset that's a good suggestion. Give it a try. If the timings are better feel free to edit. – Andy Ross – 2013-02-15T04:13:54.807

@AndyRoss I'm sorry, can't. I don't have Mathematica. – Gerardo Marset – 2013-02-15T05:09:09.663
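For reference, a sketch of the squared-distance variant suggested above (an untested illustration, not code from the thread; it mirrors findPoints with the Sqrt removed, and any speedup is unverified):

```
findPointsSq = Compile[
   {{n, _Integer}, {low, _Real}, {high, _Real}, {minD, _Real}},
   Block[{data = RandomReal[{low, high}, {1, 2}], k = 1, rv, temp},
    While[k < n,
     rv = RandomReal[{low, high}, 2];
     temp = Transpose[Transpose[data] - rv];
     (* compare squared distances; avoids one Sqrt per existing point *)
     If[Min[#.# & /@ temp] > minD^2,
      data = Join[data, {rv}];
      k++]];
    data]];
```

It is called exactly like findPoints, e.g. findPointsSq[350, 0., 100., 3.].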

You should color the points according to their degree of happiness (or claustrophobia)! – Yves Klett – 2012-03-05T19:38:23.017

@Yves That would be yet more fun.. at least when animated we see this to some extent based on their relative laziness :) – Andy Ross – 2012-03-05T19:43:03.307

19

If you don't need high precision, you can do something along these lines:

canvas = Image@ConstantArray[0, {100, 100}];

distance = 6;

{img, {pts}} =
Reap[Nest[
ImageCompose[#,
SetAlphaChannel[#, #] &@Image@DiskMatrix[distance],
Sow@RandomChoice@
Position[Transpose@ImageData[#, DataReversed -> True], 0.]] &,
canvas, 150]];


This is not fast in general, but it does not slow down when the density is very high. Note how the region is nearly completely filled up.

{Framed[img], Graphics@Point[pts]}


The idea can be used to speed up other methods when the point density becomes high.

Your code needs to be updated to accommodate the new version (such as 12.0). – A little mouse on the pampas – 2020-01-31T02:07:22.947

This is a very creative answer, +1! There seems to be a relationship between the dimensions of canvas and the maximum number of possible iterations in the Nest function (and therefore the number of points). Changing the 150 to 180 without also increasing canvas gives a range of errors, including RandomChoice::lrwl: "The items for choice {} should be a list or a rule weights -> choices." – Verbeia – 2012-03-05T21:03:11.887

@Verbeia That is correct, when there's no more space for new points, RandomChoice will return an empty list. I just illustrated the context without a robust implementation. Ideally this method could be used to weed out positions where a new point definitely can't be placed, on a square grid. Then choose a cell in the grid, and randomly position a new point inside. Then do another test to see if the point is too close to any points in nearby cells. This will give a method which is both fast and precise, but it's some work to implement it right. – Archimedes of Syracuse – 2012-03-05T21:22:04.920

@ArchimedesofSyracuse This is what I have been doing with this kind of disjoint squares for the last hour. Not an easy implementation for me. On the other hand, I also think we could try to compute the probability distribution explicitly by hand and then use this distribution in MMA to find the points. But maybe that is mentally harder than writing efficient code. – PlatoManiac – 2012-03-05T21:31:00.543

@PlatoManiac The big problem with calculating the probability distribution and using that is that the points' positions are not independent, and can't be independently generated ... so we're back to square one. – Szabolcs – 2013-04-09T14:32:32.393

13

Well, a very simple but very limited way could be to generate a random set of points, calculate all pairwise distances, and rescale the whole set so that its minimum distance equals mindist:

mindist = 1;
npts = 200;
pts = RandomReal[{0, 100}, {npts, 2}];
scaledpts = mindist/Min[Norm /@ Subtract @@@ Subsets[pts, {2}]]*pts;

Graphics[{Green, Point[pts], Red, Point[scaledpts]}, Frame -> True]


Edit: On the plus side this should work for arbitrary distributions

mindist = .01;
npts = 1000;
pts = RandomVariate[NormalDistribution[1, 3], {npts, 2}];
scaledpts = mindist/Min[Norm /@ Subtract @@@ Subsets[pts, {2}]]*pts;

Graphics[{Green, Point[pts], Red, Point[scaledpts]}, Frame -> True]


Of course, this becomes ugly fast (and hopefully no point is duplicated!). But for small sample sizes, why not? It is also a nice way to see how much memory you've got (my machine with 16 GB maxed out between 9500 and 10000 points).

See the humorous memory graph with successive runs (increasing by 1000 points each time). Rather interesting sawtooth there, which releases the memory nicely each time.

This Graphics[{Green, Point[pts], Red, Circle[#, mindist] & /@ scaledpts}, Frame -> True] reveals that there are some overlapping circles, though only a few. Thus, not all resulting points are distant enough. – Alexei Boulbitch – 2015-08-21T09:44:12.763

@Alexei Boulbitch, The circles of radius mindist can overlap, but the center of each circle can only lie in one circle. Perhaps you meant to test with radius mindist/2? – Timothy Wofford – 2015-08-21T15:40:59.203

11

Not very pretty, but you could do something like this

generate[pts_, bnds_, mindist_] := Module[{x, plist, ylist, intervals},
  x = RandomReal[bnds[[1]]];
  (* existing points whose x-coordinate is within mindist of the new x *)
  plist = Pick[pts, UnitStep[Abs[x - pts[[All, 1]]] - mindist], 0];
  If[Length[plist] == 0,
   Return[Join[pts, {{x, RandomReal[bnds[[2]]]}}]]];
  (* forbidden y-intervals, intersected with the y-domain *)
  ylist = List @@ IntervalIntersection[Interval[bnds[[2]]],
     IntervalUnion @@ (Interval[#[[2]] +
          Sqrt[mindist^2 - (#[[1]] - x)^2] {-1, 1}] & /@ plist)];
  (* gaps between forbidden intervals, plus the domain edges *)
  intervals = If[Length[ylist] >= 2,
    Transpose[{ylist[[;; -2, 2]], ylist[[2 ;;, 1]]}], {}];
  If[ylist[[1, 1]] > bnds[[2, 1]],
   PrependTo[intervals, {bnds[[2, 1]], ylist[[1, 1]]}]];
  If[ylist[[-1, 2]] < bnds[[2, -1]],
   AppendTo[intervals, {ylist[[-1, 2]], bnds[[2, -1]]}]];
  If[intervals =!= {},
   Join[pts, {{x,
      RandomReal@
       RandomChoice[(#2 - #1) & @@@ intervals -> intervals]}}],
   pts]
  ]

points[bnds_, mindist_, n_] :=
NestWhile[generate[#, bnds, mindist] &, {}, Length[#] < n &, 1, 2 n]


Here, bnds is the domain of the random points in the form {{xmin, xmax}, {ymin, ymax}}, mindist is the minimum distance between two points, and n is the number of generated points.

Example

bnds = {{0, 100}, {0, 100}};
mindist = 5;
npts = 150;
pts = points[bnds, mindist, npts];

ListPlot[pts, Frame->True]


Explanation

The idea is to iteratively generate new points by choosing an arbitrary value $x_0$ in the interval $[x_\text{min}, x_{\text{max}}]$. For each already generated point p that is closer than mindist to the line $x=x_0$ we find the interval $[y_1, y_2]$ such that $||p-(x_0,y)||\leq \text{mindist}$ for $y_1\leq y\leq y_2$ and choose an arbitrary value for $y$ from the complement of the union of all these intervals (provided this set is non-empty).

If the plane starts filling up, the chance of choosing a value for x for which there is no available y value becomes larger, so we still need to generate extra points. To reduce the number of extra generated points, this method could be combined with Archimedes' solution, for example by periodically checking for which intervals for x there are still y values available, and choosing only from those.

This scales impressively well +1. – Andy Ross – 2012-03-06T03:31:59.883

8

Here is an outline of a fairly efficient way to do this.

Say you want to generate $m$ points with a minimum separation of $d$, in a box of size $s \times s$. Let $d_2=\max(d, s/\sqrt{m})$. Partition your box into subsquares of size $d_2 \times d_2$; these will be used as bins. We will generate, say, $3m$ random points in the full box. Now place each in its appropriate bin. This can be done in $\mathcal{O}(m)$.

Now iterate over the generated points. For each, locate its bin and gather all points in the neighboring subsquare bins. For each such neighbor, check whether it has already been chosen and lies within distance $d$ of the candidate point. If no such neighbor exists, the candidate point is chosen as a member of our random set, and a down value is set to indicate this (so checking each neighbor is $\mathcal{O}(1)$).

We stop once we have $m$ points collected, or else run out of points. If the latter, generate more, add them to the existing bins, and proceed as before.

The memory footprint is $\mathcal{O}((s/d_2)^2)$ for the bins. The speed will typically be $\mathcal{O}(m)$ unless the sizes of $d$ and $s$ make it either difficult or impossible to fit $m$ points inside with minimal separation $d$.
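A minimal sketch of this binning outline (an illustrative implementation, not code from the answer; it draws one candidate at a time instead of batches of $3m$, and has no guard against an unfillable box):

```
binSample[m_, d_, s_] :=
 Module[{d2, bins = <||>, chosen = {}, key, nbrs, p},
  d2 = Max[d, s/Sqrt[m]];      (* bin edge length *)
  key[pt_] := Ceiling[pt/d2];  (* bin index of a point *)
  While[Length[chosen] < m,
   p = RandomReal[{0, s}, 2];
   (* already-chosen points in the 3x3 block of bins around p's bin *)
   nbrs = Join @@ Lookup[bins,
      Key /@ Tuples[{# - 1, #, # + 1} & /@ key[p]], {}];
   If[nbrs === {} || Min[Norm[p - #] & /@ nbrs] >= d,
    AppendTo[chosen, p];
    bins[key[p]] = Append[Lookup[bins, Key[key[p]], {}], p]]];
  chosen]
```

For example, binSample[60, 5, 100] should return 60 points in the $100 \times 100$ box with pairwise separation at least 5; each candidate is tested against only the handful of chosen points in its nine surrounding bins rather than the whole set.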

There, you had to start with the Big O notation. Now everyone will start benchmarking - think of all the entropy ;-)

– Yves Klett – 2012-03-06T17:31:49.327

No need to benchmark, @Yves: I have already claimed the prize for fastest (c. 500,000,000 point patterns per second). See comments to the question. :-) – whuber – 2012-03-06T18:44:34.743

7

A more recent question was pointed here as a duplicate of this one, so I am adding one of my answers from there.

None of the answers here so far make efficient use of Nearest[], which is quite powerful for a large number of points as it makes use of a partitioning algorithm. Nearest[] can generate a NearestFunction[] for a set of points, which does that partitioning once. Then the generated function can be used repeatedly over many points.

Here cull[] takes a list of points and removes enough of those points to meet the condition that no two points are closer than d from each other. Then many candidate points can be added and a single NearestFunction[] used to cull the points. Note that this will work for any number of dimensions, as well as measures other than Euclidean.

cull[pts_, d_] :=
Module[{p, f, c},
p = pts;
While[
f = Nearest[p -> Automatic];
c = f[p, {2, d}];
Last[Dimensions[c]] != 1,
p = Pick[p, First[#] <= Last[#] & /@ c];
];
p
]


The use of the efficient algorithm becomes important when generating a large number of points, and especially when trying to fit as many randomly selected points as possible into a region while meeting the condition.

spread[] uses cull[] as described, adding n dim-dimensional points to the list each time, and culling, repeating until n points remain. n and d can be chosen to be too large relative to high-low, making it impossible to fit that many points in the region. For that reason, spread[] also has a maximum number of iterations for adding the last point. Without a maximum, the algorithm can provide no assurance that it will terminate. Each iteration tries n samples. When n*maxiter points are tried with zero points successfully added, spread[] returns the points it has, which meet the bounds and distance condition, but the count is short of n.

spread[n_, dim_, d_, low_, high_, maxiter_] :=
Module[{p, m, k},
p = {};
m = 0;
k = maxiter;
While[
p = Join[p, RandomReal[{low, high}, {n, dim}]];
p = cull[p, d];
k = If[Length[p] > m, maxiter, k - 1];
m = Length[p];
m < n && k > 0
];
If[Length[p] > n, Take[p, n], p]
]


This example use of spread[] casts the result in the form of circles, which makes it easy to visualize the result and see that the points have the required separation, since the circles don't overlap.

circles[n_, r_, maxiter_] :=
Circle[#, r] & /@ spread[n, 2, 2 r, -1, 1, maxiter]


Then:

circles[100, 0.1, 1000] // Graphics


gives 80 circles (100 is too many random 0.1 radius circles in a 2x2 box):

Here is an example with spheres:

spheres[n_, r_, maxiter_] :=
Sphere[#, r] & /@ spread[n, 3, 2 r, -1, 1, maxiter]

spheres[150, 0.2, 1000] // Graphics3D


Returning 127 radius 0.2 spheres, since 150 is too many in a 2x2x2 box:

What is sweep used within your function spread? – Alexei Boulbitch – 2015-08-21T10:11:44.593

Sorry. Supposed to be cull. Fixed answer. – Mark Adler – 2015-08-21T15:15:44.247

I'm glad to see that new version of NearestFunction is listable. – ybeltukov – 2015-10-27T16:39:41.477

6

A partial answer based on low-discrepancy Halton sequences (see the Quasi-Random Number Generators section of the tutorial CUDALink/tutorial/Applications in the docs):

Using

 VanDerCorput[base_][len_] := Table[
   With[{digits = Reverse@IntegerDigits[n, base]},
    Sum[base^(-ii)*digits[[ii]], {ii, Length[digits]}]], {n, len}]


as the generator, one can generate 2D Halton sequences using two different bases and combining the resulting numbers in pairs:

 sample = Transpose[{VanDerCorput[2][1000], VanDerCorput[3][1000]}]


Note the base^(-ii) weight in the generator: a fixed 2^(-ii) would let the base-3 component (whose digits can be 2) exceed 1, producing the non-uniformity discussed in the comments. With the correct weight, Halton sequences fill the unit square uniformly, as seen in the plot below:

Halton sequences are good candidates for generating the raw samples on which further selections using many cool ideas put forth in other answers.

For instance, using Yves Klett's scaling idea:

 mindist = .02;
scaledsample =  mindist/Min[Norm /@ Subtract @@@ Subsets[sample, {2}]]*sample;


we get

@kguler, did you maybe find what was wrong with uniformity? – VividD – 2014-06-20T09:02:37.720

@kglr As in the comment to Yves Klett's answer, Graphics[{Red, Circle[#, mindist] & /@ scaledpts}, Frame -> True] reveals that there are overlapping circles with radius equal to mindist. I have even seen combinations of 3 overlapping ones. They are rather rare, though. I like the approach very much, nevertheless. – Alexei Boulbitch – 2015-08-21T10:06:14.313

Is it just my eyes, or is that not uniform? It appears heavy in the middle, and sparse at the top and bottom. – Mr.Wizard – 2012-03-06T10:07:53.783

@Mr.Wizard, your eyes are ok. Need to re-check the code and the data. – kglr – 2012-03-06T10:37:48.257