
Given the dataset of spatial polygons below, I want to cluster these polygons into larger areas based on a similarity matrix. I would like to do the computation in R. The polygons represent districts in a town. Each polygon has a population attribute, "anzahl_pers" (which can be 0).

For the clustering, I want to apply several constraints. First, the resulting clusters should be contiguous. I can achieve this with the R package ClustGeo (https://cran.r-project.org/web/packages/ClustGeo/vignettes/intro_ClustGeo.html), which can take a neighborhood matrix into account.
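
A minimal sketch of this ClustGeo setup on toy data, similar to the linked vignette: the similarity matrix is turned into a dissimilarity `D0 = 1 - similarity` (assuming similarities lie in [0, 1]), and a binary adjacency matrix becomes `D1` (0 for neighbours and for each polygon with itself, 1 otherwise). The 4 x 4 matrices `sim` and `A` are hypothetical stand-ins for the real similarity matrix and polygon neighbourhood. Note that `alpha` only weights the neighbourhood term, so contiguity is encouraged rather than strictly enforced.

```r
library(ClustGeo)

# Hypothetical toy inputs: similarity in [0, 1] and a 0/1 adjacency matrix
# for four polygons lying in a row (1-2, 2-3, 3-4 share a border).
sim <- matrix(c(1.0, 0.9, 0.2, 0.1,
                0.9, 1.0, 0.3, 0.2,
                0.2, 0.3, 1.0, 0.8,
                0.1, 0.2, 0.8, 1.0), nrow = 4, byrow = TRUE)
A <- matrix(c(0, 1, 0, 0,
              1, 0, 1, 0,
              0, 1, 0, 1,
              0, 0, 1, 0), nrow = 4, byrow = TRUE)

# Feature-space dissimilarity: convert similarities into distances.
D0 <- as.dist(1 - sim)

# Constraint-space dissimilarity: neighbours (and self) get 0, others 1.
diag(A) <- 1
D1 <- as.dist(1 - A)

# alpha mixes the two criteria: 0 ignores the neighbourhood, 1 uses only it.
tree <- hclustgeo(D0, D1, alpha = 0.2)
cutree(tree, k = 2)
```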

The second constraint is the population of each resulting cluster: a cluster should stop growing once its population reaches a threshold (in my case, 20 people). Polygons whose population already exceeds the threshold should not be assigned to any cluster.
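
To make this stopping rule concrete, here is a hypothetical base-R helper, `pop_constrained_merge()` (not from any package), sketching one possible interpretation: polygons above the threshold are left out (`NA`), every other polygon starts as its own cluster, and the most similar pair of adjacent clusters (average similarity) is merged repeatedly, but only while both clusters are still below the threshold. Because two clusters just under the threshold can be merged, a cluster may overshoot it. The similarities, adjacency (six polygons in a row) and populations are toy assumptions.

```r
# Hypothetical helper (not from any package): greedy agglomeration that
#  - only merges clusters sharing a border (contiguity),
#  - stops growing a cluster once its population reaches `threshold`,
#  - leaves polygons whose own population exceeds `threshold` unassigned (NA).
# sim: n x n similarity matrix; adj: n x n 0/1 or logical adjacency matrix;
# pop: length-n population vector.
pop_constrained_merge <- function(sim, adj, pop, threshold) {
  cl <- seq_along(pop)          # every polygon starts as its own cluster
  cl[pop > threshold] <- NA     # too populous on its own: excluded
  repeat {
    ids <- sort(unique(cl[!is.na(cl)]))
    cl_pop <- vapply(ids, function(k) sum(pop[which(cl == k)]), numeric(1))
    open <- ids[cl_pop < threshold]       # clusters still allowed to grow
    best <- NULL
    best_sim <- -Inf
    for (a in open) for (b in open) {
      if (a >= b) next
      ia <- which(cl == a)
      ib <- which(cl == b)
      if (!any(adj[ia, ib] != 0)) next    # not adjacent: skip this pair
      s <- mean(sim[ia, ib])              # average-linkage similarity
      if (s > best_sim) {
        best_sim <- s
        best <- c(a, b)
      }
    }
    if (is.null(best)) break              # no admissible merge left
    cl[which(cl == best[2])] <- best[1]   # merge cluster b into cluster a
  }
  match(cl, unique(cl[!is.na(cl)]))       # relabel clusters as 1..k
}

# Toy example: six polygons in a row, threshold of 20 people.
x   <- c(1, 1.2, 3, 3.1, 5, 9)
sim <- 1 / (1 + abs(outer(x, x, "-")))
adj <- abs(outer(1:6, 1:6, "-")) == 1
pop <- c(13, 5, 7, 3, 26, 4)
pop_constrained_merge(sim, adj, pop, threshold = 20)
```

With these toy values, polygon 5 (26 people) is excluded, polygons 1 to 4 end up in one cluster of 28 people (the overshoot comes from merging two open clusters of 18 and 10), and polygon 6 stays on its own because its only neighbour is excluded.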

I found the package scclust (https://cran.r-project.org/web/packages/scclust/index.html), which supports constraints on cluster size. However, that constraint is on the number of polygons per cluster, whereas I need the sum of an attribute to get as close to the threshold as possible. The number of polygons in each cluster is not relevant in itself, but it should be as small as possible while still meeting the constraint.
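
The difference can be illustrated with a hypothetical base-R helper, `cluster_summary()`: a size constraint like scclust's counts polygons per cluster (`table()`), whereas the constraint described here is on the population sum per cluster (`tapply(..., sum)`), with the absolute distance to the threshold measuring how close each cluster gets. The partition and populations below are toy values.

```r
# Hypothetical helper: summarise a partition by polygon count and by
# population sum. Unassigned polygons (NA) are dropped.
cluster_summary <- function(cluster, pop, threshold) {
  keep  <- !is.na(cluster)
  sizes <- table(cluster[keep])                   # polygons per cluster
  sums  <- tapply(pop[keep], cluster[keep], sum)  # population per cluster
  data.frame(
    cluster    = names(sums),
    n_polygons = as.integer(sizes[names(sums)]),
    population = as.numeric(sums),
    deviation  = abs(as.numeric(sums) - threshold) # distance to the target
  )
}

# Toy partition of six polygons; polygon 5 is unassigned.
cluster_summary(cluster = c(1, 1, 2, 2, NA, 3),
                pop = c(13, 5, 7, 3, 26, 4),
                threshold = 20)
```

Comparing candidate partitions by their total `deviation` (and, secondarily, by `n_polygons`) would be one way to express the "as close to the threshold as possible" goal.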

Ideally, the number of clusters should not have to be specified beforehand. If it must be, one workaround might be to set it to the total population divided by the threshold, or to lower the number of clusters iteratively until the constraints are met.
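
The iterative workaround can be sketched in base R with a hypothetical helper, `choose_k()`: it starts at `floor(total population / threshold)`, the largest number of clusters that could all reach the threshold, cuts the dendrogram with `cutree()`, and lowers `k` until every cluster reaches the threshold. A plain `hclust()` tree on toy one-dimensional data is used here so the example is self-contained; it ignores contiguity, but a tree from `ClustGeo::hclustgeo()` is also an `hclust` object and could be passed in the same way.

```r
# Hypothetical helper: starting from floor(total / threshold), return the
# largest k for which every cluster of cutree(tree, k) has
# population >= threshold (falls back to 1 if no larger k works).
choose_k <- function(tree, pop, threshold) {
  k <- max(1, floor(sum(pop) / threshold))  # upper bound on feasible k
  while (k > 1) {
    cl <- cutree(tree, k = k)
    if (all(tapply(pop, cl, sum) >= threshold)) break  # constraint met
    k <- k - 1                                          # otherwise merge more
  }
  k
}

# Toy data: six polygons described by one feature, threshold of 20 people.
x    <- c(1, 1.2, 3, 3.1, 5, 9)
pop  <- c(13, 5, 7, 3, 12, 20)
tree <- hclust(dist(x), method = "ward.D2")
k    <- choose_k(tree, pop, threshold = 20)
cutree(tree, k = k)
```

With these toy values the search starts at `k = 3`, which fails because the cluster of polygons 1 and 2 has only 18 people, so the function settles on `k = 2`.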

If possible, the resulting cluster shapes should be compact, but this is not as important as the other constraints.

I am looking for an R package or approach that clusters with respect to the sum of a given attribute (in my case, the population of each cluster).

**Data:**

**Similarity matrix:**

https://pastebin.com/embed_iframe/7xx4BLa7

**Polygons:**

https://pastebin.com/embed_iframe/NFhJke4A

**Population:**

```
population<-structure(list(ID = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12,
13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28,
29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44,
45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60,
61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76,
77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92,
93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106,
107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119,
120, 121), anzahl_pers = c(13, 5, 7, 3, 1, 2, 4, 4, 8, 4, 3,
4, 16, 5, 6, 3, 2, 2, 4, 2, 2, 6, 6, 5, 7, 4, 26, 3, 44, 8, 5,
14, 27, 8, 4, 1, 6, 0, 2, 16, 53, 9, 20, 23, 17, 0, 7, 21, 30,
2, 13, 10, 8, 2, 3, 12, 8, 0, 9, 0, 62, 14, 29, 14, 7, 8, 16,
15, 5, 28, 9, 68, 37, 3, 12, 7, 13, 6, 7, 6, 2, 8, 9, 1, 29,
7, 5, 31, 5, 4, 5, 28, 25, 6, 8, 16, 7, 39, 42, 31, 4, 8, 4,
2, 1, 6, 11, 17, 1, 3, 8, 6, 16, 10, 3, 1, 8, 2, 0, 2, 3)), row.names = c(1L,
2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 13L, 14L, 15L,
16L, 17L, 18L, 19L, 20L, 21L, 22L, 23L, 24L, 25L, 26L, 27L, 28L,
29L, 30L, 31L, 32L, 33L, 34L, 35L, 36L, 37L, 38L, 39L, 40L, 41L,
42L, 43L, 44L, 45L, 46L, 47L, 48L, 49L, 50L, 51L, 52L, 53L, 54L,
55L, 56L, 57L, 58L, 59L, 60L, 61L, 62L, 63L, 64L, 65L, 66L, 68L,
69L, 70L, 71L, 72L, 73L, 74L, 75L, 76L, 77L, 67L, 78L, 79L, 80L,
81L, 82L, 83L, 84L, 85L, 86L, 87L, 89L, 90L, 91L, 92L, 93L, 94L,
95L, 96L, 97L, 98L, 99L, 100L, 101L, 102L, 103L, 104L, 105L,
106L, 107L, 108L, 88L, 109L, 110L, 111L, 112L, 113L, 114L, 115L,
116L, 117L, 118L, 119L, 120L, 121L), class = "data.frame")
```