## Conditional clustering


I have a dataset of addresses (points) with several attributes: one that distinguishes the "sort" of address and one that contains a numerical value.

I want to cluster these points based on (1) their distance to each other and (2) the sort of address.

However, the summed numerical attribute per cluster cannot exceed a certain threshold value.

In other words, the system needs to form clusters, but it has to stop adding addresses to a cluster as soon as the summed numerical value of the addresses in that cluster reaches the threshold.

How do I even go about it? I have R, Python and other geo-applications at my disposal.

It seems that none of the existing clustering algorithms work. For k-means, for example, I need to know the number of clusters beforehand, which I don't.

It seems rather simple, but I can't find a basic methodology to follow.

Your proposed procedure needs some clarification. What do you mean by "stop clustering"? Some algorithms iteratively cluster and re-cluster the entire dataset, whereas other algorithms build clusters in batches, or one data point at a time. You will need to clarify this before the question can be answered. – shadowtalker – 2018-10-04T13:01:07.553

I think I mean one data point at a time. I think I need an algorithm that starts by placing each point in a separate cluster and then continues to merge clusters until that numerical threshold value is reached. Note, I said that's what I THINK needs to happen. Maybe there are other algorithms that work iteratively but give me the same result. – Minka – 2018-10-04T13:42:01.437

All of this while taking the distance into account, of course (the points need to be close to each other); the points in a cluster also need to belong to the same category (type). – Minka – 2018-10-04T13:43:29.763
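The procedure described in the two comments above could be sketched roughly as follows. This is only a minimal illustration under stated assumptions, not an established algorithm from a library: the function name `constrained_agglomerative`, the `(x, y, category, value)` tuple layout, the use of centroid distance, and the parameter `cap` (the threshold) are all hypothetical choices. It also assumes planar coordinates; real addresses would need a projected coordinate system or a geodesic distance.

```python
import math
from itertools import combinations


def constrained_agglomerative(points, cap):
    """Merge clusters bottom-up without exceeding `cap` per cluster.

    points: list of (x, y, category, value) tuples (assumed layout).
    Returns a list of clusters, each a list of indices into `points`.
    """
    # Start with every point in its own cluster.
    clusters = [[i] for i in range(len(points))]

    def total(cluster):
        return sum(points[i][3] for i in cluster)

    def centroid(cluster):
        n = len(cluster)
        return (sum(points[i][0] for i in cluster) / n,
                sum(points[i][1] for i in cluster) / n)

    while True:
        best = None  # (distance, index_a, index_b) of the closest legal merge
        for a, b in combinations(range(len(clusters)), 2):
            ca, cb = clusters[a], clusters[b]
            # Only clusters of the same category may be merged.
            if points[ca[0]][2] != points[cb[0]][2]:
                continue
            # Skip merges that would push the summed value over the cap.
            if total(ca) + total(cb) > cap:
                continue
            d = math.dist(centroid(ca), centroid(cb))
            if best is None or d < best[0]:
                best = (d, a, b)
        if best is None:
            # No legal merge is left, so clustering stops here.
            return clusters
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # safe because a < b


pts = [
    (0.0, 0.0, "home", 4),
    (1.0, 0.0, "home", 4),
    (2.0, 0.0, "home", 4),
    (0.5, 0.0, "shop", 1),
    (10.0, 10.0, "shop", 1),
]
result = constrained_agglomerative(pts, cap=8)
```

The number of clusters is not fixed in advance; it falls out of the threshold. A point whose own value already exceeds `cap` simply stays in a cluster by itself. The nested search is cubic in the number of points, so a larger dataset would need a spatial index or a precomputed neighbour list.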

Is it only important to add the closest points to a cluster while a cluster still has capacity, or is it also important to capture your numerical value efficiently (so that your cluster preferentially chooses the highest value points as in a knapsack problem)? – Nicholas James Bailey – 2020-09-05T06:48:26.907

Also, must all your points belong to a cluster? – Nicholas James Bailey – 2020-09-05T06:49:00.963

@NicholasJamesBailey I'm sorry for my late response, I only just saw your comments. No, what matters most is that the clusters are formed based on distance. Yes, all points need to belong to a cluster. I don't know how that would work though, because I would think that the point where you start clustering has an effect on the clusters that are formed. You risk getting different clusters with every run, right? – Minka – 2020-12-17T12:33:16.600
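The concern about the starting point can be illustrated with a small sketch. It assumes a simple one-pass greedy scheme, which is only one of many possibilities: each point joins the nearest existing cluster of its category that still has capacity, or opens a new cluster. The function name `greedy_assign` and the `(name, x, y, category, value)` layout are made up for the example.

```python
import math


def greedy_assign(points, cap):
    """One-pass greedy assignment; the result depends on the input order.

    points: list of (name, x, y, category, value) tuples (assumed layout).
    Returns a sorted list of clusters, each a sorted tuple of point names.
    """
    clusters = []  # each cluster is a list of point tuples; the first is its seed
    for p in points:
        name, x, y, cat, val = p
        best = None  # (distance, cluster) of the nearest cluster that can take p
        for c in clusters:
            if c[0][3] != cat:
                continue  # different category
            if sum(q[4] for q in c) + val > cap:
                continue  # no capacity left for this point
            d = math.dist((x, y), (c[0][1], c[0][2]))  # distance to the seed
            if best is None or d < best[0]:
                best = (d, c)
        if best is None:
            clusters.append([p])
        else:
            best[1].append(p)
    return sorted(tuple(sorted(q[0] for q in c)) for c in clusters)


a = ("a", 0.0, 0.0, "home", 1)
b = ("b", 1.0, 0.0, "home", 1)
c = ("c", 2.0, 0.0, "home", 1)

left_to_right = greedy_assign([a, b, c], cap=2)
right_to_left = greedy_assign([c, b, a], cap=2)

# Fixing the processing order (here: sorting by name) makes runs repeatable.
repeatable_1 = greedy_assign(sorted([a, b, c]), cap=2)
repeatable_2 = greedy_assign(sorted([c, a, b]), cap=2)
```

With these three points, starting from the left groups `a` with `b`, while starting from the right groups `b` with `c`. So yes, an order-dependent scheme can give different clusters per run unless the order is fixed. A scheme that always merges the globally closest admissible pair does not depend on a starting point, apart from how ties between equal distances are broken.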