Find recurrent dates in a small set (and get rid of non-recurrent ones)


I need help analysing a categorization problem.

Given a set of dates (a small set: 20 elements maximum), I would like to group the dates that are evenly spaced (within some tolerance). For instance, the dates might be spaced monthly or weekly.

Here is an example. Given this distribution of dates: [image: distribution of the dates]

I would like to split them into these two groups: [image: groups of data]

The problem is that I am a developer, not a data scientist. My intuition is that some kind of regression should be possible.

I have no clue how to analyse this problem. Can you help me with it, please?


PS: I have already seen this thread (Recurring events - finding in a time series), but it did not help me.


Posted 2017-07-25T10:58:28.337

Reputation: 31

Why do you need to do a classification to group the dates? Since the set of dates is small, wouldn't it be sufficient to set a tolerance and write a simple algorithm that separates the dates into two groups? – gchaks – 2017-07-26T20:33:41.240

That is true, but the example I gave is a very simple one. In practice, I can have one or more groups of recurrent dates and some "noise" too (dates that are not recurrent). That said, the data set will be small. Do you know of any algorithms that can do the job? Otherwise, I thought that algorithms like k-means might work, but I am not sure. – Damien – 2017-07-27T10:08:51.767
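A minimal sketch of the tolerance-based approach gchaks suggests, in Python with only the standard library. The candidate periods (7 and 30 days), the 2-day tolerance and the minimum group size of 3 are illustrative assumptions, and `find_recurrent_groups` is a hypothetical helper, not an established algorithm: it greedily peels off chains of dates whose consecutive gaps stay within the tolerance of a candidate period, and whatever is left over is treated as noise.

```python
from datetime import date, timedelta


def find_recurrent_groups(dates, periods=(7, 30), tolerance=2, min_size=3):
    """Greedily peel off chains of dates whose consecutive gaps are within
    `tolerance` days of a candidate period. Returns (groups, leftovers)."""
    remaining = sorted(set(dates))
    groups = []
    # Shortest period first, so a long weekly series is not partly read
    # as a 28-day "monthly" chain.
    for period in sorted(periods):
        found = True
        while found:
            found = False
            for start in remaining:
                chain = [start]
                for d in remaining:
                    if d <= chain[-1]:
                        continue
                    gap = (d - chain[-1]).days
                    if abs(gap - period) <= tolerance:
                        chain.append(d)   # d continues the series
                    elif gap > period + tolerance:
                        break             # later dates are even farther away
                if len(chain) >= min_size:
                    groups.append((period, chain))
                    remaining = [d for d in remaining if d not in chain]
                    found = True
                    break                 # rescan what is left
    return groups, remaining


if __name__ == "__main__":
    weekly = [date(2017, 1, 2) + timedelta(weeks=i) for i in range(4)]
    monthly = [date(2017, m, 20) for m in range(1, 6)]
    groups, noise = find_recurrent_groups(weekly + monthly + [date(2017, 3, 3)])
    print(groups, noise)
```

Because it is greedy, a noise date that happens to sit exactly one period before a real series can be absorbed into it; with at most 20 dates, trying every start date like this is cheap.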



If this is a categorization problem, you should look at classification algorithms rather than regression techniques. One of the simplest classification algorithms is logistic regression (despite its name).

However, it looks like you do not have a labelled data set, and if that is the case you should look at clustering techniques instead. Clustering is a family of unsupervised learning techniques that group similar data points into clusters.


Reputation: 264

Thanks for the answer. So, as I understand it, I need to express my dates as vectors of numbers and then give these vectors to a clustering algorithm. My question is how to build this data. Should I build a vector for each date composed of its distance (in days) to each of the other dates, or something else? – Damien – 2017-07-26T13:44:11.297
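A sketch of the representation described in this comment, assuming the input is a list of Python `datetime.date` objects; `distance_vectors` is a hypothetical name. Each date becomes a row holding its absolute distance in days to every date in the (sorted) set, so row i and column i refer to the same date.

```python
from datetime import date


def distance_vectors(dates):
    """Return (sorted dates, one row per date of absolute day distances)."""
    ordered = sorted(dates)
    # Row i, column j: |ordered[j] - ordered[i]| in days (0 on the diagonal).
    vectors = [[abs((other - d).days) for other in ordered] for d in ordered]
    return ordered, vectors


if __name__ == "__main__":
    sample = [date(2017, 2, 1), date(2017, 1, 2), date(2017, 1, 9), date(2017, 1, 16)]
    for d, row in zip(*distance_vectors(sample)):
        print(d, row)
```

The rows can then be passed to a clustering algorithm as feature vectors.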

It is very unlikely that this would help you. At least, I can't come up with anything. – Carl Rynegardh – 2018-02-20T22:11:12.077


You can use a clustering algorithm to group nearby dates together. But since you mention that there will be no more than 20 dates to cluster, it seems you could just write some simple logic to group them.

Pick a base date (any date will do) and compute the number of days/weeks/months from the base date to each date in your data set. This gives you a list of numbers, which you can then bucket together according to whatever threshold you like.
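The procedure above can be sketched as follows, assuming offsets in days and a gap-based reading of "threshold" (a new bucket starts whenever the next offset is more than `threshold_days` past the previous one); `bucket_by_threshold` and its parameters are hypothetical.

```python
from datetime import date


def bucket_by_threshold(dates, base, threshold_days):
    """Group day offsets from `base`; a gap > threshold_days starts a new bucket."""
    offsets = sorted((d - base).days for d in dates)
    buckets = []
    for off in offsets:
        if buckets and off - buckets[-1][-1] <= threshold_days:
            buckets[-1].append(off)   # close enough to the previous offset
        else:
            buckets.append([off])     # gap too large: open a new bucket
    return buckets


if __name__ == "__main__":
    sample = [date(2017, 1, 2), date(2017, 1, 3), date(2017, 1, 20),
              date(2017, 1, 21), date(2017, 3, 1)]
    print(bucket_by_threshold(sample, base=date(2017, 1, 1), threshold_days=3))
```

Note that this groups dates by proximity in time, not by a shared spacing, so on its own it does not separate two interleaved series such as weekly and monthly dates over the same months.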

A clustering algorithm would do the same thing, except that the threshold would be chosen automatically based on an optimal cutoff. Try the simplest (read: easiest to understand) clustering algorithm: K-Means.
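For reference, K-Means on one-dimensional day offsets fits in a few lines of standard-library Python. This is a minimal sketch of Lloyd's algorithm with a deterministic initialisation (centroids spread across the sorted data), not a library implementation; in practice you would more likely use something like scikit-learn's `KMeans`. The name `kmeans_1d` is hypothetical, and the number of clusters `k` has to be chosen up front.

```python
def kmeans_1d(values, k, iterations=100):
    """Lloyd's algorithm on plain numbers; returns (centroids, clusters)."""
    data = sorted(values)
    # Deterministic start: k centroids picked evenly across the sorted data.
    centroids = [data[i * (len(data) - 1) // max(k - 1, 1)] for i in range(k)]
    clusters = []
    for _ in range(iterations):
        # Assignment step: each value joins its nearest centroid.
        clusters = [[] for _ in range(k)]
        for x in data:
            nearest = min(range(k), key=lambda j: abs(x - centroids[j]))
            clusters[nearest].append(x)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its old centroid).
        new_centroids = [sum(c) / len(c) if c else centroids[j]
                         for j, c in enumerate(clusters)]
        if new_centroids == centroids:
            break                      # converged
        centroids = new_centroids
    return centroids, clusters


if __name__ == "__main__":
    print(kmeans_1d([1, 2, 3, 20, 21, 22], k=2))
```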

Santoshi M

Reputation: 385

How would clustering fix the problem? If the data looks like the image, how would that work? I can't see it: it is a straight line. K-means will not give you anything. – Carl Rynegardh – 2018-02-20T22:14:44.867


I don't think this is a problem to which machine learning is the answer; I can't think of any kind of clustering that would work here. My instinctive approach would be to remove the trend from the data and then use a Fourier transform to identify the recurrent frequencies. It should then be reasonably straightforward to classify the points as belonging to the patterns identified there, and everything else can be dropped into an "other" bucket.
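A rough sketch of this idea in standard-library Python, under several assumptions the answer does not state: the dates are turned into a daily 0/1 series, the trend is removed with a least-squares straight line, and the Fourier sum is evaluated directly at each integer candidate period (a simple periodogram) rather than through an FFT. `dominant_period` and `split_by_period` are hypothetical helpers; the second keeps the dates whose offset modulo the detected period falls within `tolerance` days of the most common phase and puts the rest in the "other" bucket.

```python
import cmath
import math
from datetime import date, timedelta


def _detrend(series):
    """Subtract the least-squares straight line from a list of numbers."""
    n = len(series)
    t_mean = (n - 1) / 2
    x_mean = sum(series) / n
    denom = sum((t - t_mean) ** 2 for t in range(n))
    slope = (sum((t - t_mean) * (x - x_mean) for t, x in enumerate(series)) / denom
             if denom else 0.0)
    return [x - (x_mean + slope * (t - t_mean)) for t, x in enumerate(series)]


def dominant_period(dates, min_period=2, max_period=None):
    """Return the integer period (in days) with the largest Fourier power."""
    start = min(dates)
    offsets = [(d - start).days for d in dates]
    series = [0.0] * (max(offsets) + 1)   # one sample per day
    for off in offsets:
        series[off] = 1.0                 # 1 on days with an event
    x = _detrend(series)
    max_period = max_period or len(series) // 2
    best_period, best_power = None, -1.0
    for period in range(min_period, max_period + 1):
        # Fourier coefficient at frequency 1/period (cycles per day).
        coef = sum(v * cmath.exp(-2j * math.pi * t / period) for t, v in enumerate(x))
        if abs(coef) ** 2 > best_power:
            best_period, best_power = period, abs(coef) ** 2
    return best_period


def split_by_period(dates, period, tolerance=1):
    """Split dates into (recurrent, other) for one detected period."""
    start = min(dates)
    phase_of = {d: (d - start).days % period for d in dates}

    def circ_dist(a, b):
        diff = abs(a - b) % period
        return min(diff, period - diff)

    # Phase (offset modulo period) shared by the most dates, within tolerance.
    best = max(range(period),
               key=lambda p: sum(circ_dist(r, p) <= tolerance for r in phase_of.values()))
    recurrent = sorted(d for d, r in phase_of.items() if circ_dist(r, best) <= tolerance)
    other = sorted(d for d, r in phase_of.items() if circ_dist(r, best) > tolerance)
    return recurrent, other


if __name__ == "__main__":
    first = date(2017, 1, 2)
    dates = [first + timedelta(days=7 * i) for i in range(10)] + [first + timedelta(days=31)]
    p = dominant_period(dates)
    print(p, split_by_period(dates, p))
```

With several interleaved patterns you would presumably repeat the process on the "other" bucket. Calendar months (28 to 31 days) drift against a fixed integer period, so for monthly dates the tolerance would likely need to grow with the span covered.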

Dan Scally

Reputation: 1 574