Is there a machine learning algorithm to find similar sales patterns?


I have a dataset as follows:

[Image of the dataset table not available]

(and the table extends to include an extra 146 columns for companies 4-149)

Is there an algorithm that would effectively find other companies whose sales patterns are similar to my company's?

I thought of using k-means clustering, but as I'm dealing with 150 companies here it would likely become a bit of a mess, and I don't think linear regression would work here.


Posted 2019-05-16T08:51:45.540



Please don't make more work for other people by vandalizing your posts. By posting on the Stack Exchange (SE) network, you've granted a non-revocable right, under the CC BY-SA 3.0 license for SE to distribute that content. By SE policy, any vandalism will be reverted. If you want to know more about deleting a post, consider taking a look at: How does deleting work?

– Glorfindel – 2019-05-16T11:41:30.907



If I understand correctly, you want to find companies whose sales patterns are similar to yours.

I would start by measuring the cosine similarity between your company and each of the others.

This is easy in Python, for example:

In [21]: from sklearn.metrics.pairwise import cosine_similarity

In [22]: cosine_similarity([[1,4,2,6], [1,9,5,4]])
array([[1.        , 0.84794633],
       [0.84794633, 1.        ]])

Note that if the size of the sales matters to you, this is not the right approach: cosine similarity is magnitude-invariant, so multiplying one company's figures by a constant (here, by 10) leaves the score unchanged:

In [23]: from sklearn.metrics.pairwise import cosine_similarity

In [24]: cosine_similarity([[1,4,2,6], [10,90,50,40]])
array([[1.        , 0.84794633],
       [0.84794633, 1.        ]])
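
To apply this to the whole table, one option is to score every other company against yours and sort by the result. Below is a minimal, dependency-free sketch of that idea. The company names and figures are made up for illustration, and `cosine` and `rank_by_similarity` are hypothetical helpers written for this example (they are not part of scikit-learn):

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length sales series."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_by_similarity(target, others):
    """Return (name, score) pairs, most similar to `target` first."""
    scores = {name: cosine(target, series) for name, series in others.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Made-up data: one list of sales per company, same periods for each.
my_sales = [1, 4, 2, 6]
other_sales = {
    "company_2": [1, 9, 5, 4],
    "company_3": [2, 8, 4, 12],   # exactly twice my_sales
    "company_4": [6, 2, 4, 1],
}

ranking = rank_by_similarity(my_sales, other_sales)
for name, score in ranking:
    print(f"{name}: {score:.4f}")
```

Here company_3 comes out on top even though its sales are twice as large, which is the magnitude invariance mentioned above.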




I would recommend a hierarchical clustering algorithm, applied after normalising each company's numbers into proportions of its own total. The clustering should then be able to group companies with similar patterns. The number of clusters you get depends on the level at which you cut the resulting tree.
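
As a rough illustration of the idea, here is a naive average-linkage sketch in plain Python. The data is made up, the function names are hypothetical, and the choice of Euclidean distance and average linkage is an assumption for this example rather than something prescribed above. In practice a library routine (for example SciPy's `scipy.cluster.hierarchy.linkage` and `fcluster`) would be the usual choice; here `n_clusters` plays the role of the level at which the tree is cut:

```python
import math


def to_proportions(series):
    """Scale a sales series so its values sum to 1."""
    total = sum(series)
    return [x / total for x in series]


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def agglomerate(profiles, n_clusters):
    """Merge the two closest clusters (average linkage) until n_clusters remain."""
    clusters = [[name] for name in profiles]
    while len(clusters) > n_clusters:
        best = None  # (distance, i, j) of the closest pair found so far
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                pair_dists = [euclidean(profiles[a], profiles[b])
                              for a in clusters[i] for b in clusters[j]]
                d = sum(pair_dists) / len(pair_dists)
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


# Made-up sales figures; magnitudes differ but two shapes are present.
sales = {
    "mine": [10, 20, 30, 40],
    "company_a": [100, 200, 300, 400],
    "company_b": [40, 30, 20, 10],
    "company_c": [80, 62, 38, 20],
}
profiles = {name: to_proportions(series) for name, series in sales.items()}
clusters = agglomerate(profiles, n_clusters=2)
print(clusters)
```

With these figures, the rising series end up in one cluster and the falling series in the other, regardless of their absolute size.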

A great resource on this topic is Kaufman, L., & Rousseeuw, P. J. (1990). "Finding Groups in Data: An Introduction to Cluster Analysis". John Wiley & Sons.

Oliver Mason
