Algorithm similar to Apriori for finding unpopular sequential patterns

1

I am working with a dataset that looks similar to this one but is much larger (approx. 30,000 arrays):

sequences = [["car", "house", "bike", "train"],
             ["car", "house", "bike"],
             ["apartment", "train", "car"],
             ["building", "flower", "bike", "train"], ...]

It is a bit difficult to explain, but some items in the data occur very often and some don't. Nevertheless, the items that occur less often also appear in sequences that I want to find. In this case I would like to find all transportation options that occur together, but I also want to find all housing types that occur together, all plants that occur together, etc. I know that plants and buildings don't occur very often in general, so the support values of these itemsets in Apriori are penalized by this. All my results end up containing transportation options, because those items occur often within the data set and therefore have higher support values.
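To make the effect concrete, here is a small sketch in plain Python (not MLxtend) that computes the support of every item pair in the toy data above. The item names are written as strings, and `pair_support` is a hypothetical helper name used only for this illustration. Order within a sequence is ignored, as it is when the data is fed to Apriori.

```python
from collections import Counter
from itertools import combinations

sequences = [
    ["car", "house", "bike", "train"],
    ["car", "house", "bike"],
    ["apartment", "train", "car"],
    ["building", "flower", "bike", "train"],
]


def pair_support(transactions):
    """Return {frozenset({a, b}): support} for every pair that co-occurs."""
    counts = Counter()
    for t in transactions:
        # Deduplicate and sort so each unordered pair is counted once per row.
        for pair in combinations(sorted(set(t)), 2):
            counts[frozenset(pair)] += 1
    n = len(transactions)
    return {pair: c / n for pair, c in counts.items()}


support = pair_support(sequences)

# Print pairs from most to least supported.
for pair, s in sorted(support.items(), key=lambda kv: -kv[1]):
    print(sorted(pair), s)
```

Pairs made of common items, such as car and bike, end up with a higher support than pairs made of rare items, such as building and flower, so any single min_support threshold favours the common ones.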

Currently I am using MLxtend's apriori, but I would be glad to hear of algorithms that do not penalize items that occur less often in the data set.

Is there some library/algorithm I could use to solve this problem?

Thanks in advance!

Eve Edomenko

Posted 2020-06-26T15:16:09.820

Reputation: 13

Have you tried lowering the min_support parameter? Apriori prunes the result tree based on the minimum support threshold. Any similar method will put high-frequency results first, so you are bound to run into the same problem with your data. – Vlad_Z – 2020-06-27T06:47:31.460

Hi @Vlad_Z, yes, I have lowered it to approx. 0.007, which resulted in extraordinarily high memory usage and basically the same results, but with larger filtered sequences. So that didn't help either. Thanks for the answer anyway! So maybe the way I am trying to solve the problem is not correct? – Eve Edomenko – 2020-06-29T09:35:00.100

Yes, at support rates this low you should be using a different algorithm, such as Apriori-Inverse (DOI: 10.1007/11430919_13). I will try to provide a more useful answer when/if possible. – Vlad_Z – 2020-06-29T10:42:57.603

Thank you very much @Vlad_Z, this is a great help for me – Eve Edomenko – 2020-06-29T12:47:02.533

Answers

0

The algorithms I would recommend in your case are Apriori-Inverse and Apriori-Rare. Disclaimer: I have not found a Python implementation (nor can I provide a reliable one due to time constraints), but there is an open-source Java library, SPMF, with implementations of these algorithms (Inverse, Rare, and a whole bunch more) that can be used for reference. The source code has informative comments, and the algorithms are structured well enough to understand what is going on. Of course, this is only an option if you are willing to do some digging and have enough time on your hands. Going through that Java code may give you insights into how to modify existing Python implementations to suit your needs, which should require far less effort, since Inverse and Rare differ very little from the classical Apriori. I hope this helps you at least somewhat. Good luck!
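For illustration only, here is a rough sketch of the idea behind Apriori-Inverse as I understand it: keep only the items whose support lies between a lower floor and an upper cap, then grow itemsets from those rare items level by level, as in Apriori. This is not the SPMF implementation and not a faithful reproduction of the published algorithm. The function name `rare_itemsets`, its parameters `min_support` and `max_support`, and the sample data are all assumptions made for this example. Like Apriori, it treats each sequence as an unordered set.

```python
from itertools import combinations


def support(itemset, transactions):
    """Fraction of transactions that contain every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def rare_itemsets(sequences, min_support, max_support):
    """Sketch: itemsets built only from items with min_support <= support < max_support."""
    transactions = [frozenset(s) for s in sequences]
    items = {i for t in transactions for i in t}

    # Level 1: keep rare-but-not-noise items (the "inverse" of the usual filter).
    current = {}
    for item in items:
        s = support(frozenset([item]), transactions)
        if min_support <= s < max_support:
            current[frozenset([item])] = s
    result = dict(current)

    k = 2
    while current:
        prev = set(current)
        # Join step: union pairs of (k-1)-itemsets that yield a k-itemset.
        candidates = {a | b for a, b in combinations(prev, 2) if len(a | b) == k}
        current = {}
        for cand in candidates:
            # Prune step: every (k-1)-subset must have survived the last level.
            if all(frozenset(sub) in prev for sub in combinations(cand, k - 1)):
                s = support(cand, transactions)
                # A superset can never exceed max_support, so only the floor matters.
                if s >= min_support:
                    current[cand] = s
        result.update(current)
        k += 1
    return result


# Hypothetical data: transport items are frequent, housing items are rare.
data = [
    ["car", "bike", "train"],
    ["car", "bike"],
    ["car", "train"],
    ["bike", "train"],
    ["car", "bike", "train"],
    ["car", "bike", "house", "apartment"],
    ["train", "house", "apartment"],
    ["car", "flower"],
    ["bike", "train"],
    ["car", "bike", "train"],
]

found = rare_itemsets(data, min_support=0.2, max_support=0.5)
for itemset, s in found.items():
    print(sorted(itemset), s)
```

Because the frequent transport items are dropped before any candidates are generated, the search space stays small even with a low support floor, which is the main practical difference from simply lowering min_support in the classical Apriori.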

Vlad_Z

Posted 2020-06-26T15:16:09.820

Reputation: 500

Thank you very much for your answer. I found the SPMF library yesterday, and there is actually a Python wrapper that can be used :) Unfortunately I do not have that much time on my hands, but I will definitely give it a try, work out what is going on, and try to implement it. Thanks for your time and research! I am not allowed to upvote your answer because I don't have enough points to do so, but I accepted your answer :) – Eve Edomenko – 2020-07-01T08:23:05.277