I am working with a dataset that looks similar to this one, but is much larger (approx. 30,000 arrays):
sequences = [["car", "house", "bike", "train"], ["car", "house", "bike"], ["apartment", "train", "car"], ["building", "flower", "bike", "train"], ...]
It is a bit difficult to explain, but some items in the data occur very often and others do not. Nevertheless, the less frequent items also occur in sequences that I want to find. In this case I would like to find all transportation options that occur together, but I also want to find all housing types that occur together, all plants that occur together, etc. I know that plants and buildings are rare overall, so Apriori's support value penalizes itemsets that contain them. As a result, all my results end up containing transportation items, because their support is higher simply because they occur more often in the dataset.
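To make the effect concrete, here is a minimal sketch in plain Python that computes the support of every item pair on the toy data above. It is only an illustration, not how MLxtend implements Apriori; the helper `pair_support` and the `min_support` threshold of 0.5 are made up for this example.

```python
from collections import Counter
from itertools import combinations

# Toy data from above (the real dataset has roughly 30,000 of these).
sequences = [
    ["car", "house", "bike", "train"],
    ["car", "house", "bike"],
    ["apartment", "train", "car"],
    ["building", "flower", "bike", "train"],
]


def pair_support(transactions):
    """Return the fraction of transactions that contain each unordered item pair."""
    counts = Counter()
    for transaction in transactions:
        # Deduplicate and sort so each pair is counted once per transaction
        # and always appears in the same (alphabetical) order.
        for pair in combinations(sorted(set(transaction)), 2):
            counts[pair] += 1
    n = len(transactions)
    return {pair: count / n for pair, count in counts.items()}


support = pair_support(sequences)

# Hypothetical threshold, chosen only to show the cut-off effect.
min_support = 0.5
frequent = {pair for pair, s in support.items() if s >= min_support}

print(support[("bike", "train")])       # pair of frequent items
print(support[("building", "flower")])  # pair of rare items
print(("building", "flower") in frequent)
```

With a single global threshold like this, the pair of rare items is dropped even though the two items appear together every time either of them appears, which is the behavior I would like to avoid.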
Currently I am using MLxtend's apriori, but I would be glad to hear of algorithms that do not penalize items that occur less frequently in the dataset.
Is there some library/algorithm I could use to solve this problem?
Thanks in advance!