Good "frequent sequence mining" packages in Python?

17

9

Has anyone used (and liked) any good "frequent sequence mining" packages in Python other than the FPM in MLlib? I am looking for a stable package, preferably one that is still maintained. Thank you!

Edamame

Posted 2016-11-08T12:33:03.713

Reputation: 2 045

For that purpose, I don't think implementations in Python or R are going to help at all. Pattern mining libraries in R and Python do the bare minimum and handle only frequent patterns; if you want to find specific patterns other than frequent ones, they are not going to be any help. – StoryMay – 2021-02-03T02:52:13.137

Answers

7

I am actively maintaining an efficient implementation of both PrefixSpan and BIDE in Python 3, supporting mining both frequent and top-k (closed) sequential patterns.

Chuancong Gao

Posted 2016-11-08T12:33:03.713

Reputation: 171

I'd like to implement those in javascript, but I don't fully understand how these algorithms work. Can you explain it in plain English? – inf3rno – 2018-05-25T10:51:32.383

I suggest you check my original minimal implementation of PrefixSpan. Its core part takes only 15 lines. https://gist.github.com/chuanconggao/4df9c1b06fa7f3ed854d5d96e2ae499f

– Chuancong Gao – 2018-05-28T05:18:01.640

Thanks! I'll try to translate it to JS, but it won't be easy. :-) AFAIK PrefixSpan builds projected databases based on where the prefix matches. I am currently reading about BIDE, which in theory is an even better algorithm. – inf3rno – 2018-05-28T10:45:15.430

There are too many differences between js and python collections. I did not manage to reproduce the code in js. I'll try it again later. – inf3rno – 2018-05-28T11:39:23.090

Not sure whether it helps, but I have another Scala version of PrefixSpan. https://github.com/chuanconggao/PrefixSpan-scala However, I highly suggest you fully understand the algorithm before implementing it.

– Chuancong Gao – 2018-05-28T11:51:51.257

I just wanted to solve it in a copy-paste manner, but you are right: I need to fully understand the code first to be able to translate it. I read about PrefixSpan a few days ago. It is not that hard, and I already know the theory behind it. It's just that JS does not have range and defaultdict, so I have to solve this with maps and arrays only. The iteration is different too, especially with ES5. Thanks for the help! – inf3rno – 2018-05-28T13:43:20.720
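For readers following the comments above, here is a minimal Python sketch of the PrefixSpan idea as described there: grow a prefix one item at a time and recurse on the projected database, i.e. the positions where the prefix has just matched. It is not the code from the linked gist. It assumes each sequence is a flat list of single items (no itemsets), and the names `prefixspan`, `db` and `minsup` are illustrative only.

```python
from collections import defaultdict


def prefixspan(db, minsup):
    """Return (support, pattern) pairs for every frequent sequential pattern.

    db is a list of sequences, each a list of hashable items.
    minsup is an absolute count: the minimum number of supporting sequences.
    """
    results = []

    def mine(prefix, projected):
        # projected holds (sequence index, first position after the prefix match)
        occurrences = defaultdict(list)
        for sid, start in projected:
            seen = set()
            for pos in range(start, len(db[sid])):
                item = db[sid][pos]
                if item not in seen:
                    # Keep only the leftmost match per sequence; it leaves
                    # the longest suffix for further extension.
                    seen.add(item)
                    occurrences[item].append((sid, pos + 1))
        for item, new_projected in occurrences.items():
            if len(new_projected) >= minsup:
                pattern = prefix + [item]
                results.append((len(new_projected), pattern))
                mine(pattern, new_projected)

    mine([], [(sid, 0) for sid in range(len(db))])
    return results


if __name__ == "__main__":
    example = [["a", "b", "c"], ["a", "c"], ["b", "c"]]
    for support, pattern in prefixspan(example, minsup=2):
        print(support, pattern)
```

A plain `dict` with `setdefault` would work in place of `defaultdict`, which may make a port to maps and arrays in another language more direct.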

6

The only Python package I've found is on GitHub.

They have an implementation of BIDE there, but it's not maintained code.

yossico

Posted 2016-11-08T12:33:03.713

Reputation: 201

Just to clarify, it did not implement BIDE which mines frequent closed sequences. It actually implemented PrefixSpan which mines all frequent sequences. PrefixSpan and BIDE share the same pattern enumeration framework, and that is why the authors cited the BIDE paper. – Chuancong Gao – 2018-04-20T21:44:32.100
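To illustrate the distinction drawn in the comment above: a frequent sequence is closed when no proper super-sequence has the same support. The sketch below is a naive post-filter over already-mined patterns. It is not BIDE, which avoids enumerating the non-closed patterns in the first place. The function names and the example supports are made up for illustration.

```python
def is_subsequence(short, long):
    """True if the items of short appear in long in the same order."""
    it = iter(long)
    return all(item in it for item in short)


def closed_only(patterns):
    """Keep only closed patterns.

    patterns maps each pattern (a tuple of items) to its support.
    A pattern is dropped when a longer pattern containing it as a
    subsequence has exactly the same support.
    """
    return {
        p: s
        for p, s in patterns.items()
        if not any(
            s2 == s and len(q) > len(p) and is_subsequence(p, q)
            for q, s2 in patterns.items()
        )
    }


if __name__ == "__main__":
    all_frequent = {("a",): 2, ("b",): 2, ("c",): 3, ("a", "c"): 2, ("b", "c"): 2}
    print(closed_only(all_frequent))
```

In this example `("a",)` and `("b",)` are dropped because `("a", "c")` and `("b", "c")` occur in exactly the same number of sequences, while `("c",)` stays because its support is higher than that of any longer pattern.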

What I ended up using is http://www.philippe-fournier-viger.com/spmf/ - it's a Java library, but I've wrapped it with Python to match my needs.

– yossico – 2018-06-26T19:14:49.683
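A rough sketch of what such a wrapper might look like, not the commenter's actual code. It assumes SPMF's text format as I understand it from the SPMF documentation: positive integer items, `-1` closing each itemset, `-2` closing each sequence, and output lines ending in `#SUP: n`. It also assumes the command-line form `java -jar spmf.jar run PrefixSpan <input> <output> <minsup>`. The jar path and helper names are hypothetical, so check the details against the SPMF documentation before relying on them.

```python
import subprocess


def to_spmf_line(sequence):
    """Encode one sequence (a list of itemsets of positive ints) as an SPMF line."""
    parts = []
    for itemset in sequence:
        parts.extend(str(item) for item in sorted(itemset))
        parts.append("-1")  # end of itemset
    parts.append("-2")      # end of sequence
    return " ".join(parts)


def parse_spmf_line(line):
    """Decode one output line such as '1 -1 2 -1 #SUP: 3' into (itemsets, support)."""
    body, _, tail = line.partition("#SUP:")
    itemsets, current = [], []
    for token in body.split():
        if token == "-1":
            itemsets.append(current)
            current = []
        else:
            current.append(int(token))
    return itemsets, int(tail.split()[0])


def run_prefixspan(jar_path, input_path, output_path, minsup="50%"):
    """Invoke SPMF through the JVM; jar_path is wherever spmf.jar was saved."""
    subprocess.run(
        ["java", "-jar", jar_path, "run", "PrefixSpan",
         input_path, output_path, minsup],
        check=True,
    )
```

Typical use would be to write one `to_spmf_line(...)` per sequence to the input file, call `run_prefixspan`, then read the output file line by line with `parse_spmf_line`.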

2

Have you considered writing it yourself? There is probably no up-to-date, maintained library right now.

Check this out, it covers the basics. PrefixSpan and closed/maximal patterns are actually not that hard to implement.

HonzaB

Posted 2016-11-08T12:33:03.713

Reputation: 1 521

1

I've used fim's fpgrowth function in the past and it worked well. It's kind of a pain to install on Windows machines however. It seems to be an academic website so I'm not sure if they're doing many updates to the code over time...

Jed

Posted 2016-11-08T12:33:03.713

Reputation: 111

1

SPMF sounds like a useful library for pattern mining.

Samaneh Saadat

Posted 2016-11-08T12:33:03.713

Reputation: 11

0

Since none of the existing solutions were satisfactory for me, I created my own Python Wrapper for SPMF (the Java library mentioned in other answers here).

Lorenz Leitner

Posted 2016-11-08T12:33:03.713

Reputation: 101