13

4

As far as I know the development of algorithms to solve the Frequent Pattern Mining (FPM) problem, the road of improvements have some main checkpoints. Firstly, the Apriori algorithm was proposed in 1993, by Agrawal et al., along with the formalization of the problem. The algorithm was able to *strip-off* some sets from the `2^n - 1`

sets (powerset) by using a lattice to maintain the data. A drawback of the approach was the need to re-read the database to compute the frequency of each set expanded.

Later, on year 1997, Zaki et al. proposed the algorithm Eclat, which *inserted* the resulting frequency of each set inside the lattice. This was done by adding, at each node of the lattice, the set of transaction-ids that had the items from root to the referred node. The main contribution is that one does not have to re-read the entire dataset to know the frequency of each set, but the memory required to keep such data structure built may exceed the size of the dataset itself.

In 2000, Han et al. proposed an algorithm named FPGrowth, along with a prefix-tree data structure named FPTree. The algorithm was able to provide significant data compression, while also granting that only frequent itemsets would be yielded (without candidate itemset generation). This was done mainly by sorting the items of each transaction in decreasing order, so that the most frequent items are the ones with the least repetitions in the tree data structure. Since the frequency only descends while traversing the tree in-depth, the algorithm is able to *strip-off* non-frequent itemsets.

**Edit**:

~~As far as I know, this may be considered a state-of-the-art algorithm, but I'd like to know about other proposed solutions. What other algorithms for FPM are considered "state-of-the-art"? What is the ~~*intuition*/*main-contribution* of such algorithms?

Is the FPGrowth algorithm still considered "state of the art" in frequent pattern mining? If not, what algorithm(s) may extract frequent itemsets from large datasets more efficiently?

1This post was researched and presented well. It makes a poor question for an SE network site but it would be a great topic to start on a discussion forum. – Air – 2014-07-12T19:38:44.743

@AirThomas Thanks for the warning. I tried to save the post by making a proper question out of it. – Rubens – 2014-07-13T03:06:30.877