I wanted to point out, since this is one of the top Google hits for this topic, that Latent Dirichlet Allocation (LDA), Hierarchical Dirichlet Processes (HDP), *and* hierarchical Latent Dirichlet Allocation (hLDA) are all distinct models.

LDA models documents as dirichlet mixtures of a fixed number of topics- chosen as a parameter of the model by the user- which are in turn dirichlet mixtures of words. This generates a flat, soft probabilistic clustering of terms into topics and documents into topics.

HDP models topics as mixtures of words, much like LDA, but rather than documents being mixtures of a fixed number of topics, the number of topics is generated by a dirichlet process, resulting in the number of topics being a random variable as well. The "hierarchical" portion of the name refers to another level being added to the generative model (the dirichlet process producing the number of topics), not the topics themselves- the topics are still flat clusterings.

hLDA, on the other hand, is an adaptation of LDA that models topics as mixtures of a new, distinct level of topics, drawn from dirichlet *distributions* and not processes. It still treats the number of topics as a hyperparameter, i.e., independent of the data. The difference is that the clustering is now hierarchical- it learns a clustering of the first set of topics themselves, giving a more general, abstract relationships between topics (and hence, words and documents). Think of it like clustering the stack exchanges into math, science, programming, history, etc. as opposed to clustering data science and cross validation into an abstract statistics and programming topic that shares some concepts with, say, software engineering, but the software engineering exchange is clustered on a more concrete level with the computer science exchange, and the similarity between all of the mentioned exchanges doesn't appear as much until the upper layer of clusters.

Is HDP supposed to be data-driven in regards to the number of topics it will select? On practical side, I tried to run Blei's HDP implementation and it just ate all memory until I killed the process. I have 16GB RAM and just over 100K short documents to analyze. – Vladislavs Dovgalecs – 2015-02-18T09:25:21.440