Latent Dirichlet Allocation vs Hierarchical Dirichlet Process



Latent Dirichlet Allocation (LDA) and Hierarchical Dirichlet Process (HDP) are both topic modeling methods. The major difference is that LDA requires the number of topics to be specified, whereas HDP doesn't. Why is that so? And what are the differences, pros, and cons of the two topic modeling methods?


Posted 2014-05-18T06:10:52.543

Reputation: 2 242

Is HDP supposed to be data-driven with regard to the number of topics it will select? On the practical side, I tried to run Blei's HDP implementation and it just ate all the memory until I killed the process. I have 16GB RAM and just over 100K short documents to analyze. – Vladislavs Dovgalecs – 2015-02-18T09:25:21.440



HDP is an extension of LDA, designed to address the case where the number of mixture components (the number of "topics" in document-modeling terms) is not known a priori. So that's the reason why there's a difference.

Using LDA for document modeling, one treats each "topic" as a distribution over the words in some known vocabulary. For each document, a mixture of topics is drawn from a Dirichlet distribution, and then each word in the document is an independent draw from that mixture (that is, a topic is selected and then used to generate a word).
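The generative story above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the function names, the toy vocabulary, and the hyperparameter values `alpha` and `eta` are made up for the example, and the code generates synthetic documents rather than fitting a model.

```python
import random

def sample_dirichlet(alphas, rng):
    """Draw from a Dirichlet by normalising independent Gamma(alpha, 1) draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alphas]
    total = sum(draws)
    return [d / total for d in draws]

def generate_lda_corpus(n_docs, doc_len, n_topics, vocab,
                        alpha=0.5, eta=0.1, seed=0):
    """Generate synthetic documents from the LDA generative process.

    n_topics is fixed up front, which is exactly what LDA requires.
    alpha and eta are illustrative (assumed) Dirichlet hyperparameters.
    """
    rng = random.Random(seed)
    # Each topic is a distribution over the vocabulary.
    topics = [sample_dirichlet([eta] * len(vocab), rng) for _ in range(n_topics)]
    corpus = []
    for _ in range(n_docs):
        # Per-document mixture over the fixed set of topics.
        theta = sample_dirichlet([alpha] * n_topics, rng)
        words = []
        for _ in range(doc_len):
            z = rng.choices(range(n_topics), weights=theta)[0]  # pick a topic
            w = rng.choices(vocab, weights=topics[z])[0]        # pick a word from it
            words.append(w)
        corpus.append(words)
    return topics, corpus

vocab = ["gene", "cell", "court", "law", "goal", "team"]
topics, corpus = generate_lda_corpus(n_docs=3, doc_len=8, n_topics=2, vocab=vocab)
for doc in corpus:
    print(" ".join(doc))
```

Inference runs this story in reverse: given only the words, it estimates the topics and the per-document mixtures, with the number of topics held at whatever value was chosen.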

For HDP (applied to document modeling), one also uses a Dirichlet process to capture the uncertainty in the number of topics. A common base distribution is drawn that represents the countably infinite set of possible topics for the corpus, and then the distribution of topics for each document is sampled from this shared base distribution, so that all documents draw on the same pool of topics.
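One common way to picture the two levels is a truncated stick-breaking construction: corpus-level topic weights come from breaking a unit stick, and each document then gets its own weights over those same topics. The sketch below is a finite approximation with assumed names and assumed values for `gamma`, `alpha0`, and `truncation`. It only samples from the prior and is not an inference algorithm.

```python
import random

def stick_breaking(gamma, truncation, rng):
    """Truncated stick-breaking weights with concentration gamma."""
    weights, remaining = [], 1.0
    for _ in range(truncation - 1):
        frac = rng.betavariate(1.0, gamma)  # fraction of the stick that is left
        weights.append(remaining * frac)
        remaining *= 1.0 - frac
    weights.append(remaining)               # last piece absorbs the remainder
    return weights

def sample_dirichlet(alphas, rng):
    """Draw from a Dirichlet by normalising independent Gamma(alpha, 1) draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alphas]
    total = sum(draws)
    return [d / total for d in draws]

def hdp_document_weights(n_docs, gamma=1.0, alpha0=1.0, truncation=20, seed=0):
    """Sample shared topic weights, then per-document weights over the same topics."""
    rng = random.Random(seed)
    beta = stick_breaking(gamma, truncation, rng)  # corpus-level topic weights
    docs = []
    for _ in range(n_docs):
        # With a finite set of atoms, a DP(alpha0, beta) draw reduces to
        # Dirichlet(alpha0 * beta); the tiny floor is only a numerical guard.
        params = [max(alpha0 * b, 1e-12) for b in beta]
        docs.append(sample_dirichlet(params, rng))
    return beta, docs

beta, docs = hdp_document_weights(n_docs=3)
# Count how many topics each document uses with non-negligible weight.
print([sum(1 for w in d if w > 0.01) for d in docs])
```

Most of the probability mass lands on a handful of topics, and how many are effectively in use is not fixed in advance but varies with the draw and with the concentration parameters.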

As far as pros and cons go, HDP has the advantage that the number of topics is unbounded and learned from the data rather than specified in advance. I suppose, though, that it is more complicated to implement, and unnecessary in the case where a bounded number of topics is acceptable.

Tim Goodman


Reputation: 2 862


Anecdotally, I've never been impressed with the output from hierarchical LDA. It just doesn't seem to find an optimal level of granularity when choosing the number of topics. I've gotten much better results by running a few iterations of regular LDA, manually inspecting the topics it produced, deciding whether to increase or decrease the number of topics, and continuing to iterate until I get the granularity I'm looking for.

Remember: hierarchical LDA can't read your mind... it doesn't know what you actually intend to use the topic modeling for. Just like with k-means clustering, you should choose the k that makes the most sense for your use case.

Charlie Greenbacker


Reputation: 1 451


I wanted to point out, since this is one of the top Google hits for this topic, that Latent Dirichlet Allocation (LDA), Hierarchical Dirichlet Processes (HDP), and hierarchical Latent Dirichlet Allocation (hLDA) are all distinct models.

LDA models documents as Dirichlet mixtures of a fixed number of topics (chosen by the user as a parameter of the model), which are in turn Dirichlet mixtures of words. This generates a flat, soft probabilistic clustering of terms into topics and documents into topics.

HDP models topics as mixtures of words, much like LDA, but rather than documents being mixtures of a fixed number of topics, the number of topics is generated by a Dirichlet process, so the number of topics is a random variable as well. The "hierarchical" portion of the name refers to another level being added to the generative model (the Dirichlet process producing the number of topics), not to the topics themselves; the topics are still flat clusterings.

hLDA, on the other hand, is an adaptation of LDA that models topics as mixtures of a new, distinct level of topics, drawn from Dirichlet distributions and not processes. It still treats the number of topics as a hyperparameter, i.e., independent of the data. The difference is that the clustering is now hierarchical: it learns a clustering of the first set of topics themselves, giving more general, abstract relationships between topics (and hence, words and documents). Think of it like clustering the Stack Exchange sites into math, science, programming, history, etc., as opposed to clustering Data Science and Cross Validated into an abstract statistics-and-programming topic that shares some concepts with, say, Software Engineering. The Software Engineering site is clustered at a more concrete level with the Computer Science site, and the similarity between all of the mentioned sites doesn't appear as much until the upper layer of clusters.



Reputation: 201


I have a situation where HDP works well compared to LDA. I have about 16,000 documents that belong to various classes. Since I don't know how many different topics I can expect for each class, HDP is really helpful in this case.

Nischal Hp


Reputation: 755


Actually, HDP requires a lot of hidden parameters, which are set in the code. If you play with these parameters you will get different results (different topics). People usually do not pay attention to such hidden parameters and think that the model is able to find them itself. That is not true. The user has to define the parameters ‘eta’, ‘gamma’ and ‘alpha’, and the maximum number of topics. If you specify a maximum of, say, 23 topics, then your model provides 23 topics in its output. If you set it to 15 topics, then you get 15 topics in the output.
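The sensitivity to the concentration parameter can be illustrated with a standard property of a single Dirichlet process (the Chinese restaurant process view): after n draws with concentration gamma, the expected number of distinct clusters is the sum of gamma / (gamma + i) for i from 0 to n - 1. The sketch below evaluates that formula. The function name and the values tried are made up for illustration, and the result describes the prior of a one-level process, not the fitted output of any particular HDP implementation.

```python
def expected_num_topics(n_words, gamma):
    """Expected number of distinct clusters after n_words draws from a
    Chinese restaurant process with concentration gamma (prior only)."""
    return sum(gamma / (gamma + i) for i in range(n_words))

# The same amount of data implies very different topic counts a priori,
# depending on the concentration parameter.
for gamma in (0.1, 1.0, 10.0):
    print(gamma, round(expected_num_topics(10_000, gamma), 1))
```

The count grows roughly logarithmically with the amount of data and roughly linearly with gamma, so the choice of gamma matters. Implementations that use a truncated approximation additionally cannot report more topics than the truncation level.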

Sergei Koltsov


Reputation: 1


Yee Whye Teh et al.'s 2005 paper Hierarchical Dirichlet Processes describes a nonparametric prior for grouped clustering problems. For example, the HDP helps in generalizing the Latent Dirichlet Allocation model to the case where the number of topics in the data is discovered by the inference algorithm instead of being specified as a parameter of the model. A detailed explanation of the Dirichlet process can be found here

Topic models promise to help summarize and organize large archives of texts that cannot be easily analyzed by hand. The Hierarchical Dirichlet process (HDP) is a powerful mixed-membership model for the unsupervised analysis of grouped data. Unlike its finite counterpart, latent Dirichlet allocation, the HDP topic model infers the number of topics from the data.



Reputation: 1 359