Is there any domain where Bayesian Networks outperform neural networks?



Neural networks get top results in Computer Vision tasks (see MNIST, ILSVRC, Kaggle Galaxy Challenge). They seem to outperform every other approach in Computer Vision. But there are also other tasks:

I'm less sure about ASR (automatic speech recognition) and machine translation, but I think I've also heard that (recurrent) neural networks are starting to outperform other approaches there.

I am currently learning about Bayesian Networks and I wonder in which cases those models are usually applied. So my question is:

Is there any challenge or (Kaggle) competition where the state of the art is Bayesian Networks, or at least very similar models?

(Side note: I've also seen decision trees win in several recent Kaggle challenges.)

Martin Thoma

Posted 2016-01-17T13:04:57.100

Reputation: 15 590

So the answer to your question is then "no", right? All the answers point out advantages of Bayesian Networks over other predictive models, but I have not seen any Kaggle competition where they actually outperform other models. Can anyone provide one? The reasons and possible advantages given in the answers, e.g. the lack of enough data and choosing good priors, seem great in theory, but they still do not answer the question by providing at least one example. – MNLR – 2018-04-26T08:57:47.943

One thing is that Bayesian networks can be useful for unsupervised learning and for tasks where the amount of data is relatively limited. Neural networks only outperform other methods when there is a massive amount of data to train on. – xji – 2018-07-04T19:12:01.520

It's not a question of domain. It's a question of how much data you have, how good your priors are, and whether you want posteriors. – Emre – 2016-01-17T18:22:27.547

@Emre Which is a question of domain... (and, of course, of money when you have the possibility to not only use existing datasets but can also hire people to create / label new data). – Martin Thoma – 2016-01-17T18:26:05.697

It would be a question of domain if there were some property of the data, some structure, that one algorithm took advantage of better than the other, but that is not what I am suggesting. – Emre – 2016-01-17T19:16:20.560



One of the areas where Bayesian approaches are often used is where one needs interpretability of the prediction system. You don't want to give doctors a neural net and just say that it's 95% accurate. You'd rather explain the assumptions your method makes, as well as the decision process it uses.

A similar area is when you have strong prior domain knowledge and want to use it in the system.
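To make the interpretability point concrete, here is a minimal Python sketch of a hypothetical three-node diagnostic network, Disease -> Fever and Disease -> Test. All probabilities are made-up placeholders standing in for the domain knowledge a clinician would supply; the point is that every number in the model has a readable meaning, and the posterior comes from plain enumeration that can be shown step by step.

```python
# Hypothetical CPTs standing in for expert knowledge (made-up numbers).
P_DISEASE = 0.01                             # prior P(Disease=True)
P_FEVER_GIVEN_D = {True: 0.80, False: 0.10}  # P(Fever=True | Disease)
P_TEST_GIVEN_D = {True: 0.95, False: 0.05}   # P(Test=positive | Disease)


def joint(disease, fever, test):
    """P(Disease, Fever, Test), factorized along Disease -> Fever, Disease -> Test."""
    p_d = P_DISEASE if disease else 1 - P_DISEASE
    p_f = P_FEVER_GIVEN_D[disease] if fever else 1 - P_FEVER_GIVEN_D[disease]
    p_t = P_TEST_GIVEN_D[disease] if test else 1 - P_TEST_GIVEN_D[disease]
    return p_d * p_f * p_t


def posterior_disease(fever, test):
    """P(Disease=True | Fever, Test) by exact enumeration over Disease."""
    yes = joint(True, fever, test)
    no = joint(False, fever, test)
    return yes / (yes + no)


print(f"P(disease | fever, positive test) = {posterior_disease(True, True):.3f}")
```

With these invented numbers the 1% prior rises to roughly 0.61 after a fever and a positive test, and each factor in `joint` can be pointed at to explain why.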


Posted 2016-01-17T13:04:57.100

Reputation: 466


See also: The Mythos of Model Interpretability

– Martin Thoma – 2017-01-05T09:14:04.683

See also: lime

– Martin Thoma – 2018-07-04T19:25:56.167


Bayesian networks and neural networks are not mutually exclusive. In fact, "Bayesian network" is just another term for "directed graphical model". Bayesian networks can be very useful in designing objective functions for neural networks. Yann LeCun has pointed this out as well.

One example.

The variational autoencoder and its derivatives are directed graphical models of the form $$p(x) = \int_z p(x|z)\,p(z)\,dz.$$ Neural networks are used to implement $p(x|z)$ and an approximation to its inverse, $q(z|x) \approx p(z|x)$.
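A minimal NumPy sketch of the resulting objective, assuming binary data with a Bernoulli decoder, a diagonal-Gaussian encoder and a standard-normal prior $p(z)$. The single linear layers `W_enc` and `W_dec` are hypothetical stand-ins for the encoder and decoder networks. The code estimates the evidence lower bound (ELBO), $\mathbb{E}_{q(z|x)}[\log p(x|z)] - \mathrm{KL}(q(z|x)\,\|\,p(z)) \le \log p(x)$, by Monte Carlo; actual training (omitted) would maximize it with an automatic-differentiation library.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy sizes; single linear layers stand in for the neural networks.
X_DIM, Z_DIM = 4, 2
W_enc = 0.1 * rng.standard_normal((X_DIM, 2 * Z_DIM))  # x -> [mean, log-variance] of q(z|x)
W_dec = 0.1 * rng.standard_normal((Z_DIM, X_DIM))      # z -> Bernoulli logits of p(x|z)


def kl_to_standard_normal(mu, logvar):
    """Closed-form KL( N(mu, diag(exp(logvar))) || N(0, I) )."""
    return 0.5 * np.sum(np.exp(logvar) + mu**2 - 1.0 - logvar)


def bernoulli_log_likelihood(x, logits):
    """log p(x|z) for binary x with independent Bernoulli outputs (numerically stable)."""
    return np.sum(x * logits - np.logaddexp(0.0, logits))


def elbo(x, n_samples=1):
    """Monte Carlo estimate of E_q[log p(x|z)] - KL(q(z|x) || p(z))."""
    h = x @ W_enc
    mu, logvar = h[:Z_DIM], h[Z_DIM:]
    recon = 0.0
    for _ in range(n_samples):
        eps = rng.standard_normal(Z_DIM)
        z = mu + np.exp(0.5 * logvar) * eps  # reparameterization trick
        recon += bernoulli_log_likelihood(x, z @ W_dec)
    return recon / n_samples - kl_to_standard_normal(mu, logvar)


x = np.array([1.0, 0.0, 1.0, 1.0])
print(elbo(x, n_samples=100))
```

Because the encoder and decoder parameters both appear in this one objective, they can be trained jointly by gradient ascent on the ELBO; the reparameterization `z = mu + sigma * eps` is what lets gradients flow through the sampling step.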


Posted 2016-01-17T13:04:57.100

Reputation: 311

Can the two parts be trained jointly? – nn0p – 2016-07-28T14:50:02.307

Yes, that's what's typically done. – bayer – 2019-11-23T22:11:31.877

@bayer Not sure who to ask - do Bayesian networks encompass node PDFs in a way that neural networks cannot (or do neural networks have to assume, e.g., Gaussians/delta functions)? – jtlz2 – 2020-01-06T11:10:04.083

@bayer PS Link is dead – jtlz2 – 2020-01-06T11:11:11.550

Neural networks are just a computational architecture; they can represent anything. The link is dead because Google shut down Google+. I don't have the quote anymore, sorry. – bayer – 2020-01-13T07:04:26.157


Sometimes you care as much about changing the outcome as predicting the outcome.

Given enough training data, a neural network will tend to predict the outcome better, but once you can predict the outcome, you may then wish to predict how changing the input features would affect it.

A real-life example: knowing that someone is likely to have a heart attack is useful, but being able to tell the person that if they stopped doing XX, their risk would drop by 30% is of much greater benefit.

Likewise for customer retention: knowing why customers stop shopping with you is worth as much as predicting which customers are likely to stop.

Also, a simpler Bayesian Network that predicts less well but leads to more action being taken may often be better than a more “correct” one.

The biggest advantage of Bayesian networks over neural networks is that they can be used for causal inference. This branch is of fundamental importance to statistics and machine learning, and Judea Pearl won the Turing Award for this research.
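A minimal Python sketch of the difference between predicting and intervening, using a hypothetical network Z -> X, Z -> Y, X -> Y, where X is "doing XX", Y is a heart attack, and Z is an assumed confounder affecting both. All probabilities are invented. Ordinary conditioning, $P(Y \mid X)$, mixes in the confounder; the effect of actually changing X is $P(Y \mid do(X))$, which in this particular graph the back-door adjustment gives as $\sum_z P(Y \mid X, z)\,P(z)$.

```python
# Hypothetical CPTs for the graph  Z -> X,  Z -> Y,  X -> Y  (made-up numbers).
P_Z1 = 0.3                                    # P(Z=1), Z is the confounder
P_X1_GIVEN_Z = {0: 0.2, 1: 0.8}               # P(X=1 | Z)
P_Y1_GIVEN_XZ = {(0, 0): 0.05, (1, 0): 0.10,  # P(Y=1 | X, Z)
                 (0, 1): 0.20, (1, 1): 0.30}


def p_z(z):
    return P_Z1 if z == 1 else 1 - P_Z1


def p_x_given_z(x, z):
    return P_X1_GIVEN_Z[z] if x == 1 else 1 - P_X1_GIVEN_Z[z]


def observational(x):
    """P(Y=1 | X=x): ordinary conditioning, which mixes in the confounder Z."""
    num = sum(P_Y1_GIVEN_XZ[(x, z)] * p_x_given_z(x, z) * p_z(z) for z in (0, 1))
    den = sum(p_x_given_z(x, z) * p_z(z) for z in (0, 1))
    return num / den


def interventional(x):
    """P(Y=1 | do(X=x)) via back-door adjustment over Z."""
    return sum(P_Y1_GIVEN_XZ[(x, z)] * p_z(z) for z in (0, 1))


for x in (0, 1):
    print(f"X={x}: P(Y=1|X)={observational(x):.3f}, P(Y=1|do(X))={interventional(x):.3f}")
```

With these numbers, conditioning suggests the risk falls from about 0.226 to 0.065 (roughly 71%) when X=0, while the interventional estimate falls from 0.16 to 0.095 (roughly 41%). A model trained purely to predict from observational data estimates the first quantity; the adjustment is only valid because the assumed graph says Z is the only confounder.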

Ian Ringrose

Posted 2016-01-17T13:04:57.100

Reputation: 333

But neural networks can also be used to determine the role and importance of different features, right? – Hossein – 2019-08-31T12:50:43.113

Importance, sure, role, not really. A non-Bayesian NN essentially learns correlations between features and the output values. This is not sufficient to prove there's a direct causal link. – jkm – 2020-02-29T22:56:31.007


Excellent answers already.

One domain I can think of, and one I work in extensively, is customer analytics.

I have to understand and predict customers' moves and motives in order to inform and warn the customer support, marketing, and growth teams.

Neural networks do a really good job here at churn prediction, etc. However, I prefer the Bayesian network style, for the following reasons:

  1. Customers always have a pattern. They always have a reason to act, and that reason is either something my team has done for them or something they have learnt themselves. So everything has a prior here, and that reason is very important, as it drives most of the decisions the customer takes.
  2. Every move by the customer and by the growth teams in the marketing/sales funnel is a matter of cause and effect. So prior knowledge is vital when it comes to converting a prospective lead into a customer.

So the concept of a prior is central to customer analytics, which makes Bayesian networks very important in this domain.
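A hedged Python sketch of the cause-and-effect reasoning described above: a hypothetical network with two possible causes of churn, PriceIncrease and BadSupport, combined through a noisy-OR conditional probability table (a common parameterization when several independent causes can each trigger an effect; the choice and all numbers are assumptions). Enumeration then answers diagnostic questions such as "given that this customer churned, how likely is it that support was the reason?"

```python
from itertools import product

# Hypothetical parameters (made-up numbers).
P_PRICE = 0.2      # P(PriceIncrease=1)
P_SUPPORT = 0.1    # P(BadSupport=1)
LEAK = 0.05        # churn probability when neither cause is present
STRENGTH = {"price": 0.5, "support": 0.6}  # noisy-OR activation probabilities


def p_churn(price, support):
    """Noisy-OR CPT: P(Churn=1 | PriceIncrease, BadSupport)."""
    stay = 1 - LEAK
    if price:
        stay *= 1 - STRENGTH["price"]
    if support:
        stay *= 1 - STRENGTH["support"]
    return 1 - stay


def joint(price, support, churn):
    """P(PriceIncrease, BadSupport, Churn) for the graph Price -> Churn <- Support."""
    pp = P_PRICE if price else 1 - P_PRICE
    ps = P_SUPPORT if support else 1 - P_SUPPORT
    pc = p_churn(price, support) if churn else 1 - p_churn(price, support)
    return pp * ps * pc


def p_bad_support_given(churn, price=None):
    """P(BadSupport=1 | Churn=churn [, PriceIncrease=price]) by enumeration."""
    prices = (0, 1) if price is None else (price,)
    num = sum(joint(p, 1, churn) for p in prices)
    den = sum(joint(p, s, churn) for p, s in product(prices, (0, 1)))
    return num / den


print(f"P(bad support | churn)                 = {p_bad_support_given(1):.3f}")
print(f"P(bad support | churn, price increase) = {p_bad_support_given(1, price=1):.3f}")
```

With these numbers, the prior P(BadSupport) = 0.1 rises to about 0.335 once a churn is observed, then falls to about 0.146 once we also learn that the price went up: the price increase "explains away" the churn. A churn classifier on its own does not directly answer this kind of question about motives.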

Suggested Learning:

Bayesian Methods for Neural Networks

Bayesian networks in business analytics


Posted 2016-01-17T13:04:57.100

Reputation: 7 606


Bayesian networks might outperform neural networks in small-data settings. If prior information is properly encoded via the network structure, priors, and other hyperparameters, a Bayesian network might have an edge over neural networks. Neural networks, especially deeper ones, are well known to be data-hungry: almost by definition, lots of data is needed to train them properly.
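The simplest illustration of the small-data argument is a single binary node (say, a hypothetical conversion rate) whose parameter gets a conjugate Beta prior. A minimal Python sketch; the prior Beta(2, 18), encoding a belief that the rate is around 10% and worth about 20 pseudo-observations, is an assumption chosen for illustration.

```python
def beta_binomial_posterior(alpha, beta, successes, trials):
    """Conjugate update: Beta(alpha, beta) prior + binomial data -> Beta posterior parameters."""
    return alpha + successes, beta + (trials - successes)


def beta_mean(alpha, beta):
    """Mean of a Beta(alpha, beta) distribution."""
    return alpha / (alpha + beta)


# Hypothetical prior: rate around 10%, worth about 20 pseudo-observations.
PRIOR = (2.0, 18.0)

for successes, trials in [(3, 3), (300, 1000)]:
    mle = successes / trials  # maximum likelihood estimate, no prior
    posterior = beta_binomial_posterior(*PRIOR, successes, trials)
    print(f"{successes}/{trials}: MLE={mle:.3f}, posterior mean={beta_mean(*posterior):.3f}")
```

With 3 successes in 3 trials the maximum likelihood estimate is 1.0, while the posterior mean is 5/23, about 0.217; with 300 successes in 1000 trials the prior barely matters (302/1020, about 0.296, versus 0.3). With complete data, a full Bayesian network applies the same kind of update to each row of each conditional probability table, with Dirichlet priors for non-binary nodes.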

Vladislavs Dovgalecs

Posted 2016-01-17T13:04:57.100

Reputation: 471


I posted this question on Reddit and got a lot of feedback. Some people posted their answers here, others didn't. This answer should sum up the Reddit post. (I made it community wiki so that I don't get points for it.)

Martin Thoma

Posted 2016-01-17T13:04:57.100

Reputation: 15 590


I once worked through a small example of this. From that, I think Bayesian Networks are preferred if you want to capture a distribution but your training set doesn't cover the distribution well. In such cases, even a neural network that generalised well would not be able to reconstruct the distribution.
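One simple mechanism behind this (not necessarily the example the author had in mind) is that a Bayesian treatment of a node's conditional probability table puts a Dirichlet prior on it, so outcomes the training set never covers still get non-zero probability. A minimal Python sketch with a hypothetical four-state variable and a symmetric Dirichlet pseudo-count (Laplace smoothing when the pseudo-count is 1):

```python
from collections import Counter


def estimate_distribution(samples, categories, pseudo_count=0.0):
    """Posterior mean under a symmetric Dirichlet(pseudo_count) prior.

    pseudo_count=0 gives the maximum likelihood (relative frequency) estimate;
    pseudo_count=1 is Laplace smoothing.
    """
    counts = Counter(samples)
    total = len(samples) + pseudo_count * len(categories)
    return {c: (counts[c] + pseudo_count) / total for c in categories}


CATEGORIES = ["a", "b", "c", "d"]  # hypothetical variable with four states
samples = ["a", "a", "b"]          # training data that never shows "c" or "d"

print(estimate_distribution(samples, CATEGORIES))
print(estimate_distribution(samples, CATEGORIES, pseudo_count=1.0))
```

Here the maximum likelihood estimate assigns probability 0 to "c" and "d", whereas a pseudo-count of 1 gives each of them 1/7, leaving room for parts of the distribution the data never showed.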

Leela Prabhu

Posted 2016-01-17T13:04:57.100

Reputation: 163


Bayesian networks are preferred for genome interpretation. See, for example, this dissertation discussing computational methods for genome interpretation.

Nathaniel Hendrix

Posted 2016-01-17T13:04:57.100

Reputation: 21

Why are they preferred? – Ian Ringrose – 2016-01-18T14:13:44.307