What are some well-known problems where neural networks don't do very well?



What are some well-known cases, problems or real-world applications where neural networks don't do very well?

Specification: I'm looking for specific regression tasks (with accessible data-sets) where neural networks are not the state-of-the-art. The regression task should be "naturally suitable", so no sequential or time-dependent data (in which case an RNN or reservoir computer would be more natural).


Posted 2020-03-11T17:27:32.623

Reputation: 445

Can you explain exactly what kind of answer you're looking for and why the existing answers are not sufficient? Is it because you're looking for more specific applications where neural networks are not yet state-of-the-art? – nbro – 2020-03-22T02:34:54.710

Precisely, I'm looking for a regression task with no time-dependent/sequential component where neural nets are not state-of-the-art. A lot of the answers below are classification problems. – AIM_BLB – 2020-03-22T02:42:02.940


Originally, you were looking for general problems where neural networks don't do well. That's a question. Another question is the one you opened the bounty for. When you have a different specific question, which is the case now, you should create a new post (I think), so that people provide specific answers. For example, this new answer addresses your original question, but not your new specific question. Now that you opened the bounty, it's too late, so, hopefully, someone will provide a specific answer to the question you opened the bounty for.

– nbro – 2020-03-23T01:23:18.157

So, please, do not change the original question, otherwise you invalidate the current answers. It's OK to have that specification now, because a potential answer will also answer your original question. But please bear in mind that, next time, if you have a new, different (although very similar) question, I think you should create a new post and ask the new question there. – nbro – 2020-03-23T01:31:53.110

Thanks for the tip, nbro. I was looking for both types of answers but had only received the general type; it's my fault for not being clearer in my original formulation. – AIM_BLB – 2020-03-23T07:58:48.773



Here's a snippet from an article by Gary Marcus:

In particular, they showed that standard deep learning nets often fall apart when confronted with common stimuli rotated in three dimensional space into unusual positions, like the top right corner of this figure, in which a schoolbus is mistaken for a snowplow:

[figure: a schoolbus rotated into an unusual position, misclassified as a snowplow]

. . .

Mistaking an overturned schoolbus is not just a mistake, it’s a revealing mistake: one that shows not only that deep learning systems can get confused, but also that they are challenged in making a fundamental distinction known to all philosophers: the distinction between features that are merely contingent associations (snow is often present when there are snowplows, but not necessarily) and features that are inherent properties of the category itself (snowplows ought, other things being equal, to have plows, unless e.g. they have been dismantled). We’d already seen similar examples with contrived stimuli, like Anish Athalye’s carefully designed, 3-D printed, foam-covered baseball that was mistaken for an espresso

[figure: Athalye's 3-D printed baseball misclassified as an espresso]

Alcorn’s results — some from real photos from the natural world — should have pushed worry about this sort of anomaly to the top of the stack.

Please note that these opinions are the author's alone, and I do not necessarily share all of them.

Edit: Some more fun stuff

1) DeepMind's neural networks that could play Breakout and StarCraft saw a dramatic dip in performance when the game was altered slightly; in Breakout, moving the paddle up by a few pixels was enough.

See: General Game Playing With Schema Networks

In StarCraft, the network performed well with one race, but not on a different map or with different characters.



AlphaZero searches just 80,000 positions per second in chess and 40,000 in shogi, compared to 70 million for Stockfish and 35 million for elmo.

What the team at DeepMind did was to build a very good search algorithm, one that remembers facets of previous searches and applies those results to new searches. This is very clever; it undoubtedly has immense value in many areas, but it cannot be considered general intelligence.

See: AlphaZero: How Intuition Demolished Logic (Medium)

Anshuman Kumar

Posted 2020-03-11T17:27:32.623

Reputation: 404

Comments are not for extended discussion; this conversation has been moved to chat.

– nbro – 2020-03-15T23:52:47.977

Just an FYI: this link seems to have moved, or requires a login: https://stream.nyu.edu/channel/CS%2BFaculty%2BSearch%2B2019/111128891

– DukeZhou – 2020-05-03T23:47:27.757


In theory, most neural networks can approximate any continuous function on compact subsets of $\mathbb{R}^n$, provided that the activation functions satisfy certain mild conditions. This is known as the universal approximation theorem (UAT), though "universal" is arguably a misnomer, given that there are many more discontinuous functions than continuous ones (although certain discontinuous functions can be approximated by continuous ones). The UAT shows the theoretical power of neural networks and their purpose: they represent and approximate functions. If you want to know more about the details of the UAT, for different neural network architectures, see this answer.
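As a toy illustration of this approximation view (a minimal numpy sketch with hypothetical hyperparameters, not tied to any particular architecture in the literature), a one-hidden-layer tanh network trained by full-batch gradient descent steadily improves its fit to a smooth target function:

```python
import numpy as np

rng = np.random.default_rng(0)

# Target: a smooth function on [0, 1]
x = np.linspace(0, 1, 200).reshape(-1, 1)
y = np.sin(2 * np.pi * x)

# One hidden layer of 20 tanh units, randomly initialized
W1 = rng.normal(0, 1, (1, 20))
b1 = np.zeros(20)
W2 = rng.normal(0, 1, (20, 1))
b2 = np.zeros(1)

def forward(x):
    h = np.tanh(x @ W1 + b1)
    return h, h @ W2 + b2

_, pred0 = forward(x)
initial_mse = np.mean((pred0 - y) ** 2)

lr = 0.05
for _ in range(5000):
    h, pred = forward(x)
    err = pred - y                    # gradient of squared error w.r.t. pred
    gW2 = h.T @ err / len(x)
    gb2 = err.mean(0)
    dh = (err @ W2.T) * (1 - h ** 2)  # backprop through tanh
    gW1 = x.T @ dh / len(x)
    gb1 = dh.mean(0)
    W2 -= lr * gW2; b2 -= lr * gb2
    W1 -= lr * gW1; b1 -= lr * gb1

_, pred = forward(x)
final_mse = np.mean((pred - y) ** 2)
print(initial_mse, final_mse)  # the fit improves substantially with training
```

The point is only that a fixed, small architecture can be driven arbitrarily close to a continuous target on a compact set; nothing here addresses the practical issues discussed below.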

However, in practice, neural networks trained with gradient descent and backpropagation face several issues and challenges, some of which are due to the training procedure and not just the architecture of the neural network or available data.

For example, it is well known that neural networks are prone to catastrophic forgetting (or interference), which means that they aren't particularly suited for incremental learning tasks, although some more sophisticated incremental learning algorithms based on neural networks have already been developed.
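Catastrophic forgetting can be reproduced in miniature (a hypothetical toy sketch, not from any of the cited incremental-learning work): a single model trained sequentially on two conflicting tasks loses the first one entirely:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=(100, 1))

# Task A: y = 2x.  Task B: y = -2x (a conflicting mapping).
y_a, y_b = 2 * x, -2 * x

w = np.zeros((1, 1))

def train(y, steps=200, lr=0.1):
    """Plain gradient descent on squared error, overwriting w in place."""
    global w
    for _ in range(steps):
        grad = x.T @ (x @ w - y) / len(x)
        w -= lr * grad

def mse(y):
    return float(np.mean((x @ w - y) ** 2))

train(y_a)
loss_a_before = mse(y_a)   # near zero: task A is learned

train(y_b)                 # now train on task B only
loss_a_after = mse(y_a)    # task A performance collapses

print(loss_a_before, loss_a_after)
```

The model has no mechanism to protect parameters important for task A while fitting task B; incremental-learning methods (rehearsal, regularization of important weights, etc.) exist precisely to mitigate this.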

Neural networks can also be sensitive to their inputs, i.e. a small change in the inputs can drastically change the output (or answer) of the neural network. This is partially due to the fact that they learn a function that isn't really the function you expect them to learn. A system based on such a neural network can therefore potentially be hacked or fooled, so neural networks are probably not well suited for safety-critical applications. This issue is related to the low interpretability and explainability of neural networks, i.e. they are often described as black-box models.

Bayesian neural networks (BNNs) can potentially mitigate these problems, but they are unlikely to be the ultimate or complete solution. Bayesian neural networks maintain a distribution over each of the weights (or parameters), rather than a point estimate. In principle, this can provide more uncertainty guarantees, but, in practice, this is not yet the case.

Furthermore, neural networks often require a lot of data in order to approximate the desired function accurately, so in cases where data is scarce neural networks may not be appropriate. Moreover, the training of neural networks (especially deep architectures) requires a lot of computational resources. Inference can also be problematic when you need real-time predictions, as it too can be expensive.

To conclude, neural networks are just function approximators, i.e. they approximate a specific function (or set of functions, in the case of Bayesian neural networks), given a specific configuration of the parameters. They can't do more than that. They cannot magically do something that they have not been trained to do, and it is usually the case that you don't really know the specific function the neural network is representing (hence the expression black-box model), apart from knowing your training dataset, which can also contain spurious information, among other issues.


Posted 2020-03-11T17:27:32.623

Reputation: 19 783

Comments are not for extended discussion; this conversation has been moved to chat.

– nbro – 2020-03-15T23:52:29.990


In our deep learning lecture, we discussed the following example (from Unmasking Clever Hans predictors and assessing what machines really learn (2019) by Lapuschkin et al.).

[figure: horse pictures with sensitivity maps highlighting the photographer's watermark]

Here the neural network learned the wrong way to identify a picture, i.e. by identifying the wrong "relevant components". In the sensitivity maps next to the pictures, we can see that the watermark was used to decide whether there is a horse present in the picture. If we remove the watermark, the classification is no longer made. Even more worryingly, if we add the watermark to a completely different picture, it gets classified as a horse!
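This "Clever Hans" failure is easy to reproduce on synthetic data (a hypothetical sketch, not from the paper above): if a "watermark" feature correlates perfectly with the label, even a simple logistic-regression model latches onto it and ignores everything else:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200

label = rng.integers(0, 2, n)
# Feature 0 is the "watermark": it equals the label exactly.
# The remaining 5 features are pure noise.
X = np.column_stack([label.astype(float), rng.normal(size=(n, 5))])

# Logistic regression trained by gradient descent
w, b = np.zeros(6), 0.0
for _ in range(500):
    p = 1 / (1 + np.exp(-(X @ w + b)))
    g = p - label
    w -= 0.5 * X.T @ g / n
    b -= 0.5 * g.mean()

def predict(x):
    return 1 / (1 + np.exp(-(x @ w + b)))

# A "horse" picture (label 1) with its watermark...
x_horse = X[label == 1][0].copy()
p_with = predict(x_horse)

# ...and the same picture with the watermark removed
x_horse[0] = 0.0
p_without = predict(x_horse)

print(p_with, p_without)  # the watermark alone decides the classification
```

Nothing in the training objective penalizes relying on the spurious feature, which is exactly why sensitivity-map analyses like Lapuschkin et al.'s are needed to expose it.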

Viktor Glombik

Posted 2020-03-11T17:27:32.623

Reputation: 261


This is similar to the famous tank story.

– nbro – 2020-03-12T23:33:39.723


This is more in the direction of 'what kinds of problems can be solved by neural networks'. In order to train a neural network, you need a large set of training data which is labelled correct/incorrect for the question you are interested in. For example, 'identify all pictures that have a cat in them' is very suitable for neural networks. On the other hand, 'summarize the story of this toddler picture book' is very hard: although a human can easily decide whether a given summary is any good, it would be very difficult to build a suitable set of training data for this kind of problem. So, if you can't build a large training data set with correct answers, you can't train a neural network to solve the problem.

The answer by Anshuman Kumar is also an instance of that, and a potentially solvable one. The neural network that misidentified upside-down school buses presumably had few, if any, upside-down school buses in its training data. Put them into the training data and the neural network will identify those as well. This is still a flaw in neural networks: a human can correctly identify an upside-down school bus the first time they see one, provided they know what school buses look like.


Posted 2020-03-11T17:27:32.623

Reputation: 151

Good point. Also, using a feature map which rotates and stretches images to create more "artificial" image samples should bypass Anshuman's problem – AIM_BLB – 2020-03-12T12:47:06.687

@AIM_BLB you'd need a feature map that can do so in 3D. So you pretty much need the whole pipeline the brain does: edge detection -> depth map -> segmentation -> 3D feature map -> classifier search (with each stage feeding back to the edge detection (!) stage and all later ones). And that's without any depth cues, meaning that the depth map can only take input from the classifier and the segmentation map. – John Dvorak – 2020-03-13T03:19:12.980


I don't know if it might be of use, but many areas of NLP are still hard to tackle, and even if deep models achieve state-of-the-art results, they usually beat baseline shallow models by very few percentage points. One example that I've had the opportunity to work on is stance classification. In many datasets, the best F1 score achievable is around 70%.

Even though it's hard to compare results, since in NLP many datasets are really small and domain-specific (especially for stance detection and similar SemEval tasks), many times SVMs, conditional random fields, and sometimes even naive Bayes models are able to perform almost as well as CNNs or RNNs. Other tasks for which this holds are argumentation mining and claim detection.

See e.g. the paper TakeLab at SemEval-2016 Task 6: Stance Classification in Tweets Using a Genetic Algorithm Based Ensemble (2016) by Martin Tutek et al.

Edoardo Guerriero

Posted 2020-03-11T17:27:32.623

Reputation: 1 098

Thanks for this answer. I'm interested in this subject from the standpoint of [satisficing](https://en.wikipedia.org/wiki/Satisficing)! (i.e. if time or space is sufficiently restricted, the "less optimal" method may prevail.) – DukeZhou – 2020-05-03T23:42:27.703


Given a checkerboard with missing squares, it is practically impossible for a neural network to learn the color of the missing squares: the more it learns on the training data, the worse it does on the test data.

See e.g. the article The Unlearnable Checkerboard Pattern (which, unfortunately, is not freely accessible). In any case, it should be easy to verify for yourself that this task is difficult.


Posted 2020-03-11T17:27:32.623

Reputation: 249


Neural networks seem to have a great deal of difficulty handling adversarial input, i.e., inputs with certain changes (often imperceptible or nearly imperceptible by humans) designed by an attacker to fool them.

This is not the same thing as just being highly sensitive to certain changes in inputs. Robustness against wrong answers in that case can be increased by reducing the probability of such inputs. (If only one in 10^15 possible images causes a problem, it's not much of a problem.) However, in the adversarial case reducing the space of problematic images doesn't reduce the probability of getting one because the images are specifically selected by the attacker.
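The basic construction behind many such attacks, the fast gradient sign method, can be sketched on a toy logistic-regression "classifier" (a minimal numpy sketch, not the setup of any of the papers cited here): perturb the input in the direction that increases the loss for the true label:

```python
import numpy as np

rng = np.random.default_rng(0)

# Train a toy logistic-regression "classifier" on 2D separable data
X = np.vstack([rng.normal(-2, 1, (50, 2)), rng.normal(2, 1, (50, 2))])
y = np.array([0] * 50 + [1] * 50)

w, b = np.zeros(2), 0.0
for _ in range(300):
    p = 1 / (1 + np.exp(-(X @ w + b)))
    g = p - y
    w -= 0.1 * X.T @ g / len(X)
    b -= 0.1 * g.mean()

def prob_class1(x):
    return 1 / (1 + np.exp(-(x @ w + b)))

# Fast-gradient-sign-style attack on one class-1 input:
# step in the direction that increases the loss for the true label.
x = X[y == 1][0]
eps = 1.0
grad_wrt_x = (prob_class1(x) - 1) * w   # d(loss)/d(x) for true label 1
x_adv = x + eps * np.sign(grad_wrt_x)

print(prob_class1(x), prob_class1(x_adv))  # confidence in the true class drops
```

On a deep network the same gradient is available via backpropagation, and because the attacker chooses the perturbation, averaging over "typical" inputs says nothing about robustness.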

One of the more famous papers in this area is "Synthesizing Robust Adversarial Examples", which produced not only examples where a few modified pixels or other invisible-to-humans modifications to a picture fooled a neural network-based image classifier, but also perhaps the first examples of 3D objects designed to fool similar classifiers and successfully doing so (from every angle!).

(Those familiar with IT security will no doubt recognise this as a familiar asymmetry: roughly, a defender must defend against all attacks launched against a system, but an attacker need find only one working attack.)

In "A Simple Explanation for the Existence of Adversarial Examples with Small Hamming Distance", Adi Shamir et al. propose a mathematical framework for analyzing the problem based on Hamming distances that, while currently a less practical attack than the MIT/LabSix one, has some pretty disturbing theoretical implications, including that current approaches to preventing these attacks may be, in the end, ineffective. For example, the authors point out that blurring and similar techniques that have been used to try to defend against adversarial attacks can be treated mathematically as simply another layer added on top of the existing neural network, requiring no changes to the attack strategy.

(I attended a talk by Shamir a few months ago that was much easier going than the paper, but unfortunately I can't find a video of that or a similar talk on-line; if anybody knows of one please feel free to edit this answer to add a link!)

There's obviously still an enormous amount of research to be done in this area, but it seems possible that neural networks alone are not capable of defense against this class of attack, and other techniques will have to be employed in addition to make neural networks robust against it.


Posted 2020-03-11T17:27:32.623

Reputation: 131

The attacks aren't quite the same if you apply random blurring. In principle, adding random noise should provide a sufficient defense, if your network on average handles noise well. I have seen attempts to use Lipschitz-like constraints, but for deep networks the bounds are useless. It's kind of circular in a way: we want the network to function like humans do, but there isn't a clear norm for how humans judge similarity between images. – FourierFlux – 2020-05-03T23:53:48.507


Large scale route optimization problems.

There is progress in using deep reinforcement learning to solve vehicle routing problems (VRPs), for example in this paper: https://arxiv.org/abs/1802.04240v2.

However, for large-scale problems, heuristic methods, like the ones provided by Google OR-Tools, are much easier to use.


Posted 2020-03-11T17:27:32.623

Reputation: 166

You don't talk about neural networks in your answer, but only about deep RL. Maybe you can clarify how neural networks are related to your linked paper and the "large scale route optimization problems". – nbro – 2020-03-14T01:32:44.357


From my experience in industry, a lot of data science (operating on customer information, stored in a database) is still dominated by decision trees and even SVMs. Although neural networks have seen incredible performance on "unstructured" data, like images and text, there still do not appear to be great results extending to structured, tabular data (yet).

At my old company (loyalty marketing with 10 million+ members) there was a saying, "You can try any model you like, but you must try XGBoost". And let's just say that I did try comparing it to a neural network, and ultimately I did go with XGBoost ;)


Posted 2020-03-11T17:27:32.623

Reputation: 281

This sounds very promising. So something like Boston housing would be a toy example data-set of this type? – AIM_BLB – 2020-03-23T08:25:51.397


Yes exactly! Same with the classic Adult dataset: https://archive.ics.uci.edu/ml/datasets/adult

– information_interchange – 2020-03-23T15:24:32.003

So in that case XGBoost is the best? – AIM_BLB – 2020-03-23T15:32:12.060

Well, it's usually difficult to say in advance, especially before you do tinkering like hyperparameter sweeps, regularization, etc. But in general, XGBoost gives you good results right out of the box, with the added bonus that it is also somewhat interpretable (such as giving you feature importances), allowing you to explain things nicely in a business setting – information_interchange – 2020-03-23T17:36:51.713


My 50 cents: problems in NP (complexity) are still hard to solve, even with neural nets.

In computational complexity theory, NP (nondeterministic polynomial time) is a complexity class used to classify decision problems. NP is the set of decision problems for which the problem instances, where the answer is "yes", have proofs verifiable in polynomial time by a deterministic Turing machine.

The easiest example to illustrate this is cryptography's integer factorization, which is the basis of the RSA cryptosystem.

For example, we have two prime numbers:

  • 12123123123123123123123.....45456
  • 23412421341234124124124.....11112

A neural network would have to recover both numbers exactly, digit for digit, when shown only their product. This is not like guessing about a school bus: the space of numbers is far bigger than the number of words in all languages on the whole Earth. Imagine that there were billions of billions of different school buses, billions of billions of different fire hydrants, and billions of such classes, and the network had to answer exactly what is in the picture. There is no way: the chance of guessing correctly is vanishingly small.
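The asymmetry behind this example is easy to demonstrate with small numbers (a toy sketch; real RSA moduli have hundreds of digits): verifying a proposed factorization is a single multiplication, while recovering a factor from the product alone requires search:

```python
# Two (small) primes and their product; real RSA moduli are far larger
p, q = 1000003, 1000033
n = p * q

# Verifying a proposed factorization takes one multiplication...
assert p * q == n

# ...but recovering a factor from n alone requires search (trial division here)
def smallest_factor(n):
    d = 2
    while d * d <= n:
        if n % d == 0:
            return d
        d += 1
    return n

print(smallest_factor(n))  # recovers 1000003, after ~a million divisions
```

This verify-easily/solve-hard gap is precisely the structure of NP, and there is no evidence that training a network on (product, factors) pairs closes it.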


Posted 2020-03-11T17:27:32.623

Reputation: 21


In the case of convolutional neural networks, features may be extracted without taking into account their relative positions (see the concept of translation invariance).

For example, you could have two eyes, a nose and a mouth be in different locations in an image and still have the image be classified as a face.

Operations like max-pooling may also have a negative impact on retaining position information.
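The position loss under max-pooling is easy to check directly (a minimal numpy sketch): two different inputs whose activations sit in different corners of each pooling window pool to exactly the same output:

```python
import numpy as np

def max_pool_2x2(img):
    """2x2 max pooling with stride 2 on a square array."""
    h, w = img.shape
    return img.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

# Two different 4x4 "images": the high activations sit in
# different corners of each 2x2 pooling window.
a = np.array([[9, 0, 0, 8],
              [0, 0, 0, 0],
              [0, 0, 0, 0],
              [7, 0, 0, 6]])
b = np.array([[0, 0, 8, 0],
              [0, 9, 0, 0],
              [0, 7, 0, 6],
              [0, 0, 0, 0]])

pa, pb = max_pool_2x2(a), max_pool_2x2(b)
print(np.array_equal(pa, pb))  # True: pooled outputs are identical
print(np.array_equal(a, b))    # False: the inputs are not
```

Within each window, only the maximum survives; where it occurred is discarded, which is one reason architectures like capsule networks try to preserve pose information explicitly.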


Posted 2020-03-11T17:27:32.623

Reputation: 554

I have edited your answer to make it clearer. Please, can you provide a research work that shows that the extraction of features, in a CNN, can have a negative impact such that a collection of two eyes, a nose, and a mouth, although not in the appropriate relative positions, is classified as a face? – nbro – 2020-03-16T17:24:01.420