How to set the number of neurons and layers in neural networks



I am new to neural networks and have had trouble grasping two concepts:

  1. How does one decide how many hidden (middle) layers a given neural network should have? One vs. ten, or whatever.
  2. How does one decide the number of neurons in each hidden layer? Is it recommended to have an equal number of neurons in each layer, or does it vary with the application?


Posted 2018-01-13T15:26:31.233

Reputation: 503



The number of neurons per layer and the number of layers in a fully connected network depend on the feature space of the problem. To illustrate what happens, I use a two-dimensional input space, which is easy to depict. The images are taken from the work of another researcher. For understanding other nets such as CNNs, I recommend taking a look here.

Suppose you have just a single neuron. After learning the parameters of the network, you will have a linear decision boundary, which can separate the space into two classes.

[figure: a single neuron]

[figure: a linear decision boundary separating two classes]

Suppose you are asked to separate the following data. You need a line d1 for the upper decision boundary, which checks whether the input lies on its left side, and a line d2, which checks whether the input lies on its right side. Each line alone only tells you on which side of it the input falls, so after training their parameters you need an AND operation in the next layer to wrap up the two results: if the input is on the left side of d1 and on the right side of d2, it is classified as a circle.

[figure: two decision lines and the corresponding two-layer network]
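The two-line AND construction above can be sketched with hard-threshold units. This is a toy illustration: the lines d1 and d2 and all the weights are made up here rather than learned.

```python
import numpy as np

def neuron(x, w, b):
    """Hard-threshold unit: 1.0 if w.x + b > 0, else 0.0."""
    return float(np.dot(w, x) + b > 0)

def classify(x):
    # Hidden layer: one neuron per line.
    h1 = neuron(x, np.array([-1.0, 0.0]), 2.0)  # fires when x is left of the line d1 (x0 = 2)
    h2 = neuron(x, np.array([1.0, 0.0]), 1.0)   # fires when x is right of the line d2 (x0 = -1)
    # Output neuron implements AND: it fires only when both hidden units fire.
    return neuron(np.array([h1, h2]), np.array([1.0, 1.0]), -1.5)

print(classify(np.array([0.0, 0.0])))  # between the two lines -> 1.0 (circle)
print(classify(np.array([5.0, 0.0])))  # outside -> 0.0
```

The bias of -1.5 on the output neuron is what makes it an AND: the weighted sum exceeds zero only when both inputs are 1.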

Now suppose you have the following problem and are asked to separate the classes. In this case the reasoning is exactly the same as above.

[figure: data separable by several lines, with the corresponding network]

For the following data:

[figure: circles lying between an inner and an outer rectangle]

the decision boundary is not convex and is more complex than the previous ones. First you need a sub-net that finds the outer rectangular boundary. Then you need another sub-net that finds the inner rectangular boundary, deciding that inputs inside the inner rectangle are not circles and inputs outside it are. After that, you wrap up the two results: if the input is inside the bigger rectangle and outside the inner rectangle, it is classified as a circle. You need another AND operation for this purpose. The network would look like this:

[figure: network with two rectangle sub-nets combined by an AND neuron]
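A minimal sketch of this wrap-up, using hard-threshold sub-nets for the two rectangles. The rectangle coordinates are made up for illustration, and each sub-net is itself a small AND of four side-checking neurons.

```python
import numpy as np

def step(z):
    """Hard threshold: 1.0 where z > 0, else 0.0."""
    return (np.asarray(z) > 0).astype(float)

def in_rectangle(x, lo, hi):
    # Sub-net: four threshold neurons (one per side of the rectangle),
    # followed by an AND neuron that fires only when all four agree.
    sides = np.array([x[0] - lo[0], hi[0] - x[0], x[1] - lo[1], hi[1] - x[1]])
    return step(step(sides).sum() - 3.5)

def classify(x):
    outer = in_rectangle(x, (-4, -4), (4, 4))  # first sub-net
    inner = in_rectangle(x, (-1, -1), (1, 1))  # second sub-net
    # Final AND: circle = inside the outer rectangle AND outside the inner one.
    return step(outer + (1 - inner) - 1.5)

print(classify(np.array([2.0, 2.0])))   # in the ring -> 1.0 (circle)
print(classify(np.array([0.0, 0.0])))   # inside the inner rectangle -> 0.0
print(classify(np.array([10.0, 0.0])))  # outside everything -> 0.0
```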

Suppose you are asked to find the following circular decision boundary.

[figure: circular decision boundary]

In this case your network would be like the network referred to above, but with many more neurons in the first hidden layer.

[figure: network approximating the circular boundary with many first-layer neurons]


Posted 2018-01-13T15:26:31.233

Reputation: 12 077


Very good question, as there is no exact answer to it yet. This is an active field of research.

Ultimately, the architecture of your network is related to the dimensionality of your data. Since neural networks are universal approximators, as long as your network is big enough, it has the ability to fit your data.

The only way to truly know which architecture works best is to try them all and pick the best one. Of course, with neural networks this is difficult, as each model takes quite some time to train. What some people do is first train a model that is "too big" on purpose, and then prune it by removing weights that do not contribute much to the network.
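The train-big-then-prune idea can be sketched with simple magnitude pruning. This is a toy illustration on a random weight matrix; in practice pruning is typically interleaved with fine-tuning.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for the weight matrix of an intentionally over-sized layer.
W = rng.normal(size=(64, 64))

def prune_by_magnitude(W, keep_fraction=0.5):
    """Zero out the smallest-magnitude weights, keeping only `keep_fraction`."""
    threshold = np.quantile(np.abs(W), 1.0 - keep_fraction)
    return np.where(np.abs(W) >= threshold, W, 0.0)

W_pruned = prune_by_magnitude(W, keep_fraction=0.25)
sparsity = np.mean(W_pruned == 0.0)  # fraction of weights removed, about 0.75
```

Small-magnitude weights are the usual proxy for "weights that do not contribute much", though more sophisticated criteria exist.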

What if my network is "too big"

If your network is too big, it might either overfit or struggle to converge. Intuitively, what happens is that your network is trying to explain your data in a more complicated way than it should. It's like answering a question that could be answered in one sentence with a 10-page essay. It might be hard to structure such a long answer, and there may be a lot of unnecessary facts thrown in. (see this question)

What if my network is "too small"

On the other hand, if your network is too small, it will underfit your data. It would be like answering with one sentence when you should have written a 10-page essay: as good as your answer might be, it will be missing some of the relevant facts.

Estimating the size of the network

If you know the dimensionality of your data, you can tell whether your network is big enough. To estimate the dimensionality of your data, you could try computing its rank. This is a core idea in how people are trying to estimate the size of networks.
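As a rough illustration of the rank idea, here is a toy data matrix that looks 64-dimensional but actually lies in a 3-dimensional subspace; the sizes are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# 1000 samples with 64 features, constructed so that every sample is a
# linear combination of only 3 underlying factors: X = Z @ A.
Z = rng.normal(size=(1000, 3))   # hidden 3-d factors
A = rng.normal(size=(3, 64))     # mixing into 64 observed features
X = Z @ A

# The rank of the data matrix recovers the true dimensionality.
print(np.linalg.matrix_rank(X))  # 3
```

Real data is noisy, so in practice one looks at how quickly the singular values of X decay rather than at the exact rank.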

However, it is not that simple. Even if you know your network needs to be 64-dimensional, do you build a single hidden layer of size 64 or two layers of size 8? Below, I give some intuition as to what would happen in either case.

Going deeper

Going deep means adding more hidden layers. This allows the network to compute more complex features. In Convolutional Neural Networks, for instance, it has often been shown that the first few layers represent "low-level" features such as edges, while the last layers represent "high-level" features such as faces, body parts etc.

You typically need to go deep if your data is very unstructured (like an image) and needs to be processed quite a bit before useful information can be extracted from it.

Going wider

Going deeper means creating more complex features, while going "wider" simply means creating more of them. It may be that your problem can be explained by very simple features, but many of them are needed. Usually, layers become narrower towards the end of the network, for the simple reason that complex features carry more information than simple ones, so you do not need as many.
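One concrete way to compare the two choices is by parameter count. The helper below counts the weights and biases of a fully connected net for a hypothetical "wide" and a hypothetical "deep" configuration; the widths are made up for illustration.

```python
def mlp_param_count(layer_sizes):
    """Total weights + biases of a fully connected net with the given layer widths."""
    return sum(a * b + b for a, b in zip(layer_sizes, layer_sizes[1:]))

wide = [2, 64, 1]          # 2-d input, one wide hidden layer, 1 output
deep = [2, 16, 16, 16, 1]  # same input/output, several narrower hidden layers

print(mlp_param_count(wide))  # 257
print(mlp_param_count(deep))  # 609
```

Note that depth is not free: the deep net here has more parameters despite each layer being narrower, because the 16-to-16 connections dominate.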

Valentin Calomme

Posted 2018-01-13T15:26:31.233

Reputation: 4 666

You can use the concept of intrinsic dimension to find the number of relevant dimensions for your problem. Intrinsic dimension tries to answer how many variables are needed to fully describe a signal; it is related to the number of variables in the random source of that signal.

– Pedro Henrique Monforte – 2019-04-07T04:17:30.853


Short answer: it is closely related to the dimensions of your data and the type of application.

Choosing the right number of layers is only achievable with practice; there is no general answer to this question yet. By choosing a network architecture, you constrain your space of possibilities (hypothesis space) to a specific series of tensor operations mapping input data to output data. In a deep neural network, each layer can only access information present in the output of the previous layer. If one layer drops information relevant to the problem at hand, that information can never be recovered by later layers. This is usually referred to as the "information bottleneck".

Information Bottleneck is a double-edged sword:

1) If you use a small number of layers/neurons, the model will learn only a few useful representations/features of your data and lose some important ones, because the capacity of the middle layers is very limited (underfitting).

2) If you use a large number of layers/neurons, the model will learn representations/features that are too specific to the training data and do not generalize to real-world data outside your training set (overfitting).





Posted 2018-01-13T15:26:31.233

Reputation: 1 055


Having worked with neural networks for two years, this is a problem I face every time I want to model a new system. The best approach I've found is the following:

  1. Look for similar problems that have also been modeled with feed-forward networks and study their architectures.
  2. Begin with that configuration, train the data set and evaluate the test set.
  3. Perform pruning in your architecture and compare the results on the data set with the previous results. If the accuracy of your model is not affected, you can infer that the original model was overfitting the data.
  4. Otherwise, try adding more degrees of freedom (i.e. more layers).

The general approach is to try different architectures, compare results and take the best configuration. Experience gives you more intuition in the first architecture guess.
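Steps 2-4 amount to a try-compare-pick loop, sketched below with a stand-in evaluation function. The candidate architectures and their scores are made up for illustration; in practice `train_and_evaluate` would build the net, train it, and return the test-set accuracy.

```python
def train_and_evaluate(hidden_layers):
    """Stand-in for: build a feed-forward net with these hidden-layer sizes,
    train it on the training set, and return its test-set accuracy.
    The scores below are invented for the sake of the example."""
    pretend_scores = {(16,): 0.81, (32, 32): 0.88, (64, 64, 64): 0.86}
    return pretend_scores[tuple(hidden_layers)]

# Candidates span the range from "too small" to "probably too big".
candidates = [[16], [32, 32], [64, 64, 64]]

# Pick the configuration with the best evaluation score.
best = max(candidates, key=train_and_evaluate)
print(best)  # [32, 32]
```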

Federico Caccia

Posted 2018-01-13T15:26:31.233

Reputation: 660


Adding to the previous answers, there are approaches where the topology of the neural network emerges endogenously, as part of the training. Most prominently, there is Neuroevolution of Augmenting Topologies (NEAT), where you start with a basic network without hidden layers and then use a genetic algorithm to "complexify" the network structure. NEAT is implemented in many ML frameworks. Here is a pretty accessible article on an implementation that learns Mario: CrAIg: Using Neural Networks to learn Mario

Frederic Schneider

Posted 2018-01-13T15:26:31.233

Reputation: 131


1.) The optimal number of neurons in each layer depends on the function you are trying to approximate. For one function, there might be a perfect number of neurons per layer; for another function, this number might be different.

2.) According to the Universal approximation theorem, a neural network with only one hidden layer can approximate any function (under mild conditions), in the limit of increasing the number of neurons.

3.) In practice, a good strategy is to treat the number of neurons per layer as a hyperparameter. A recent study showed that these hyperparameters should not be optimized independently. Instead, one can perform an exhaustive grid search on a small neural network to find the best hyperparameters, and then scale up the entire network (EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks).
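The "tune small, then scale" idea can be sketched as multiplying every tuned width by a single scaling coefficient, instead of re-tuning each layer independently. The base widths and the coefficient here are made up for illustration (EfficientNet itself uses compound coefficients for depth, width, and resolution).

```python
def scale_widths(base_widths, coefficient):
    """Scale every layer width by one shared coefficient, keeping widths >= 1."""
    return [max(1, round(w * coefficient)) for w in base_widths]

base = [16, 32, 16]              # widths found by grid search on a small net
scaled = scale_widths(base, 2.0)  # one knob scales the whole architecture
print(scaled)  # [32, 64, 32]
```

The point is that after the small grid search, only one hyperparameter (the coefficient) remains to be tuned for the larger model.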

Graph4Me Consultant

Posted 2018-01-13T15:26:31.233

Reputation: 893