How to decide neural network architecture?



I was wondering how we decide how many nodes to put in the hidden layers, and how many hidden layers to use, when we build a neural network architecture.

I understand that the input and output layers depend on the training set that we have, but how do we decide on the hidden layers and the overall architecture in general?


Posted 2017-07-06T19:05:44.447

Reputation: 355

Typically we experiment, using our intuition; consider it a hyperparameter. There are ways of learning the architecture, but I don't know how practical they are.

– Emre – 2017-07-06T19:12:20.417

I looked for a duplicate to this, because I am sure it has cropped up many times before on this site. However, I could not find a pure version that wasn't attached to some dataset or problem. Maybe this could be the generic question we point others to? Sadly there isn't a great "how to" answer to be had in general, but it's a common question when faced with so much choice. – Neil Slater – 2017-07-06T19:23:04.303

This is a very interesting question to answer (researchers have started working on it): what would be the optimal architecture for dataset A versus dataset B? Please read the paper below, which tries to answer your question. Welcome to the world of Neural Architecture Search (NAS).

– iDeepVision – 2019-03-17T00:28:47.373



Sadly there is no generic way to determine a priori the best number of neurons and number of layers for a neural network, given just a problem description. There isn't even much guidance to be had for determining good values to try as a starting point.

The most common approach seems to be to start with a rough guess based on prior experience with networks used on similar problems. This could be your own experience, or second/third-hand experience you have picked up from a training course, blog or research paper. Then try some variations, and check the performance carefully before picking the best one.

The size and depth of neural networks also interact with other hyper-parameters, so changing one thing elsewhere can affect where the best values lie. It is therefore not possible to isolate a "best" size and depth for a network and then continue to tune other parameters in isolation. For instance, if you have a very deep network, it may work efficiently with the ReLU activation function, but not so well with sigmoid - if you found the best size/shape of network and then tried an experiment with varying activation functions, you might come to the wrong conclusion about what works best.
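One practical consequence of this interaction is that size/depth and activation are best searched jointly rather than one after the other. Here is a minimal sketch of doing that with scikit-learn's MLPClassifier and a grid search; the dataset, candidate sizes and settings are illustrative choices for the example, not recommendations.

```python
# Sketch: treat width/depth and activation as *joint* hyper-parameters,
# since the best layer sizes can differ between activation functions.
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.neural_network import MLPClassifier

# Small synthetic problem, just so the example runs end to end
X, y = make_classification(n_samples=500, n_features=20, random_state=0)

param_grid = {
    # candidate shapes: one or two hidden layers of a couple of sizes
    "hidden_layer_sizes": [(20,), (40,), (20, 20), (40, 40)],
    # searched together with the shape, not after it
    "activation": ["relu", "logistic"],
}

search = GridSearchCV(
    MLPClassifier(max_iter=500, random_state=0),
    param_grid,
    cv=3,
)
search.fit(X, y)
print(search.best_params_)
```

The same idea extends to learning rate, regularisation strength and so on; the point is only that fixing one hyper-parameter while a strongly interacting one is still at a bad value can mislead the search.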

You may sometimes read about "rules of thumb" that researchers use when starting a neural network design from scratch. These things might work for your problems or not, but they at least have the advantage of making a start on the problem. The variations I have seen are:

  • Create a network with hidden layers of a similar size order to the input layer, and all the same size, on the grounds that there is no particular reason to vary the size (unless you are creating an autoencoder, perhaps).

  • Start simple and build up complexity to see what improves a simple network.

  • Try varying depths of network if you expect the output to be explained well by the input data, but with a complex relationship (as opposed to just inherently noisy).

  • Try adding some dropout; it's the closest thing neural networks have to magic fairy dust that makes everything better (caveat: adding dropout may improve generalisation, but may also increase required layer sizes and training times).

If you read these or anything like them in any text, take them with a pinch of salt. However, at worst they help you get past the blank-page effect, write some kind of network, and start the testing and refinement process.
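As an example of getting past the blank page, the first rule of thumb above could be written down directly: hidden layers of the same order of size as the input, all equal. This is only a starter sketch using scikit-learn (the dataset and the choice of two layers are illustrative assumptions), something to then vary and test.

```python
# Sketch of a "rule of thumb" starter network: two hidden layers,
# each the same size as the input, evaluated with cross-validation.
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=400, n_features=10, random_state=0)

n_inputs = X.shape[1]
model = MLPClassifier(
    hidden_layer_sizes=(n_inputs, n_inputs),  # same size order as input
    max_iter=500,
    random_state=0,
)
scores = cross_val_score(model, X, y, cv=3)
print(scores.mean())
```

From here you would follow the "start simple and build up" rule: vary the number and size of layers, and keep whatever measurably improves the cross-validation score.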

As an aside, try not to get too lost in tuning a neural network when some other approach might be better and save you lots of time. Do consider and use other machine learning and data science approaches. Explore the data; maybe make some plots. Try some simple linear approaches first to get benchmarks to beat: linear regression, logistic regression or softmax regression, depending on your problem. Consider using a different ML algorithm from NNs - decision-tree-based approaches such as XGBoost can be faster and more effective than deep learning on many problems.
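Getting such a benchmark is cheap. A minimal sketch of the idea, assuming a classification problem and using scikit-learn (the synthetic dataset is just for illustration): fit a logistic regression and a small neural network on the same folds, and only invest in NN tuning if the linear baseline leaves room to improve on.

```python
# Sketch: establish a simple linear baseline before tuning a neural net.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=500, n_features=20, random_state=0)

# Linear baseline: the score any fancier model has to beat
baseline = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=3).mean()

# Default small MLP, no tuning yet
mlp = cross_val_score(MLPClassifier(max_iter=500, random_state=0), X, y, cv=3).mean()

print(f"logistic regression: {baseline:.3f}  MLP: {mlp:.3f}")
```

If the baseline is already close to the best you expect from the data, that is a strong hint to spend your time on features or data quality rather than on architecture search.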

Neil Slater

Posted 2017-07-06T19:05:44.447

Reputation: 24 613

It's a great explanation. Thanks. I also wonder if there is a good way to decide which ML approach to use? You mentioned that there might be a better way than a neural network, but how do we determine that easily? – user7677413 – 2017-07-07T07:05:03.400

@user7677413: The same thing applies. You have to try and see, although experience may give you a guide on familiar problems. – Neil Slater – 2017-07-07T07:07:13.257

so is machine learning basically guessing with intuition and experience, rather than a theoretical approach? – user7677413 – 2017-07-07T07:12:05.533

@user7677413: Well, not as a whole. There is plenty of theory to describe how the models work (or what their limits are), and theory may extend to idealised descriptions of data sets. But choosing between all the possible constructs you could use when faced with a real world problem description is usually an empirical science. You can theory-craft about what would work before you start, and people do, but the empirical side is far more common and effective in general. – Neil Slater – 2017-07-07T07:19:15.430

when is a neural network necessary then? – user7677413 – 2017-07-07T07:25:48.797

Neural networks are rarely necessary. However, they are better at some problems. They excel at signal processing tasks such as audio and image recognition, and also have capacity to learn subtle differences from large amounts of data where simpler algorithms may reach a limit. However, whether a NN is the right tool for you and whatever problem you face on a particular day, no-one can predict. – Neil Slater – 2017-07-07T07:32:36.983

I see. So to get an intuition of which algorithm to use, do we just keep repeating examples and problems? Also, I have learned linear regression, logistic regression, and NN so far. Are these like the main ones that we should have in mind, or are there way more important algorithms than these? – user7677413 – 2017-07-07T07:35:39.527

I would like to add that it is often wiser to start experimenting with a simple architecture, as opposed to complex and big architecture. Once you get the small network to work, add complexity until you can't improve anymore. Also, before adding dropout, you might want the network to be able to overfit first. – tuomastik – 2017-07-07T08:11:32.303

@tuomastik: Thanks for that. I've added the start simple, because yes I have heard that a lot, but actually IME you don't need to be over-fitting for dropout to help. That's because dropout combines a regularisation effect with something analogous to bagging, so it can make improvements to metrics even when there was no over-fitting occurring. – Neil Slater – 2017-07-07T08:40:22.827

@user7677413: I think that is a different question. Perhaps research it and ask a new question here. Here's a random blog/guide I just found that explains a few types:

– Neil Slater – 2017-07-07T08:43:10.510

@user7677413 I think you're making the assumption that there isn't 40 years of deep and insightful machine learning research. It sounds like you're just scratching the surface. I recommend finding a textbook and seeing how it all ties together; that would help build your intuition for the many machine learning algorithms. – Alex L – 2019-03-17T03:32:33.500