Is there a rule of thumb for designing neural networks?



I know that a neural network architecture depends mostly on the problem itself and the types of input/output, but there's always a "square one" when starting to build one. So my question is: given an input dataset of size M×N (M is the number of records, N is the number of features) and C possible output classes, is there a rule of thumb for how many layers/units we should start with?


Posted 2018-02-12T11:29:41.277

Reputation: 639

Possible answers to this question are very problem-specific. There may be some useful rules for image object recognition, but these rules may not work on a different dataset. – horaceT – 2018-02-15T00:48:51.627



This question has been answered in detail on CrossValidated: How to choose the number of hidden layers and nodes in a feedforward neural network?

However, let me add my own two cents:

There is no magic rule for choosing the best neural network architecture, but if you can find an architecture someone has used to solve a similar problem, this is often an excellent starting point.

The best places to look are official or unofficial examples using popular neural network libraries such as Keras, PyTorch, or TensorFlow, and architectures described in academic literature. keras/examples on GitHub is a great resource.

These architectures were likely chosen after lots of trial and error, so most of the work will have been done for you.
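If no similar published architecture turns up, a "square one" can still be written down explicitly. The sketch below sizes a fully connected baseline from the number of features N and classes C. The geometric-mean sizing and the function names are illustrative assumptions, not something the linked examples prescribe; treat the result as a first guess to tune against validation data.

```python
import math


def baseline_layer_sizes(n_features, n_classes, n_hidden_layers=1):
    """Return [input, hidden..., output] layer widths for a first-try MLP.

    Assumption: every hidden layer gets the geometric mean of the input
    and output widths. This is a starting heuristic only, not a rule.
    """
    if n_features < 1 or n_classes < 1 or n_hidden_layers < 0:
        raise ValueError("sizes must be positive and depth non-negative")
    hidden = max(1, round(math.sqrt(n_features * n_classes)))
    return [n_features] + [hidden] * n_hidden_layers + [n_classes]


def count_parameters(layer_sizes):
    """Count the weights and biases of a fully connected network."""
    return sum((fan_in + 1) * fan_out
               for fan_in, fan_out in zip(layer_sizes, layer_sizes[1:]))


# Hypothetical example: N = 64 features, C = 4 classes, one hidden layer.
sizes = baseline_layer_sizes(n_features=64, n_classes=4)
print(sizes, count_parameters(sizes))
```

Comparing the parameter count with the number of records M gives a rough feel for whether the baseline is oversized for the data before any training is done.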


Posted 2018-02-12T11:29:41.277

Reputation: 2 301

One caveat to the CrossValidated answer is that it's now 7+ years old, and points to a 15+ year old FAQ for an "excellent summary" of configuring your hidden layers. To say that there's been a lot of work on NN configuration in the past 7-15 years is a bit of an understatement. There's an increasing number of applications which fall outside the "one hidden layer is sufficient" regime. That said, for a host of problems a deep learning approach may be overkill. Starting with a single hidden layer and only going deep if needed is a solid strategy. – R.M. – 2018-02-12T19:54:40.123

Good points, R.M. The second answer there is much more recent, however. – Imran – 2018-02-12T20:00:30.430

@Imran I think you never quite answered the OP's question. The choice of hidden nodes and architecture is a very deep question that's still not very well understood. Witness ResNet and wide ResNet with cross-layer connections. – horaceT – 2018-02-15T00:45:26.387

Thanks for your comment, @horaceT. My attempted answer was meant to say "There is no rule of thumb, but there are heuristics that can be applied". I am aware of ResNets. Please let me know how else I can improve my answer. – Imran – 2018-02-15T04:41:13.177


I read a paper on using neural networks to design other neural networks by searching for the most efficient configuration of nodes and layers.
Here's the page where you can download a PDF.

Daniel Ephrat

Posted 2018-02-12T11:29:41.277

Reputation: 31


Following @Imran's answer, I found this paper in one of the comments on the CrossValidated post he linked to. Besides attempting to find the right architecture using Genetic Models (instead of a rule of thumb), section 2.1 gives some theoretical bounds on how many hidden units a network with one or two hidden layers should have.

EDIT: I've tested this theorem, and found out that using Genetic Models is just as good as selecting a random architecture.
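For reference, a random-architecture baseline of the kind mentioned in the edit can be sketched as follows. The search-space limits (`max_hidden_layers`, `max_units`) and the function name are hypothetical choices for illustration, not values from the paper; each sampled candidate would still need to be trained and scored on held-out data.

```python
import random


def sample_architecture(n_features, n_classes, rng,
                        max_hidden_layers=3, max_units=128):
    """Draw one random fully connected architecture as a list of layer widths."""
    depth = rng.randint(1, max_hidden_layers)  # number of hidden layers
    hidden = [rng.randint(1, max_units) for _ in range(depth)]
    return [n_features] + hidden + [n_classes]


rng = random.Random(0)  # fixed seed so the candidates are reproducible
# Hypothetical problem: N = 20 features, C = 3 classes, 5 candidates.
candidates = [sample_architecture(20, 3, rng) for _ in range(5)]
for layers in candidates:
    print(layers)
```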


Posted 2018-02-12T11:29:41.277

Reputation: 639