When you're writing your algorithm, how do you know how many neurons you need per single layer? Are there any methods for finding the optimal number of them, or is it a rule of thumb?

There is no direct way to find the optimal number: people find it empirically, by trying candidate values and scoring them (e.g., with cross-validation). The most common search techniques are manual, grid, and random search.
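As a minimal sketch of that empirical approach, the snippet below grid-searches the width of a single hidden layer with cross-validation using scikit-learn; the candidate widths and the synthetic dataset are arbitrary choices for illustration, not recommendations.

```python
# Grid search over hidden-layer widths, scored by 3-fold cross-validation.
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=300, n_features=20, random_state=0)

# Candidate widths for one hidden layer; each is trained and cross-validated.
param_grid = {"hidden_layer_sizes": [(8,), (32,), (128,)]}
search = GridSearchCV(
    MLPClassifier(max_iter=500, random_state=0),
    param_grid,
    cv=3,
)
search.fit(X, y)
best_width = search.best_params_["hidden_layer_sizes"]
```

Random search works the same way (scikit-learn's `RandomizedSearchCV` has an analogous interface) and often finds good settings with fewer trials when only a few hyperparameters matter.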

There are also more advanced techniques such as Gaussian processes; see e.g. *Optimizing Neural Network Hyperparameters with Gaussian Processes for Dialog Act Classification*, IEEE SLT 2016.

For a more intelligent approach than random or exhaustive search, you could try a genetic algorithm such as NEAT http://nn.cs.utexas.edu/?neat. However, it has no guarantee of finding a global optimum; it is simply an optimization algorithm driven by performance, and is therefore vulnerable to getting stuck in a local optimum.
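To make the idea concrete, here is a deliberately tiny genetic-algorithm sketch (not NEAT itself, which also evolves topology) that evolves per-layer widths. The fitness function is a made-up stand-in: in practice it would train a network with those widths and return held-out accuracy.

```python
import random

random.seed(0)

# Toy fitness: pretend validation performance peaks at 64 neurons per layer.
# In a real run, this would train and evaluate a network on held-out data.
def fitness(widths):
    return -sum((w - 64) ** 2 for w in widths)

def mutate(widths):
    """Nudge one layer's width up or down."""
    i = random.randrange(len(widths))
    out = list(widths)
    out[i] = max(1, out[i] + random.choice([-16, -8, 8, 16]))
    return out

# Evolve a population of two-hidden-layer width configurations.
population = [[random.randrange(1, 256) for _ in range(2)] for _ in range(20)]
for generation in range(50):
    population.sort(key=fitness, reverse=True)
    survivors = population[:10]                      # selection
    children = [mutate(random.choice(survivors)) for _ in range(10)]
    population = survivors + children

best = max(population, key=fitness)
```

Because selection only follows measured performance, a run like this can stall on a local optimum, which is exactly the caveat above.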

The paper *Rethinking the Inception Architecture for Computer Vision* (Szegedy C, Vanhoucke V, Ioffe S, et al., arXiv preprint arXiv:1512.00567, 2015) gives some general design principles:

- Avoid representational bottlenecks, especially early in the network.

- Balance the width and depth of the network. Optimal performance can be reached by balancing the number of filters per stage against the depth of the network. Increasing both the width and the depth contributes to higher-quality networks, but the best improvement for a constant amount of computation is reached when both are increased in parallel. The computational budget should therefore be distributed in a balanced way between depth and width.

These suggestions can't give you the optimal number of neurons in a network, though.
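One way to reason about the width/depth trade-off under a fixed budget is simply to count parameters. The sketch below does this for a plain fully connected network; the input/output sizes and the (width, depth) combinations are made-up examples chosen to land near a similar parameter count.

```python
# Count the weights and biases of an MLP with `depth` hidden layers of
# `width` units each, to compare width/depth splits at a similar budget.
def mlp_params(n_in, width, depth, n_out):
    sizes = [n_in] + [width] * depth + [n_out]
    return sum(a * b + b for a, b in zip(sizes, sizes[1:]))

for width, depth in [(512, 1), (181, 4), (128, 8)]:
    print(width, depth, mlp_params(784, width, depth, 10))
```

Comparing configurations this way shows how a fixed budget can buy one very wide layer or several narrower ones, which is the balance the principle above is about.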

However, there is some model compression research, e.g. Structured Sparsity Learning (SSL) of Deep Neural Networks, SqueezeNet, and network pruning, that may shed some light on how to optimize the number of neurons per layer.

In particular, Structured Sparsity Learning of Deep Neural Networks adds a `Group Lasso` regularization term to the loss function to regularize the structures (i.e., filters, channels, filter shapes, and layer depth) of DNNs. This zeroes out some components of the network structure, achieving remarkable compression and acceleration of the network while keeping the classification accuracy loss small.
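As a minimal sketch of the penalty itself (not the full SSL training procedure), the group-lasso term is the sum of the L2 norms of weight groups; grouping a weight matrix by rows makes each output neuron a group, so driving a whole row to zero effectively removes that neuron.

```python
import numpy as np

# Group Lasso penalty over the rows of a weight matrix: each row (one
# output neuron's incoming weights) is a group, penalized by its L2 norm.
def group_lasso(W, axis=1):
    return np.sqrt((W ** 2).sum(axis=axis)).sum()

rng = np.random.default_rng(0)
W = rng.normal(size=(4, 3))
W[2] = 0.0                      # a "pruned" neuron contributes nothing
penalty = group_lasso(W)
```

During training this term would be added to the task loss with some coefficient, so the optimizer is rewarded for zeroing out entire groups rather than scattered individual weights.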

You know you have too many neurons when you start overfitting: the network stops working well because it only activates on near-perfect matches, which is impossible to generalize from. It's like requiring two different cats to have the same number of atoms, or building a detector network that only fires on pictures of your own pet cat and nothing else. You want the network to activate on a wider range of inputs, like any picture of a cat.

Overfitting is a problem with no real quick fix. You can start with too few neurons and keep adding more, or start with a lot and remove them until the network works well.
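The "start small and grow" strategy can be sketched as a simple loop: double the width until the validation score stops improving. Here `validation_score` is a hypothetical stand-in for training a network at that width and evaluating it on held-out data.

```python
# Grow the hidden-layer width until the (toy) validation score stops improving.
def validation_score(width):
    # Made-up curve: improves, saturates, then slowly degrades (overfitting).
    return min(width, 64) - 0.001 * width

width, best_width, best_score = 1, 1, float("-inf")
while width <= 1024:
    score = validation_score(width)
    if score > best_score:
        best_width, best_score = width, score
    else:
        break                               # no improvement: stop growing
    width *= 2
```

Shrinking from a large network works the same way in reverse, and is essentially what the pruning approaches mentioned above automate.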

Will get pretty close to a global optimum, anyway. – jjmerelo – 2016-08-11