3

Which rectifier works better in the general case of a Convolutional Neural Network, and are there empirical rules for when to use each type?

- ReLU
- PReLU
- RReLU
- ELU
- Leaky ReLU
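For reference, the rectifiers listed above differ only in how they treat negative inputs. A minimal NumPy sketch of each (the alpha defaults shown are common illustrative values, not prescriptions):

```python
import numpy as np

def relu(x):
    # ReLU: zero out negative inputs.
    return np.maximum(0.0, x)

def leaky_relu(x, alpha=0.01):
    # Leaky ReLU: small fixed slope on the negative side.
    return np.where(x > 0, x, alpha * x)

def prelu(x, alpha):
    # PReLU: same form as Leaky ReLU, but alpha is a learned
    # parameter (typically one per channel) rather than fixed.
    return np.where(x > 0, x, alpha * x)

def rrelu(x, lower=1 / 8, upper=1 / 3, rng=None):
    # RReLU: negative slope sampled uniformly per element at training
    # time; at test time it is fixed to the midpoint (lower + upper) / 2.
    rng = rng or np.random.default_rng()
    alpha = rng.uniform(lower, upper, size=np.shape(x))
    return np.where(x > 0, x, alpha * x)

def elu(x, alpha=1.0):
    # ELU: smooth exponential saturation toward -alpha for negative inputs.
    return np.where(x > 0, x, alpha * (np.exp(x) - 1.0))
```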


4

I've read all the papers about PReLU, LeakyReLU (...) and all the claims about how each one improves this or that, but the dirty little secret is: most of the time it doesn't matter at all, and you can't go far wrong with plain ReLU; this is empirically well established. I've personally tried all of them on many different problems, from training small networks from scratch to changing the activations in large pretrained models. My guess is that the gradient doesn't die much with any of them, and the rest is pretty much irrelevant.
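The answer mentions changing activations in large pretrained models; a minimal PyTorch sketch of how one might try that (the helper name `swap_activation` and the choice of ELU as the replacement are illustrative assumptions, not a prescribed recipe):

```python
import torch.nn as nn

def swap_activation(module: nn.Module, factory=lambda: nn.ELU()) -> None:
    """Recursively replace every nn.ReLU in `module` in place.

    `factory` builds the replacement activation; per the answer above,
    any of the listed rectifiers is unlikely to change results much.
    """
    for name, child in module.named_children():
        if isinstance(child, nn.ReLU):
            setattr(module, name, factory())
        else:
            swap_activation(child, factory)

# Usage on a toy model; the same call works on a pretrained network.
model = nn.Sequential(nn.Linear(4, 4), nn.ReLU(), nn.Linear(4, 2))
swap_activation(model)
```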

1

FYI: Comprehensive list of activation functions in neural networks with pros/cons

– Franck Dernoncourt – 2016-11-09T14:42:58.450