
I am currently preparing for an exam on neural networks. In several protocols from former exams I read that the activation functions of neurons (in multilayer perceptrons) have to be monotonic.

I understand that activation functions should be differentiable, have a derivative that is nonzero at most points, and be non-linear. I do not understand why being monotonic is important/helpful.

I know the following activation functions and that they are monotonic:

- ReLU
- Sigmoid
- Tanh
- Softmax: I'm not sure if the definition of monotonicity is applicable for functions $f: \mathbb{R}^n \rightarrow \mathbb{R}^m$ with $n, m > 1$
- Softplus
- (Identity)
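As a quick sanity check of the list above, the scalar activations can be probed for monotonicity numerically. This is just a sketch of my own (the helper `is_monotonic` is not from the question) that samples each function on a grid and checks that it never decreases:

```python
import numpy as np

# Hypothetical helper: numerically check that a scalar function is
# non-decreasing on a sampled interval. A grid test is not a proof,
# but it catches obvious counterexamples.
def is_monotonic(f, lo=-10.0, hi=10.0, n=10001):
    x = np.linspace(lo, hi, n)
    y = f(x)
    return bool(np.all(np.diff(y) >= 0))

relu     = lambda x: np.maximum(0.0, x)
sigmoid  = lambda x: 1.0 / (1.0 + np.exp(-x))
softplus = lambda x: np.log1p(np.exp(x))
square   = lambda x: x ** 2          # the counterexample from below

print(is_monotonic(relu))      # True
print(is_monotonic(np.tanh))   # True
print(is_monotonic(sigmoid))   # True
print(is_monotonic(softplus))  # True
print(is_monotonic(square))    # False: x^2 decreases on (-inf, 0)
```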

However, I still can't see any reason why, for example, $\varphi(x) = x^2$ could not be used as an activation function.

Why do activation functions have to be monotonic?

(Related side question: is there any reason why the logarithm/exponential function is not used as an activation function?)

@MartinThoma Are you sure softmax is monotonic? – Media – 2018-02-21T07:07:19.670

Thanks @Media. To answer your question: I'm not sure what "monotonic" even means for functions $f: \mathbb{R}^n \rightarrow \mathbb{R}^m$ with $m > 1$. For $m=1$ softmax is constant and thus monotonic. But without defining $<$ for elements in $\mathbb{R}^n$ with $n>1$ I don't think monotonic makes any sense. – Martin Thoma – 2018-02-21T19:50:18.063

@MartinThoma Thanks, actually it was also a question of mine. I didn't know, and still don't know, if there is an extension for monotonic in functions with multiple outputs. Math stuff, you know! – Media – 2018-02-22T14:06:51.987
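One thing that can be said without defining an order on $\mathbb{R}^n$: each softmax output is increasing in its own logit and decreasing in every other logit (since $\partial s_i / \partial z_i = s_i(1 - s_i) > 0$ and $\partial s_i / \partial z_j = -s_i s_j < 0$ for $i \neq j$). A small numerical illustration of my own (not from the thread):

```python
import numpy as np

# Probe how softmax outputs react when a single input coordinate increases.
def softmax(z):
    e = np.exp(z - np.max(z))   # shift by the max for numerical stability
    return e / e.sum()

z = np.array([1.0, 2.0, 0.5])
eps = 1e-3

base = softmax(z)
bumped = softmax(z + np.array([eps, 0.0, 0.0]))  # bump only input 0

print(bumped[0] > base[0])  # True: output 0 increases with its own input
print(bumped[1] < base[1])  # True: the other outputs decrease
```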

FYI: Comprehensive list of activation functions in neural networks with pros/cons – Franck Dernoncourt – 2015-12-07T01:13:53.543