Advantages of monotonic activation functions over non-monotonic functions in neural networks?


What are the advantages of using monotonic activation functions over non-monotonic functions in neural networks?

  • Do they perform better than non-monotonic ones?
  • Is this mathematically proven?
  • Are there any papers/references that are related to this?


Posted 2017-12-06T11:43:02.803

Reputation: 263



I don't know of any papers about this topic, but intuitively it makes a lot of sense to use monotonic activation functions. Say we have a non-monotonic activation function, for example a Gaussian bump, symmetric around $x=0$ and decaying towards $f(x)=0$ as $x$ moves away from 0 on either side. If we feed a sample into our network that performs poorly when the activation is high, we want to change the input of that node to produce a lower activation. With a non-monotonic activation, whether we should increase or decrease the input depends on which side of the peak the input lies, which in turn depends largely on our weight initialization.

This makes learning more difficult: if another sample also needs a lower activation but lies on the other side of the peak, backpropagation will try to push its input across to the opposite side. Most of the time the best solution is to put everything on one side of the peak, making the function effectively monotonic again. Another way of looking at it is that monotonic functions are roughly one-to-one (not strictly true; ReLU, for example, is not). This means that two very different inputs don't map to the same output unless everything in between also maps there.
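To make the gradient-direction argument concrete, here is a minimal sketch (my own illustration, not from the answer): the derivative of a Gaussian bump $f(x)=e^{-x^2}$ changes sign across the peak, so two samples on opposite sides need input updates in opposite directions, while a monotonic sigmoid's derivative keeps one sign everywhere.

```python
import math

def gaussian_grad(x):
    # d/dx exp(-x^2) = -2x * exp(-x^2): the sign flips with the sign of x.
    return -2.0 * x * math.exp(-x * x)

def sigmoid_grad(x):
    # Derivative of the monotonic sigmoid: always positive.
    s = 1.0 / (1.0 + math.exp(-x))
    return s * (1.0 - s)

# Two inputs at the same activation level but on opposite sides of the peak:
# to lower the output, one must move left and the other must move right.
print(gaussian_grad(-1.0))  # positive: decrease x to lower the output
print(gaussian_grad(+1.0))  # negative: increase x to lower the output

# For the sigmoid the update direction is the same everywhere.
print(sigmoid_grad(-1.0), sigmoid_grad(+1.0))  # both positive
```

Because the Gaussian's gradient direction depends on which side of the peak the input starts on, the "correct" update for one sample can undo the update for another, which is exactly the conflict described above.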

Here was a similar question with some links: (Why) do activation functions have to be monotonic?

Jan van der Vegt

Posted 2017-12-06T11:43:02.803

Reputation: 8 538


In addition to the computational reasons, there is a biological motivation you can read about in the literature on biological neural networks:

In neuroscience, a biological neural network is a series of interconnected neurons whose activation defines a recognizable linear pathway. The interface through which neurons interact with their neighbors usually consists of several axon terminals connected via synapses to dendrites on other neurons. If the sum of the input signals into one neuron surpasses a certain threshold, the neuron sends an action potential (AP) at the axon hillock and transmits this electrical signal along the axon.

Since the activation of a neuron depends on the sum of its inputs, it makes biological sense for the activation function to be an increasing function.
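The sum-then-threshold behaviour described above can be sketched as a simple threshold unit (a hypothetical illustration, with made-up weights and threshold; not from the answer):

```python
def neuron_fires(inputs, weights, threshold=1.0):
    # Weighted sum of incoming signals; the neuron fires (sends an
    # action potential) only if the sum reaches the threshold.
    total = sum(w * x for w, x in zip(weights, inputs))
    return total >= threshold

# A step activation like this is monotonic in the summed input:
# adding more excitatory input can never switch a firing neuron off.
print(neuron_fires([0.5, 0.8], [1.0, 1.0]))  # sum 1.3 >= 1.0 -> True
print(neuron_fires([0.2, 0.3], [1.0, 1.0]))  # sum 0.5 <  1.0 -> False
```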


Posted 2017-12-06T11:43:02.803

Reputation: 1 001