I'm trying to implement some Image super-resolution models on medical images. After reading a set of papers, I found that none of the existing models use any activation layer for the last layer. What's the rationale behind that?

I am not in the field of super-resolution, but I think this question applies to neural network construction in general.

Usually you try to solve a classification problem or a regression problem with your neural network.

For classification you try to predict the probability that a given input corresponds to each class. Every output value should therefore be a probability, i.e. lie between 0 and 1. To achieve this you usually use a softmax or sigmoid function as your last layer to squash the outputs into that range. In addition (which is desirable in classification tasks), these functions raise the probability of likely classes while lowering the probability of all the unlikely ones, pushing the network to commit to one class over the others.
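A quick NumPy sketch of that squashing behaviour (the logits are made up for illustration): softmax maps arbitrary real values to a distribution, so every entry lands in (0, 1), the entries sum to 1, and the largest logit keeps the largest probability.

```python
import numpy as np

def softmax(z):
    # Subtract the max before exponentiating for numerical stability.
    e = np.exp(z - np.max(z))
    return e / e.sum()

logits = np.array([2.0, 1.0, 0.1])   # hypothetical raw last-layer outputs
probs = softmax(logits)
# probs: every entry in (0, 1), entries sum to 1,
# and the largest logit gets the largest probability.
```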

For a regression task you are not looking for probabilities as your output values but for real-valued numbers. In that case no activation function is wanted on the last layer, since you want to be able to approximate any possible real value, not just probabilities.
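A minimal sketch of why (weights and features are hypothetical): a plain affine last layer can emit any real number, whereas appending a sigmoid would clamp the prediction into (0, 1).

```python
import numpy as np

# Hypothetical penultimate-layer output and last-layer parameters.
features = np.array([0.5, -1.2, 3.0, 0.1])
W = np.array([[1.0, 0.5, -2.0, 4.0]])
b = np.array([-3.5])

y = W @ features + b                  # no activation: any real value is reachable
squashed = 1.0 / (1.0 + np.exp(-y))   # a sigmoid here would force the output into (0, 1)
```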

So in the case of super-resolution, I think the generated output is a map where each value corresponds to a pixel of the super-resolution image. Those pixel values are real numbers, not probabilities, so you are solving a regression problem.

But you could also take a classification approach, with 256 output maps where each map gives the probability of one of the possible pixel values 0..255.
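The two output heads can be contrasted in a small NumPy sketch (shapes and values are made up): the regression head is one unconstrained real-valued map, while the classification head softmaxes 256 maps per pixel and picks the most likely intensity.

```python
import numpy as np

H, W = 2, 2

# Regression head: a single real-valued map, no final activation.
reg_out = np.array([[12.3, 250.7],
                    [-0.4, 128.0]])   # hypothetical raw predictions; note -0.4 falls outside 0..255

# Classification head: 256 maps, softmax over the channel axis per pixel.
logits = np.random.default_rng(1).normal(size=(256, H, W))  # hypothetical network output
e = np.exp(logits - logits.max(axis=0, keepdims=True))
probs = e / e.sum(axis=0, keepdims=True)   # per-pixel distribution over intensities 0..255
pixels = probs.argmax(axis=0)              # most likely intensity for each pixel
```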

As discussed here:

Linear is the preferred activation function.

But then a linear activation function is equivalent to no activation function at all:

In terms of a neural net, a linear activation is like a pass-through or no activation function. Depending on how the algorithm has been programmed, it may (for ease and clarity) require the application of an activation function. If you want the last layer to simply be the dot product of that layer's weights and the preceding layer (e.g. without a tanh activation function), the linear activation serves this purpose.
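This equivalence is easy to verify numerically (weights here are hypothetical): applying an identity "linear activation" leaves the dot product unchanged, whereas tanh would alter it.

```python
import numpy as np

def linear(x):
    # A "linear activation" is just the identity function.
    return x

prev = np.array([0.2, -1.0, 3.3])      # hypothetical preceding-layer output
W = np.array([[1.5, 0.0, -0.5]])       # hypothetical last-layer weights

with_linear = linear(W @ prev)
without = W @ prev                      # identical: linear activation is a pass-through
squashed = np.tanh(W @ prev)            # tanh, by contrast, changes the value
```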