You need to distinguish between two types of neural networks. If your output variable is continuous, you can use linear, ReLU, tanh, logistic sigmoid, etc. as activation functions, because these functions map continuous inputs to continuous outputs. If your output is discrete/categorical, you can use the signum function (binary) or the softmax function (multiclass) as the activation function of the output layer.
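The following is a minimal sketch of the output activations named above, written in plain Python with only the standard library. The function names are illustrative. Mapping a zero score to +1 in `signum` is an arbitrary choice made for this sketch and is not part of the answer.

```python
import math

def linear(z):
    return z

def relu(z):
    return max(0.0, z)

def logistic_sigmoid(z):
    # Maps R to (0, 1); branch on the sign so math.exp never overflows
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def signum(z):
    # Binary decision in {-1, +1}; sending z == 0 to +1 is an arbitrary choice here
    return 1 if z >= 0 else -1

def softmax(scores):
    # Multiclass: turns a score vector into probabilities that sum to 1.
    # Subtracting the max first avoids overflow in math.exp.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Continuous outputs (tanh comes from the standard library)
print(linear(2.5), relu(-1.0), math.tanh(0.0), logistic_sigmoid(0.0))

# Discrete outputs: a binary label, and the index of the most probable class
probs = softmax([2.0, 1.0, 0.1])
print(signum(-0.3), probs.index(max(probs)))
```

Softmax does not pick a class by itself. It produces class probabilities, and the predicted class is the index of the largest one.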

The cost function often compares the real outputs $y_n$ with the predicted outputs $\hat{y}(x_n)$ for the inputs $x_n$, $n=1,\dots,N$. Let us introduce a comparison function $D(y_n,\hat{y}(x_n))$. It has a low value if the predicted output is almost equal to the real output and a high value if the two are dissimilar. Assuming all observations are equally important, we can sum the comparison function over all observations and obtain the integrated loss

$$J=\sum_{n=1}^ND(y_n,\hat{y}(x_n))$$

for the whole data set.
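To make $J$ concrete, here is a sketch that assumes squared error as the comparison function $D$. The answer leaves $D$ open, so any other dissimilarity (absolute error, for example) can be passed in its place. `integrated_loss`, `squared_error` and `absolute_error` are hypothetical names used only for illustration.

```python
def squared_error(y, y_hat):
    # D(y, y_hat): small when y_hat is close to y, large otherwise
    return (y - y_hat) ** 2

def absolute_error(y, y_hat):
    return abs(y - y_hat)

def integrated_loss(ys, y_hats, D=squared_error):
    # J = sum_{n=1}^N D(y_n, y_hat(x_n)), every observation weighted equally
    if len(ys) != len(y_hats):
        raise ValueError("ys and y_hats must have the same length")
    return sum(D(y, y_hat) for y, y_hat in zip(ys, y_hats))

ys = [1.0, 2.0, 3.0]       # real outputs y_n
y_hats = [1.0, 2.5, 2.0]   # predicted outputs y_hat(x_n)
print(integrated_loss(ys, y_hats))                  # 1.25
print(integrated_loss(ys, y_hats, absolute_error))  # 1.5
```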

To see the influence of the activation function $g$ in the last layer, we summarize the mapping from the input up to (but not including) the output activation as $f(x_n)$. Then the predicted output $\hat{y}(x_n)$ can be written as

$$\hat{y}(x_n)=g(f(x_n)).$$
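One way to illustrate the decomposition $\hat{y}(x_n)=g(f(x_n))$ is to keep $f$ fixed and swap only $g$. In this sketch, `f` is a hypothetical single affine unit that stands in for the whole network up to the last layer, and squared error is assumed for $D$. The toy data are chosen so that $f$ alone already matches the targets.

```python
import math

def f(x, w=2.0, b=0.0):
    # Hypothetical stand-in for the network up to the last layer: one affine unit
    return w * x + b

def identity(z):
    return z

def predict(x, g):
    # y_hat(x) = g(f(x))
    return g(f(x))

def integrated_loss(xs, ys, g):
    # Squared error assumed as the comparison function D
    return sum((y - predict(x, g)) ** 2 for x, y in zip(xs, ys))

xs = [0.0, 0.5, 1.0, 1.5]
ys = [0.0, 1.0, 2.0, 3.0]  # chosen so that y_n = f(x_n) exactly

J_linear = integrated_loss(xs, ys, identity)   # f already fits, so J = 0
J_tanh = integrated_loss(xs, ys, math.tanh)    # same f, outputs squeezed into (-1, 1)
print(J_linear, J_tanh)
```

Only $g$ differs between the two calls, yet $J$ goes from zero to a clearly positive value.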

Hence, the activation function at the output affects the integrated loss $J$. For example, if you choose $\tanh$ as the output activation, your outputs are bounded to the interval $(-1,1)$. This is a bad choice if your targets can take any value in $\mathbb{R}$. No target with $|y_n|>1$ can ever be matched, so those terms of $J$ never reach their minimum and the cost will probably remain high during training, however the weights are adjusted. A better choice in this case is a linear activation function at the output layer.
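The following toy experiment illustrates the $\tanh$ argument. It fits $\hat{y}(x)=g(wx+b)$ by plain gradient descent on squared error twice, once with a linear $g$ and once with $g=\tanh$, on targets $y=3x$ that extend beyond $(-1,1)$. The helper names, data, learning rate and step count are assumptions chosen for illustration. Because $|\tanh|<1$, every target with $|y_n|>1$ contributes at least $(|y_n|-1)^2$ to $J$, so the tanh fit cannot go below that floor.

```python
import math

def fit(xs, ys, g, dg, lr=0.01, steps=5000):
    """Gradient descent on J(w, b) = sum_n (y_n - g(w*x_n + b))^2; returns the final J."""
    w, b = 0.0, 0.0
    for _ in range(steps):
        grad_w = grad_b = 0.0
        for x, y in zip(xs, ys):
            z = w * x + b
            # dJ/dz for this observation: 2 * (g(z) - y) * g'(z)
            dz = 2.0 * (g(z) - y) * dg(z)
            grad_w += dz * x
            grad_b += dz
        w -= lr * grad_w
        b -= lr * grad_b
    return sum((y - g(w * x + b)) ** 2 for x, y in zip(xs, ys))

def tanh_floor(ys):
    """Lower bound on J for any tanh output: each |y_n| > 1 costs at least (|y_n| - 1)^2."""
    return sum((abs(y) - 1.0) ** 2 for y in ys if abs(y) > 1.0)

def identity(z):
    return z

def d_identity(z):
    return 1.0

def d_tanh(z):
    return 1.0 - math.tanh(z) ** 2

xs = [-1.0, -0.5, 0.0, 0.5, 1.0]
ys = [3.0 * x for x in xs]  # targets in [-3, 3], partly outside (-1, 1)

J_linear = fit(xs, ys, identity, d_identity)
J_tanh = fit(xs, ys, math.tanh, d_tanh)
print(f"linear output: J = {J_linear:.6f}")
print(f"tanh output:   J = {J_tanh:.6f} (floor {tanh_floor(ys)})")
```

For this data the floor is $4 + 0.25 + 0.25 + 4 = 8.5$. The linear output can drive $J$ towards zero, while the tanh output stays at or above 8.5 regardless of how long it trains.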

I think it's a broad question. The formula would definitely change, and so would the corresponding result; the extent depends on the specific problem we are dealing with. It might be better to look at the specific formula of each of these functions. – Fatemeh Asgarinejad – 2019-07-06T22:07:32.607