33 Why is ReLU used as an activation function? 2018-01-10T13:07:47.997

26 What is GELU activation? 2019-04-18T08:06:24.200

22 How to use LeakyReLU as an activation function in a sequential DNN in Keras? When does it perform better than ReLU? 2018-10-02T04:06:47.510

18 Difference of Activation Functions in Neural Networks in general 2016-10-04T11:05:24.647

17 Why is ReLU better than other activation functions? 2017-10-03T14:17:09.163

10 How to create custom Activation functions in Keras / TensorFlow? 2019-09-09T07:34:52.487

7 What is the purpose of multiple neurons in a hidden layer? 2016-09-16T00:51:07.420

7 Can the vanishing gradient problem be solved by multiplying the input of tanh with a coefficient? 2019-05-07T13:07:40.580

7 Why is leaky ReLU not so common in real practice? 2020-05-14T02:30:39.380

6 Input normalization for ReLU? 2017-12-20T03:39:45.393

6 How does Sigmoid activation work in multi-class classification problems 2018-10-06T08:41:48.900

5 Why do so many functions used in data science have derivatives of the form f(x)*(1-f(x))? 2017-09-30T04:35:22.077

5 Advantages of monotonic activation functions over non-monotonic functions in neural networks? 2017-12-06T11:43:02.803

5 Activation function vs Squashing function 2018-08-06T12:48:02.987

5 Gradient Descent in ReLU Neural Network 2019-04-21T06:31:19.767

5 Exponential Linear Units (ELU) vs $log(1+e^x)$ as the activation functions of deep learning 2019-06-09T13:35:10.117

4 What's the correct reasoning behind solving the vanishing/exploding gradient problem in deep neural networks? 2019-02-09T23:36:42.540

4 Is it wrong to use Glorot initialization with ReLU activation? 2020-01-23T17:50:58.507

4 Why is activation needed at all in neural network? 2020-02-19T09:18:08.053

3 Why are sigmoid/tanh activation function still used for deep NN when we have ReLU? 2016-07-10T20:31:28.220

3 Why is an activation function notated as "g"? 2017-11-03T11:03:34.100

3 Is there a way to set a different activation function for each hidden unit in one layer in keras? 2017-12-23T17:08:59.973

3 Is a classifier able to say there's no such case? 2018-01-12T07:05:55.410

3 How can ReLU ever fit the curve of x²? 2018-08-17T09:46:35.417

3 Product of dot products in neural network 2019-02-13T12:49:24.333

3 Can we use ReLU activation function as the output layer's non-linearity? 2019-03-15T11:54:01.990

3 What activation function should I use for a specific regression problem? 2019-03-21T18:14:16.453

3 Using LeakyReLU as the activation function in a CNN, and the best alpha for it 2019-06-21T22:35:56.847

3 Why activation functions used in neural networks generally have limited range? 2019-11-08T14:04:08.760

3 TensorFlow Sigmoid activation function as output layer - value interpretation 2020-01-07T01:09:18.747

3 How to quantitatively evaluate raw neural network activations? 2020-01-08T12:12:02.597

3 How does one derive the modified tanh activation proposed by LeCun? 2020-01-25T14:17:04.880

3 Leaky ReLU inside of a Simple Python Neural Net 2020-02-19T04:30:29.477

3 Vanishing Gradient vs Exploding Gradient as Activation function? 2020-02-26T13:03:00.863

3 What came first? Backpropagation or Sigmoid? 2020-02-27T22:03:28.877

3 Why does the sigmoid activation function result in sub-optimal gradient descent? 2020-10-30T17:16:53.030

2 Negative Rewards and Activation Functions 2017-12-26T14:37:21.760

2 Cross-entropy loss function causes division-by-zero error 2018-06-01T16:15:01.197

2 Alternatives to linear activation function in regression tasks to limit the output 2018-06-08T15:49:16.450

2 Few activation functions handling various problems - neural networks 2018-08-08T15:44:53.450

2 Restricting the output of a model didn't improve the loss value of the model evaluation 2018-10-01T13:08:46.590

2 What does "Each agent was evaluated every 250,000 training frames for 135,000 validation frames" mean in the DQN Nature paper? 2018-10-02T09:56:29.673

2 Implementing a custom hard sigmoid function 2018-12-24T11:09:52.960

2 Why an activation function is not needed during the runtime of a Word2Vec model 2019-01-03T19:10:53.677

2 Correctly obtaining the gradient of a neural network's output with respect to its input. Is ReLU a bad choice of activation function? 2019-01-21T11:17:17.863

2 activation functions in multiple layers in CNNs 2019-02-18T14:58:12.917

2 Are there any activation functions which on inputting integer data will produce the output as integers? 2019-03-15T08:36:25.843

2 Why do we use a softmax activation function in Convolutional Autoencoders? 2019-06-15T05:37:57.420

2 ReLU for combating the problem of vanishing gradient in RNN? 2019-10-07T05:23:59.737

2 Different activation function in same layer of a Neural network 2020-04-19T00:41:32.950

2 What does the decision boundary of a ReLU look like? 2020-06-15T10:27:26.340

2 Output landscape of ReLU, Swish and Mish 2020-09-08T05:17:38.663

2 As ReLU is not differentiable where it touches the x-axis, doesn't it affect training? 2020-09-16T05:30:40.640

1 Why isn't Maxout used in the state of the art models? 2017-11-16T21:05:39.813

1 How to Implement Biological Neuron Activations in Artificial Neural Networks 2018-02-22T23:16:25.023

1 Homemade deep learning library: numerical issue with relu activation 2018-03-19T21:29:04.307

1 Weights in neural network 2018-04-07T09:02:07.377

1 Is it possible to customize the activation function in scikit-learn's MLPRegressor? 2018-04-27T05:35:05.490

1 Understanding of threshold value in a neural network 2018-08-07T15:21:53.360

1 How to display the value of activation? 2018-08-13T09:58:38.987

1 Why is the softmax function often used as activation function of output layer in classification neural networks? 2018-08-23T15:10:11.697

1 Neural network example not working with sigmoid activation function 2018-08-29T03:55:16.930

1 Best activation function for an ensemble? 2018-10-02T03:17:03.620

1 Mixing leaky ReLU in the first layers of a CNN with conventional ReLU for object detection 2018-11-18T18:01:09.440

1 What does it mean for an activation function to be "saturated/non-saturated"? 2019-01-18T18:34:24.080

1 Regression with -1,1 target range - Should we use a tanh activation in the last 1 unit dense layer? 2019-01-25T07:49:00.100

1 What are best activation and regularization method for LSTM? 2019-04-11T07:57:59.973

1 How to adjust the Regression with ANN for last part of function 2019-05-24T08:40:04.027

1 Confusion regarding the working mechanism of activation functions 2019-06-17T16:11:11.830

1 Derivative of activation function used in gradient descent algorithms 2019-07-13T13:19:06.150

1 Combining multiple neural networks with different activation functions 2019-10-10T19:21:02.047

1 Square-law based RBF kernel 2019-10-26T01:25:01.150

1 Relationship between Sigmoid and Gaussian Distribution 2019-11-22T08:39:49.813

1 Should I scale/normalize the data before training a feedforward neural network using only lagged values? 2019-12-12T14:59:21.210

1 Counting Number of Parameters in Neural Networks 2019-12-22T22:38:50.563

1 Binary classifier using Keras with backend Tensorflow with a Binary output 2019-12-23T22:27:13.497

1 Activation function vs If else statement 2020-01-07T10:38:25.890

1 How does one use activation function with greater than [-1;1] range for binary classification? 2020-01-25T14:31:56.943

1 Which activation function of the output layer and which loss function are advised to be used for bounded regression? 2020-04-08T12:33:30.387

1 Generalized softmax derivative for implementation with any loss function 2020-04-15T09:49:36.627

1 Should output data scaling correspond to the activation function's output? 2020-04-27T08:59:00.897

1 Is a ReLU variant without vanishing gradients possible? 2020-04-28T09:02:00.857

1 What is the gradient descent rule using binary cross entropy (BCE) with tanh? 2020-04-30T08:00:45.113

1 Setting activation function to a leaky relu in a Sequential model 2020-05-12T23:34:04.633

1 Which activation function for DQL? 2020-06-17T12:26:04.577

1 How does Pytorch deal with non-differentiable activation functions during backprop? 2020-07-02T23:46:15.890

1 Using Iterative Hard/Soft Thresholding in autoencoder with non linear activation 2020-07-19T00:21:20.103

1 Sharing parameters of an activation across layers of a neural network 2020-08-09T12:30:17.163

1 What is the reason behind Keras choice of default (recurrent) activation functions in LSTM networks 2020-11-15T19:43:39.413

1 Is it possible to get an ROC curve using ReLU activation? 2020-11-24T02:35:41.473

1 Dying leaky ReLU 2020-11-26T11:38:59.273

1 Problem with convergence of ReLU in MLP 2020-12-10T19:24:44.717

1 Scaling the activation function 2020-12-18T15:46:46.293

1 Can we talk about vanishing activations? 2021-02-05T18:37:01.647

1 Are non-relu activations better for small/ dense datasets? 2021-02-16T14:26:27.533

0 What exactly is the "hyperbolic" tanh function used in the context of activation functions? 2018-03-06T15:35:47.237

0 Properly using activation functions of neural network 2018-07-07T13:36:47.670

0 Quasi-linearity in deep learning regression problems (sports betting) 2019-04-27T08:45:15.390

0 How does cost function change by choice of activation function (ReLU, Sigmoid, Softmax)? 2019-07-06T21:33:09.713