68 Choosing a learning rate 2014-06-16T18:08:38.623

59 What is the "dying ReLU" problem in neural networks? 2015-05-07T04:11:56.600

38 How to fight underfitting in a deep neural net 2014-07-13T09:04:39.703

35 When to use GRU over LSTM? 2016-10-17T11:47:45.340

25 Time series prediction using ARIMA vs LSTM 2016-07-11T16:45:21.020

24 When to use (He or Glorot) normal initialization over uniform init? And what are its effects with Batch Normalization? 2016-07-28T17:12:29.933

24 PyTorch vs. Tensorflow Fold 2017-02-08T10:26:16.887

21 Why are NLP and Machine Learning communities interested in deep learning? 2014-10-11T10:24:01.393

21 How do you visualize neural network architectures? 2016-07-18T17:08:17.237

20 Deep Learning vs gradient boosting: When to use what? 2014-11-20T06:49:00.357

20 Deep learning basics 2014-12-08T22:37:32.777

20 Number of parameters in an LSTM model 2016-03-09T11:14:20.163

20 Adding Features To Time Series Model LSTM 2017-02-21T22:17:40.000

19 Intuitive explanation of Noise Contrastive Estimation (NCE) loss? 2016-08-05T03:36:04.553

16 Paper: What's the difference between Layer Normalization, Recurrent Batch Normalization (2016), and Batch Normalized RNN (2015)? 2016-07-23T09:46:42.783

16 Choosing between CPU and GPU for training a neural network 2017-05-25T23:48:26.343

15 Choosing between TensorFlow or Theano as backend for Keras 2015-12-07T16:42:04.107

14 Hyperparameter search for LSTM-RNN using Keras (Python) 2016-01-17T18:26:54.320

14 How to draw Deep learning network architecture diagrams? 2016-11-03T03:10:24.893

13 Bagging vs Dropout in Deep Neural Networks 2015-11-16T14:41:08.553

13 Keyword/phrase extraction from Text using Deep Learning libraries 2016-02-03T10:56:51.447

11 How are deep-learning NNs different now (2016) from the ones I studied just 4 years ago (2012)? 2016-10-04T13:13:15.930

11 How does Keras calculate accuracy? 2016-10-07T08:10:51.287

11 Why mini batch size is better than one single "batch" with all training data? 2017-02-07T12:40:25.200

11 Why should the data be shuffled for machine learning tasks 2017-11-09T07:42:15.517

10 Transforming AutoEncoders 2015-10-30T15:59:38.850

10 How word2vec can be used to identify unseen words and relate them to already trained data 2015-12-26T03:47:48.800

10 When do we say that the dataset is not classifiable? 2017-12-05T12:09:52.173

9 Visualizing deep neural network training 2014-12-10T10:15:00.940

9 Difference between "equivariant to translation" and "invariant to translation" 2017-01-04T08:41:15.700

9 Cross-entropy loss explanation 2017-07-10T10:26:39.450

9 local minima vs saddle points in deep learning 2017-09-05T19:14:30.057

9 What is the "novel reinforcement learning algorithm" in AlphaGo Zero? 2017-10-19T23:38:40.733

9 PyTorch vs. Tensorflow eager 2017-11-07T17:12:14.060

8 Theano in deep learning research 2015-05-30T08:33:06.713

8 How do I calculate the delta term of a Convolutional Layer, given the delta terms and weights of the previous Convolutional Layer? 2015-06-02T20:16:43.627

8 Deep Learning with Spectrograms for sound recognition 2016-01-29T15:39:26.277

8 Relu does have 0 gradient by definition, then why gradient vanish is not a problem for x < 0? 2016-05-04T18:17:38.530

8 Reshaping of data for deep learning using Keras 2016-05-12T13:41:11.543

8 Using Neural Networks to extract multiple parameters from images 2016-06-13T07:42:57.850

8 How to calculate the mini-batch memory impact when training deep learning models? 2016-07-07T13:51:42.563

8 Why TensorFlow can't fit simple linear model if I am minimizing absolute mean error instead of the mean squared error? 2016-11-17T13:10:22.037

8 How to add a new category to a deep learning model? 2016-12-10T01:43:09.343

8 Why do convolutional neural networks work? 2016-12-23T12:43:47.203

8 What is Ground Truth 2017-03-24T12:09:14.510

8 Why does it speed up gradient descent if the function is smooth? 2017-08-07T14:58:57.693

8 Why should the initialization of weights and bias be chosen around 0? 2017-08-09T07:30:39.773

8 What is the difference between Dilated Convolution and Deconvolution? 2017-08-18T14:09:42.870

8 Convolutional neural network overfitting. Dropout not helping 2017-08-22T23:52:26.863

8 Multi GPU in keras 2017-10-18T20:30:52.027

7 Any idea about application of deep dream? 2015-08-12T16:17:14.967

7 Why is Reconstruction in Autoencoders Using the Same Activation Function as Forward Activation, and not the Inverse? 2016-01-12T23:39:55.800

7 HOW TO: Deep Neural Network weight initialization 2016-03-28T12:17:51.047

7 Does batch_size in Keras have any effects in results' quality? 2016-07-01T11:54:14.957

7 Do convolutions "flatten images"? 2017-01-30T15:26:23.530

7 number of parameters for convolution layers 2017-02-20T00:23:15.170

7 What is an 1D Convolutional Layer in Deep Learning? 2017-02-28T08:12:08.210

7 Does it make sense to train a CNN as an autoencoder? 2017-03-21T12:53:56.477

7 Convolutional network for classification, extremely sensitive to lighting 2017-09-03T15:04:51.820

7 Are there free cloud services to train machine learning models? 2017-11-03T12:41:54.203

7 Why is ReLU used as an activation function? 2018-01-10T13:07:47.997

6 Convolutional neural network for sparse one-hot representation 2015-05-18T11:50:35.383

6 Understanding dropout and gradient descent 2015-08-27T19:36:53.297

6 Image Captioning in Keras 2016-02-24T04:17:12.343

6 How does deep learning helps in detecting multiple objects in single image? 2016-04-07T19:15:04.710

6 Is there any domain where Spiking Neural Networks outperform other algorithms (non-spiking)? 2016-04-29T15:43:44.527

6 Deep neural net modelling strategy 2016-06-09T02:38:44.360

6 How to approach the numer.ai competition with anonymous scaled numerical predictors? 2016-06-29T16:11:34.107

6 Is there a known convolutional net architecture to calculate object masks for images? 2016-07-09T22:05:47.747

6 Validation loss and accuracy remains constant 2016-08-23T06:19:59.023

6 Machine Learning vs Deep Learning 2017-01-20T10:45:27.660

6 Question about the simple example for batch normalization given in "deep learning" book 2017-02-18T14:30:31.000

6 What is the classical way to visualize 3D filters in convolutional neural networks? 2017-02-28T13:34:54.610

6 What is missing from the following Curriculum Learning implementation in a Deep Neural Net? 2017-02-28T16:09:56.263

6 deep learning for non-image non-NLP tasks? 2017-03-08T11:01:26.670

6 Which Amazon EC2 instance for Deep Learning tasks? 2017-03-13T10:02:26.910

6 Are there any rules for choosing the size of a mini-batch? 2017-04-17T16:18:22.793

6 What is the best hardware/GPU for deep learning? 2017-04-27T04:49:28.193

6 understanding batch normalization 2017-10-30T16:50:00.693

6 Does gradient descent always converge to an optimum? 2017-11-09T16:41:20.940

6 How does neural network solve XOR problem 2017-12-01T03:25:30.507

6 How to set the number of neurons and layers in neural networks 2018-01-13T15:26:31.233

6 Image segmentation - handcrafted features vs DNN? 2018-02-24T03:22:37.157

5 How to do multitask learning using Caffe? 2015-07-12T19:08:38.467

5 Do I need to buy a NVIDIA graphic card to run deep learning algorithm? 2015-07-31T08:21:49.083

5 How is it possible to process an image with a few neurons? 2015-12-27T16:52:01.120

5 What are the advantages of contrastive divergence vs the gradient of the quadratic difference between the original data and the reconstructed data? 2016-01-22T17:56:30.493

5 Training Deep Nets on an Ordinary Laptop 2016-02-20T07:24:56.140

5 What are the most concrete and easiest to understand applications of deep learning in the industry? 2016-03-05T11:18:28.513

5 Why is my artificial neural networks almost always predicting positive elements? 2016-03-24T17:32:12.767

5 Training neural nets: is it important that the data is randomly sorted? 2016-04-24T17:12:38.880

5 What GPU specifications matter when training and using neural networks? 2016-05-08T04:15:59.083

5 How to predict on part of image after training on other part of image? 2016-05-17T16:49:37.667

5 Google TPU: when/how will it be available to me? 2016-06-03T10:54:33.320

5 How are per-layer-detected-patterns in a trained CNN plotted? 2016-12-10T20:51:52.123

5 RELU vs Pooling 2017-01-12T07:53:26.733

5 Multi scale CNN Network Python 2017-01-14T16:07:25.703

5 reason for square images for deep learning 2017-01-29T10:02:56.527

5 How does LSTM fights vanishing gradient? 2017-03-07T19:28:43.240