331 How to choose the number of hidden layers and nodes in a feedforward neural network? 2010-07-20T00:15:02.920

159 What does the hidden layer in a neural network compute? 2013-07-02T15:59:07.463

102 What is the difference between convolutional neural networks, restricted Boltzmann machines, and auto-encoders? 2014-09-04T20:52:24.883

97 Tradeoff batch size vs. number of iterations to train a neural network 2015-08-05T21:19:54.967

86 What is batch size in a neural network? 2015-05-22T09:15:21.810

81 What are the advantages of ReLU over sigmoid function in deep neural networks? 2014-12-02T02:13:49.903

74 A list of cost functions used in neural networks, alongside applications 2015-05-31T19:37:16.517

70 Comprehensive list of activation functions in neural networks with pros/cons 2014-09-12T13:28:51.710

65 What is an embedding layer in a neural network? 2015-11-20T16:43:12.653

64 What is the difference between a neural network and a deep neural network, and why do the deep ones work better? 2015-11-20T11:25:30.093

63 Is it possible to train a neural network without backpropagation? 2016-09-20T01:48:21.347

60 Proper way of using recurrent neural network for time series analysis 2011-03-08T07:16:01.813

60 How to apply Neural Network to time series forecasting? 2011-04-30T00:11:19.003

58 What does 1x1 convolution mean in a neural network? 2016-02-05T03:33:17.477

56 R libraries for deep learning 2012-11-02T17:35:56.267

56 What is the difference between a neural network and a deep belief network? 2013-03-04T04:18:42.890

55 Difference between neural net weight decay and learning rate 2012-05-25T05:17:27.130

54 Why do neural network researchers care about epochs? 2016-10-24T02:44:59.437

50 What are good initial weights in a neural network? 2013-01-12T21:26:39.230

45 tanh activation function vs sigmoid activation function 2014-06-08T06:11:24.523

43 What's the difference between feed-forward and recurrent neural networks? 2010-08-30T15:33:28.180

43 Multivariate linear regression vs neural network? 2012-10-27T08:06:23.977

43 How and why do normalization and feature scaling work? 2012-11-01T20:20:48.747

40 Why is logistic regression a linear classifier? 2014-04-12T19:34:29.373

40 How large should the batch size be for stochastic gradient descent? 2015-03-07T21:18:36.213

40 Why are neural networks becoming deeper, but not wider? 2016-07-09T06:35:12.870

40 Neural network references (textbooks, online courses) for beginners 2016-08-02T16:35:34.477

39 Neural networks vs support vector machines: are the second definitely superior? 2012-06-08T02:59:39.850

39 Softmax layer in a neural network 2013-12-12T12:57:00.100

38 Understanding convolutional neural networks 2014-02-07T15:01:05.123

37 What are alternatives of Gradient Descent? 2014-05-09T07:21:38.047

37 Why do Convolutional Neural Networks not use a Support Vector Machine to classify? 2015-08-20T14:43:48.633

36 How to visualize/understand what a neural network is doing? 2011-06-09T17:19:19.360

35 Neural Networks: weight change momentum and weight decay 2013-09-16T01:56:28.897

34 What are the differences between hidden Markov models and neural networks? 2011-12-31T21:03:35.660

34 Recurrent vs Recursive Neural Networks: Which is better for NLP? 2015-05-22T17:50:20.360

32 Pre-training in deep convolutional neural network? 2015-07-28T18:30:02.047

31 How can an artificial neural network (ANN) be used for unsupervised clustering? 2015-03-03T16:21:01.627

31 Understanding "almost all local minimum have very similar function value to the global optimum" 2016-03-23T17:02:05.307

30 What's the relation between hierarchical models, neural networks, graphical models, bayesian networks? 2010-11-13T05:43:15.703

30 How to get started with neural networks 2012-09-13T16:54:21.840

30 What are the differences between sparse coding and autoencoder? 2014-10-07T17:44:54.960

30 Difference between GradientDescentOptimizer and AdamOptimizer (TensorFlow)? 2015-12-01T13:48:18.240

29 How to train and validate a neural network model in R? 2012-01-25T20:21:27.683

29 Can SVM do stream learning one example at a time? 2012-04-07T19:29:07.010

29 What're the differences between PCA and autoencoder? 2014-10-15T07:26:55.363

28 TensorFlow: Adam Optimizer with Exponential Decay 2016-03-05T08:22:01.280

28 What loss function for multi-class, multi-label classification tasks in neural networks? 2016-04-17T14:28:37.823

27 Data normalization and standardization in neural networks 2011-03-01T18:53:04.537

27 Why sigmoid function instead of anything else? 2015-07-24T11:14:30.010

27 Why are there no deep reinforcement learning engines for chess, similar to AlphaGo? 2017-10-19T07:38:11.600

27 What did my neural network just learn? What features does it care about and why? 2018-01-11T17:00:49.847

26 Cost function of neural network is non-convex? 2014-07-09T13:59:38.697

26 Convolutional neural networks: Aren't the central neurons over-represented in the output? 2014-10-13T08:39:35.323

26 How does the Adam method of stochastic gradient descent work? 2016-06-24T15:45:07.683

25 Difference between logistic regression and neural networks 2012-11-14T02:29:35.170

25 Difference between Bayes network, neural network, decision tree and Petri nets 2014-04-21T04:16:02.420

25 Why not just dump the neural networks and deep learning? 2017-08-11T02:30:44.357

24 Backpropagation vs Genetic Algorithm for Neural Network training 2013-04-11T23:42:41.250

24 How does rectilinear activation function solve the vanishing gradient problem in neural networks? 2015-10-13T20:05:12.780

24 How does LSTM prevent the vanishing gradient problem? 2015-12-08T09:01:47.920

24 Which activation function for output layer? 2016-06-12T14:42:11.510

23 What is maxout in neural network? 2014-12-19T04:46:25.523

22 Restricted Boltzmann machines vs multilayer neural networks 2012-10-17T17:09:14.977

22 What is the architecture of a stacked convolutional autoencoder? 2015-02-13T08:28:39.883

22 Why is it so important to have principled and mathematical theories for Machine Learning? 2017-12-12T17:50:29.303

21 What can we learn about the human brain from artificial neural networks? 2015-06-28T22:41:19.243

21 Restricted Boltzmann Machine : how is it used in machine learning? 2016-06-28T00:41:16.070

21 What is global max pooling layer and what is its advantage over maxpooling layer? 2017-01-20T16:55:13.170

21 From Bayesian Networks to Neural Networks: how multivariate regression can be transposed to a multi-output network 2017-02-07T09:33:22.357

20 Modern neural networks that build their own topology 2012-02-12T01:42:55.523

20 How does Krizhevsky's '12 CNN get 253,440 neurons in the first layer? 2015-01-10T06:19:49.090

20 Why are bias nodes used in neural networks? 2015-12-09T14:51:53.010

19 How does a neural network recognise images? 2012-10-09T16:50:31.440

19 Autoencoders can't learn meaningful features 2014-12-31T07:08:15.910

19 Understanding LSTM units vs. cells 2016-10-23T23:37:10.387

18 Why doesn't backpropagation work when you initialize the weights the same value? 2012-12-04T12:25:02.853

18 Convolutional neural network for time series? 2014-12-10T18:52:44.653

18 Deep neural nets, RELU's removing non-linearity? 2015-03-16T15:51:01.693

18 Rules for selecting convolutional neural network hyperparameters 2015-04-24T14:35:30.027

18 Using RNN (LSTM) for predicting the timeseries vectors (Theano) 2015-06-29T10:38:32.260

17 From the Perceptron rule to Gradient Descent: How are Perceptrons with a sigmoid activation function different from Logistic Regression? 2015-02-18T17:34:05.330

17 What are the benefits of using ReLU over softplus as activation functions? 2015-04-13T04:21:42.997

17 What is the reason that the Adam Optimizer is considered robust to the value of its hyper parameters? 2016-08-31T18:27:01.097

17 Iconic (toy) models of neural networks 2017-05-15T15:47:53.340

17 Quiz: Tell the classifier by its decision boundary 2017-08-05T16:59:07.987

17 How do I make my neural network better at predicting sine waves? 2017-10-10T14:41:40.573

16 Backpropagation algorithm 2010-12-10T19:21:44.130

16 Reason for not shrinking the bias (intercept) term in regression 2014-02-18T09:50:45.643

16 How can recurrent neural networks be used for sequence classification? 2014-12-17T03:15:42.127

16 Importance of the bias node in neural networks 2015-05-25T18:05:21.527

16 What does the term saturating nonlinearities mean? 2015-09-26T19:45:15.310

16 State of the art in general learning from data in '69 2016-02-01T15:31:09.600

16 Where should I place dropout layers in a neural network? 2016-10-14T20:23:54.230

16 Why is the cost function of neural networks non-convex? 2017-05-23T15:43:53.203

15 Using neural network for trading in stock exchange 2012-12-01T10:28:25.573

15 tanh vs. sigmoid in neural net 2015-03-18T18:35:19.033

15 ImageNet: what is top-1 and top-5 error rate? 2015-06-11T11:26:30.660

15 What optimization methods work best for LSTMs? 2015-08-24T09:31:09.807