15 Sliding window leads to overfitting in LSTM? 2018-02-09T01:10:44.847

9 Why does averaging the gradient work in Gradient Descent? 2018-06-22T05:49:03.270

8 Why is taking the gradient of the average error in SGD not correct, but rather the average of the gradients of single errors? 2019-07-25T21:13:31.390

7 sklearn: SGDClassifier yields lower accuracy than LogisticRegression 2017-11-30T06:05:09.607

6 Changing the batch size during training 2021-01-29T08:23:20.517

5 How backpropagation through gradient descent represents the error after each forward pass 2017-12-09T13:52:25.563

5 Train loss vs validation loss 2018-04-26T20:57:46.517

5 Latent loss in variational autoencoder drowns generative loss 2018-07-19T15:49:56.233

3 Training a model on random samples from a large dataset 2018-09-18T09:28:04.973

3 Plotting Gradient Descent in 3d - Contour Plots 2020-02-14T01:37:49.593

3 Does small batch size improve the model? 2020-04-24T17:12:57.370

3 Will stochastic gradient descent converge for multivariate linear regression 2020-07-04T01:17:17.653

2 What does a minibatch for an LSTM look like? 2017-12-27T22:21:32.287

2 Vowpal Wabbit Online Normalization -- Possible to parallelize? 2018-02-15T09:01:47.317

2 How much of a problem is each member of a batch having the same label? 2020-07-01T12:07:21.923

2 In sequence models, is it possible to have training batches with different timesteps each to reduce the required padding per input sequence? 2020-11-26T08:57:44.347

1 Is training one epoch using mini-batch gradient descent slower than using batch gradient descent? 2017-11-10T14:42:34.260

1 Will Batch Normalization disrupt multi-threading? 2017-12-29T14:04:30.003

1 Online vs minibatch training for speed 2018-02-20T15:03:54.753

1 Point of dropping weights in mini batch for purpose of regularization 2018-04-12T14:00:40.450

1 Test data also being processed in batches 2018-06-21T11:58:40.820

1 Setting batch size: training requires twice as much memory as validating 2018-08-11T11:51:28.997

1 Splitting training examples into mini-batches: what to do with the leftover small mini-batch? 2018-08-20T09:08:46.407

1 Powers of 2 for batch_size in model fit in deep learning 2018-12-23T07:24:18.607

1 Mini-batches with sequential data 2019-01-04T07:15:05.133

1 What is the difference between different batch_sizes in Keras Sequential models? 2019-02-23T14:54:42.237

1 SGD vs SGD in mini batches 2019-03-06T16:48:17.077

1 Which batch size to use when Batch Normalization? 2019-12-15T00:02:21.613

1 Does Minibatch reduce drawback of SGD? 2020-01-09T02:36:11.717

1 Displaying network error as a single value 2020-04-24T18:47:39.643

1 mini batch vs. batch gradient descent 2020-05-06T08:03:26.473

1 How to implement large-scale Poisson Regression in Python 2020-10-15T19:30:32.130

0 Can we use decreasing step size to replace mini-batch in SGD? 2019-02-28T03:39:19.520

0 How to find learning rate decay? 2019-03-19T05:43:31.837

0 Does it make sense to train an Autoencoder for Dimensionality Reduction using Mini-Batch Gradient Descent? 2019-07-01T14:06:20.473

0 PyTorch MultiLayer Perceptron Classification Size of Features vs Labels Wrong 2020-03-18T00:45:25.173

0 Tensorflow - Manually decay Adam optimizer 2020-03-26T20:50:48.147

0 Mini Batch Gradient Descent shuffling 2020-04-27T17:38:47.510

0 How does stateful LSTM work with keras' batch_size > 1? 2020-04-29T17:29:49.517

0 Averaging biased gradient information? 2020-05-09T09:40:21.850

0 Why is my LSTM working best with a batch size of 2 and no hidden layers? 2020-05-29T09:58:39.197

0 What are ways to pick a single training sample to compute gradient in SGD? 2020-07-23T05:22:49.543

0 DNN predicting the same value for train+test Data 2020-09-07T05:08:36.213

0 In Mini Batch Gradient Descent, what happens to the remaining examples? 2020-09-12T05:41:25.957

0 With Stochastic Gradient Descent, why don't we compute the exact derivative of the loss function? 2020-09-13T06:55:48.810

0 Why is mini-batch gradient descent faster than gradient descent? 2020-09-14T05:23:29.517

0 Why divide by batch size when back-propagate from softmax + log loss 2020-12-13T02:54:57.063

0 Neural Network Optimization steps order 2021-01-16T03:50:30.013

0 Why are mini-batches degrading my conv net MNIST classifier? 2021-02-04T19:25:57.523

0 Larger batches decrease learning rate because of a technical artifact? 2021-02-20T19:56:59.817

0 When to use Gradient boosting over stochastic gradient boosting 2021-02-24T02:12:12.620