Tag: gradient-descent

51 What is the difference between Gradient Descent and Stochastic Gradient Descent? 2018-08-04T06:36:04.657

34 Does gradient descent always converge to an optimum? 2017-11-09T16:41:20.940

27 Scikit-learn: Getting SGDClassifier to predict as well as a Logistic Regression 2015-08-04T08:11:30.990

17 Why ReLU is better than the other activation functions 2017-10-03T14:17:09.163

13 Is Gradient Descent central to every optimizer? 2019-03-12T10:04:15.807

11 Why is learning rate causing my neural network's weights to skyrocket? 2016-12-27T22:50:17.103

10 Stochastic gradient descent based on vector operations? 2014-10-10T13:34:11.543

10 How flexible is the link between objective function and output layer activation function? 2015-07-08T20:04:16.703

10 Why does it speed up gradient descent if the function is smooth? 2017-08-07T14:58:57.693

9 Understanding dropout and gradient descent 2015-08-27T19:36:53.297

9 Why averaging the gradient works in Gradient Descent? 2018-06-22T05:49:03.270

8 Can overfitting occur in Advanced Optimization algorithms? 2016-09-13T14:10:49.220

8 Understanding the mathematics of AdaGrad and AdaDelta 2018-02-10T13:21:43.947

8 How to plot cost versus number of iterations in scikit-learn? 2018-02-28T16:00:18.873

8 Implementation of Stochastic Gradient Descent in Python 2018-04-24T23:57:03.723

8 How does Gradient Descent and Backpropagation work together? 2019-01-28T13:34:41.617

8 Why is taking the gradient of the average error in SGD not correct, but rather the average of the gradients of single errors? 2019-07-25T21:13:31.390

7 Trying to understand Logistic Regression Implementation 2015-10-30T04:21:27.737

7 Why isn't leaky ReLU always preferable to ReLU given the zero gradient for x<0? 2017-01-31T09:28:40.410

7 Why do we use gradients instead of residuals in Gradient Boosting? 2018-05-13T20:25:59.203

7 What is the difference between SGD classifier and the Logistic regression? 2018-09-07T18:15:11.690

7 Clipping the reward for Adam optimizer in Keras 2018-10-03T20:07:59.787

7 Duplicated features for gradient descent 2020-01-26T16:36:27.600

6 How does Tensorflow compute gradients of reduce_min operation? 2017-08-01T09:22:53.650

6 Final layer of neural network responsible for overfitting 2017-10-22T19:08:34.470

6 Changing the batch size during training 2021-01-29T08:23:20.517

5 Stochastic gradient descent in logistic regression 2014-07-07T11:43:48.430

5 Implementing RMSProp, but finding differences between reference versions 2015-09-04T21:24:25.313

5 What is conjugate gradient descent? 2015-09-30T13:04:37.247

5 How to update weights in a neural network using gradient descent with mini-batches? 2015-12-14T17:21:21.047

5 Differences between LSQR and FTRL when working with very sparse data 2016-02-07T12:42:49.030

5 Benefits of stochastic gradient descent besides speed/overhead and their optimization 2017-01-30T04:52:22.863

5 Why is vanishing gradient a problem? 2017-05-31T00:51:38.923

5 Why does gradient descent give me a much better Relative Squared Error than the Least Squares approach? 2017-06-16T15:05:49.527

5 How does LightGBM deal with value scale? 2017-08-07T14:46:02.643

5 What feature engineering is necessary with tree based algorithms? 2017-08-08T15:00:47.583

5 How backpropagation through gradient descent represents the error after each forward pass 2017-12-09T13:52:25.563

5 Gradient Checking LSTM - how to get change in Cost across timesteps? 2018-04-27T04:42:14.163

5 How to get out of local minimums on stochastic gradient descent? 2019-01-21T11:34:40.440

5 Gradient Descent in ReLU Neural Network 2019-04-21T06:31:19.767

5 What is momentum in neural network? 2020-10-18T09:25:19.913

5 Getting NN weights for every batch / epoch from Keras model 2020-11-14T07:04:53.273

4 Terminology: SOMs, batch learning, online learning, and stochastic gradient descent 2015-03-18T17:25:32.460

4 When being in a perfect "Long Valley" situation, does momentum help? 2016-02-17T18:59:27.833

4 Training Restricted Boltzmann Machines (RBMs) using gradient descent 2016-05-20T16:54:42.143

4 When is Gradient Descent invoked on the objective function while running XGboost? 2016-09-15T05:53:00.727

4 Why do CNNs with ReLU learn that well? 2016-11-12T20:44:18.730

4 Xgboost quantile regression via custom objective 2016-12-22T17:06:56.187

4 Why is stochastic gradient descent so much worse than batch GD for MNIST task? 2017-02-10T14:28:13.587

4 How to search for an optimal dithering pattern? 2017-12-17T21:51:21.947

4 How to understand incremental stochastic gradient algorithm and its implementation in logistic regression [updated]? 2018-01-25T10:07:14.350

4 Plots with shaded standard deviation 2018-04-26T23:18:57.213

4 Adam optimizer for projected gradient descent 2018-05-15T23:02:28.800

4 Using Mean Squared Error in Gradient Descent 2018-06-14T20:44:07.030

4 Why Gradient methods work in finding the parameters in Neural Networks? 2018-10-05T16:13:32.203

4 Too few data while too many degrees of freedom in linear regression 2018-12-08T14:02:24.073

4 Does Gradient Boosting detect non-linear relationships? 2019-02-11T10:49:46.917

4 Why does feature scaling improve the convergence speed for gradient descent? 2019-07-14T19:33:16.973

4 Question of using gradient descent instead of calculus. I checked previous questions there are still points to clarify 2019-08-13T04:42:36.487

4 How are weight updates handled in Batch Gradient Descent vs SGD? 2019-09-25T23:45:00.163

4 How to automatically test for the best parameters for transformed independent variable in linear model 2020-01-07T11:22:01.040

4 Which models can handle null values? 2020-01-28T19:41:55.680

4 How to solve the gradient descent on a linear classification problem? 2020-03-04T23:55:09.470

4 What is the best way to find minima in Logistic regression? 2020-03-15T17:23:01.060

4 How to prevent vanishing gradient or exploding gradient? 2020-04-15T05:00:24.967

4 Learning parameters when loss is a piecewise function 2020-07-10T04:05:44.843

3 Gradient Descent Step for word2vec negative sampling 2015-04-26T01:38:52.180

3 Feature Scaling and Mean Normalization 2015-11-20T20:52:08.420

3 Different learning rates converging to same minima 2016-07-11T18:21:58.313

3 My ADALINE model using Gradient Descent is increasing error on each iteration 2017-01-15T08:02:40.347

3 Friedman H'statistic for interaction 2017-07-05T14:07:20.350

3 Catastrophic forgetting in linear semi-gradient RL agent? 2017-08-16T17:05:53.190

3 What is the stochastic part in stochastic gradient descent? 2017-09-28T10:51:29.450

3 How to implement gradient descent for a tanh() activation function for a single layer perceptron? 2017-10-17T10:43:28.863

3 Mean and Variance of Feature Scaling 2018-02-03T17:13:29.017

3 What Is Saturating Gradient Problem 2018-02-10T08:15:08.857

3 Gradient derivation reference for Phased LSTM 2018-02-15T20:20:38.277

3 Is the gradient descent the same if cost function has interaction? 2018-02-16T18:34:33.383

3 Grad Checking, verify by average? 2018-04-25T00:02:00.713

3 GradientChecking, can I blame float precision? 2018-04-26T05:32:31.020

3 Why Root Finding is important in Logistic Regression? (i.e. Newton Raphson) 2018-05-02T09:09:06.717

3 Tensorflow Calculate error for a single neuron 2018-07-04T18:29:50.000

3 Is gradient descent slower for finite differences? 2018-10-13T21:19:57.040

3 Is empirical risk the same thing as loss function? 2019-01-16T14:41:34.693

3 Is Adam's optimization susceptible to Local Minima? 2019-04-07T16:09:16.733

3 How is Stochastic Gradient Descent used like Mini Batch gradient descent? 2019-06-01T14:17:40.917

3 Are mini batches sampled randomly in Keras' Sequential.fit() method? 2019-06-17T07:52:52.460

3 Fast Python implementation of the gradient descent 2019-08-14T19:54:00.097

3 Grid search or gradient descent? 2019-10-28T17:20:36.077

3 What do positive and negative gradient values mean for Convolutional Neural Network? 2020-01-13T19:21:21.407

3 Interpreting Gradients and Partial Derivatives when training Neural Networks 2020-01-17T17:33:51.643

3 Plotting Gradient Descent in 3d - Contour Plots 2020-02-14T01:37:49.593

3 Gradient Checking: MeanSquareError. Why huge epsilon improves discrepancy? 2020-04-18T23:30:23.527

3 Difference between RMSProp and Momentum? 2020-06-21T15:53:11.807

3 Why Gaussian mixture model uses Expectation maximization instead of Gradient descent? 2020-07-03T19:12:51.693

3 Will stochastic gradient descent converge for multivariate linear regression 2020-07-04T01:17:17.653

3 When does it make sense to choose gradient descent for SVM over liblinear? 2020-08-24T19:53:17.037

3 Using a random forest, would a RandomForest performance be less if I drop the first or the last tree? 2020-10-05T13:40:44.757

3 Why the sigmoid activation function results in sub-optimal gradient descent? 2020-10-30T17:16:53.030