101 Choosing a learning rate 2014-06-16T18:08:38.623

46 Should a model be re-trained if new observations are available? 2016-07-13T11:03:54.740

35 Why not always use the ADAM optimization technique? 2018-04-15T16:55:34.020

34 Does gradient descent always converge to an optimum? 2017-11-09T16:41:20.940

30 Are there any rules for choosing the size of a mini-batch? 2017-04-17T16:18:22.793

28 Guidelines for selecting an optimizer for training neural networks 2016-03-04T09:32:17.287

21 local minima vs saddle points in deep learning 2017-09-05T19:14:30.057

15 How many features to sample using Random Forests 2017-10-10T10:50:22.720

13 Why aren't Genetic Algorithms used for optimizing neural networks? 2018-09-16T08:34:49.787

13 Is Gradient Descent central to every optimizer? 2019-03-12T10:04:15.807

11 Fisher Scoring v/s Coordinate Descent for MLE in R 2014-07-03T17:11:01.770

11 Why is learning rate causing my neural network's weights to skyrocket? 2016-12-27T22:50:17.103

9 Difference between RMSProp with momentum and Adam Optimizers 2018-01-18T15:21:12.467

8 Can overfitting occur in Advanced Optimization algorithms? 2016-09-13T14:10:49.220

8 Why does decreasing the SGD learning rate cause a massive increase in accuracy? 2017-09-21T16:02:30.073

8 Why is taking the gradient of the average error in SGD not correct, but rather the average of the gradients of single errors? 2019-07-25T21:13:31.390

7 Simple example of genetic alg minimization 2016-01-05T10:57:35.867

7 Backpropagation: In second-order methods, would the ReLU derivative be 0, and what is its effect on training? 2016-07-12T17:16:03.147

7 Why do we use gradients instead of residuals in Gradient Boosting? 2018-05-13T20:25:59.203

7 Can a GAN-like architecture be used for maximizing the value of a regression predictor? 2018-05-24T18:42:20.837

7 clipping the reward for adam optimizer in keras 2018-10-03T20:07:59.787

6 Gerrymandering - Geospatial optimization to maximize votes in R 2015-05-21T15:25:35.377

6 Step-by-step construction of an RBF neural network 2017-08-02T17:25:38.250

6 Least Squares optimization 2019-02-10T06:33:37.720

6 Optimising for Brier objective function directly gives worse Brier score than optimising with custom objective - what does it tell me? 2020-04-06T07:27:07.103

5 Which Optimization method to use? 2014-12-20T03:37:21.820

5 Optimizing parameters for a closed (black-box) system 2015-09-08T17:51:50.850

5 The connection between optimization and generalization 2018-03-23T12:38:15.440

5 Loss function for optimising precision & recall / sensitivity & specificity? 2018-06-12T16:32:25.413

5 Genetic algorithms (GAs): to be considered only as optimization algorithms? Are GAs used in machine learning in any way? 2019-10-28T09:30:28.297

5 Can I completely cancel the effects of using a smaller batch size by reducing the learning rate? 2020-01-16T15:42:35.227

4 Open source solver for large mixed integer programming task? 2014-05-21T19:41:19.857

4 How to apply AdaBoost to more "complex" (non-binary) classifications/data fitting? 2014-12-26T06:53:29.670

4 When being in a perfect "Long Valley" situation, does momentum help? 2016-02-17T18:59:27.833

4 Parallel active optimization 2016-02-27T15:59:49.043

4 Reducing sample size 2016-12-27T19:28:17.920

4 how to make decision based on users reports 2017-02-10T09:56:29.783

4 Is reseating passengers a reinforcement learning problem? 2017-12-19T07:42:15.533

4 Adam optimizer for projected gradient descent 2018-05-15T23:02:28.800

4 Using Mean Squared Error in Gradient Descent 2018-06-14T20:44:07.030

4 Minimizing an upper bound of objective function 2018-07-19T07:11:38.880

4 What is the class of this optimization problem? 2018-09-05T05:22:59.050

4 Why will an imbalanced data set bias the prediction model towards the more common class? 2018-09-09T03:41:29.727

4 Bayesian optimisation in deep learning 2019-01-09T18:54:25.567

4 Mathematical formulation of Support Vector Machines? 2019-07-12T21:47:57.940

4 Is convergence of the loss function always guaranteed? 2020-08-13T14:00:00.913

4 Need to kickstart learning rates 2020-11-04T04:13:59.473

3 Machine Learning for hedging/ portfolio optimization? 2014-12-18T04:48:49.820

3 Application of Control Theory in Data Science 2015-05-16T18:09:15.630

3 What are some nice algorithms/techniques for optimizing and predicting Click Through Rates (CTR)? 2015-10-26T11:50:41.147

3 Genetic Algorithm to find best parameter values of an estimator 2015-11-28T19:36:54.963

3 Performance metric in recommender systems with implicit feedback 2015-12-08T17:49:04.497

3 Training the parameters of a Restricted Boltzmann machine 2016-04-05T06:52:25.740

3 In Neural Nets, why Use Gradient Methods as Opposed to Other Metaheuristics? 2016-04-15T07:03:52.583

3 How can the process of hypertuning of XGBoost parameters be automated? 2016-06-25T08:46:21.680

3 Different learning rates converging to same minima 2016-07-11T18:21:58.313

3 Minimize absolute values of errors instead of squares 2016-07-11T22:51:24.283

3 Efficient way to optimise hyper parameter for network with multiple inputs? 2017-04-30T13:14:48.587

3 What is the stochastic part in stochastic gradient descent? 2017-09-28T10:51:29.450

3 Semantic segmentation data and model compile in Keras 2018-02-11T17:45:40.577

3 What is a good objective function for allowing close to 0 predictions? 2018-04-16T13:25:40.237

3 Why RMSProp converges faster than Momentum? 2018-04-21T15:22:50.920

3 Grad Checking, verify by average? 2018-04-25T00:02:00.713

3 Why Root Finding is important in Logistic Regression? (i.e. Newton Raphson) 2018-05-02T09:09:06.717

3 Significance of comparing Receiver Operating Characteristic (ROC) curves 2018-06-27T18:24:31.383

3 How does binary cross entropy work? 2018-07-13T18:50:19.653

3 Optimizing Expensive Functions 2018-08-13T13:40:32.497

3 Linear optimization problem of $argmin$ 2019-01-16T10:35:30.587

3 Can Adagrad be used to optimize non-differentiable functions? 2019-02-16T20:05:03.550

3 Newton method and Vanishing Gradient 2019-03-20T14:41:27.870

3 Knowing when a GAN is overfitting (sequence classification study) 2019-04-05T07:48:19.513

3 Is Adam's optimization susceptible to Local Minima? 2019-04-07T16:09:16.733

3 Is it possible for a neural net to score as high as a different form of supervised learning? 2019-04-10T19:08:00.597

3 How to optimize the lambdas of a hybrid loss in a deep learning model 2019-05-13T14:19:18.240

3 How is Stochastic Gradient Descent used like Mini Batch gradient descent? 2019-06-01T14:17:40.917

3 Why does degradation occur in deep neural networks? 2019-09-02T21:53:07.500

3 Choosing an optimizer to perfectly fit a neural networks to training data 2019-09-15T00:15:02.403

3 Grid search or gradient descent? 2019-10-28T17:20:36.077

3 Optimization of pandas row iteration and summation 2019-11-19T16:40:20.970

3 Question on Scipy - Minimize. Adding additional constraints 2019-12-19T18:49:02.933

3 Ising Spin Glass - Optimization 2020-01-08T07:33:43.283

3 How to get the best combinations of features for a sale optimization problem? 2020-01-08T20:24:39.060

3 Machine learning model with simultaneous function optimization 2020-01-15T07:45:04.817

3 Constructing function - f(x,y) for the given minimums (Python) 2020-01-20T12:55:24.143

3 SGD versus Adam Optimization Clarification 2020-06-10T17:04:35.370

3 Difference between RMSProp and Momentum? 2020-06-21T15:53:11.807

3 Scipy minimization failing with inequality constraints or bounds 2020-07-09T00:25:10.070

3 When does it make sense to choose gradient descent for SVM over liblinear? 2020-08-24T19:53:17.037

3 Which learning rate should I choose? 2020-11-13T08:14:40.293

3 Comparison between cost functions to determine the "best" model? 2020-12-08T12:39:54.753

3 ML/NN as Function Evaluator for further Optimization (maximization) - Practical Example 2021-01-11T14:19:45.153

2 Using Heuristic Methods for AB Testing 2014-08-20T21:12:49.927

2 How does one feed graph optimization problems into Python's anneal function in SciPy? 2014-12-04T07:47:32.740

2 Machine learning for state-based transforms? 2015-07-11T23:08:14.300

2 Solve a pair of coupled nonlinear equations within certain limits 2015-09-13T21:50:39.237

2 How to select a bunch of optimized data from a larger data set? 2015-11-23T16:49:32.403

2 Machine learning worker performance features for optimum allocation of tasks to workers 2015-12-11T06:02:34.457

2 Regression problem - too complex for gradient descent 2015-12-15T13:25:39.840

2 Why is each successive tree in GBM fit on the negative gradient of the loss function? 2016-02-21T05:41:22.113