11 What are Hyper-heuristics? 2016-08-27T13:15:19.897

11 Loss jumps abruptly when I decay the learning rate with Adam optimizer in PyTorch 2018-09-20T13:14:32.060

10 Can artificial intelligence be thought of as optimization? 2016-08-02T17:56:02.743

10 What are the implications of the "No Free Lunch" theorem for machine learning? 2019-09-27T13:52:15.007

9 What are the limitations of the hill climbing algorithm and how to overcome them? 2018-11-15T15:03:08.543

8 Why does 'loss' change depending on the number of epochs chosen? 2017-12-07T14:32:11.810

8 Why does a one-layer hidden network get more robust to poor initialization with growing number of hidden neurons? 2018-04-05T08:59:43.157

7 Why is the number of hidden units in a layer suggested to be a power of 2? 2018-02-22T16:56:56.350

7 How much can the addition of new features improve the performance? 2018-10-30T20:38:16.187

5 How to avoid falling into the "local minima" trap? 2016-08-05T10:39:31.520

5 Reproduce Firefly Algorithm experiments of original paper? 2017-02-13T13:57:39.303

5 What is the actual learning algorithm: back-propagation or gradient descent? 2018-11-13T23:49:24.553

5 How can we use linear programming to solve an MDP? 2019-03-14T21:12:04.403

5 When should we use algorithms like Adam as opposed to SGD? 2019-03-25T22:46:28.447

5 How can we conclude that an optimization algorithm is better than another one? 2019-09-22T15:46:14.707

4 How to use MOPSO to align characters vertically? 2018-03-28T14:56:34.707

4 What is the basic purpose of local search methods? 2018-11-19T16:06:10.987

4 Can we use the Tierra approach to optimize machine code? 2019-06-27T11:22:10.720

4 Can we optimize an optimization algorithm? 2019-07-23T22:02:12.300

4 What are examples of optimization problems that can be solved using a genetic algorithm? 2019-10-05T13:53:10.597

4 What effect does batch norm have on the gradient? 2020-03-27T18:28:55.237

3 What are the methods of optimizing overfitted models? 2016-08-02T15:55:15.957

3 Is there a way to define the boundaries of the optimal size of a training set? 2016-08-04T15:49:13.793

3 Maximizing or Minimizing in Trust Region Policy Optimization? 2018-07-15T08:43:31.690

3 Genetic Algorithms: Trade-off between time and variance with regards to fitness function 2018-08-16T10:07:42.763

3 Why does hill climbing algorithm only produce a local maximum? 2018-11-27T06:20:54.403

3 When should I use simulated annealing as opposed to a genetic algorithm? 2018-12-12T07:13:17.103

3 Could error surface shape be useful to detect which local minimum is better for generalization? 2019-03-01T20:46:51.720

3 Is it possible to have a dynamic $Q$-function? 2019-07-18T20:57:51.023

3 Does Retina-net's focal loss accomplish its goal? 2019-08-03T14:53:51.103

3 Advantages of Kullback-Leibler over L1/L2? 2019-09-11T06:49:49.830

3 What's the difference between RMSE and Euclidean distance, and when to use a custom loss? 2019-11-15T07:27:21.777

3 Why does variational auto-encoder use the reconstruction loss? 2020-03-26T05:22:29.500

3 How does SGD escape local minima? 2020-05-31T09:56:07.290

2 Are FFNN (MLP) Lipschitz functions? 2016-09-10T10:05:34.707

2 Would a sentient AI try to create a more optimised AI which would eventually overtake AI 1.0? 2016-09-15T14:08:40.983

2 Knapsack of mixture with constraints 2016-11-14T08:55:39.600

2 Problems getting ADADELTA to converge 2017-08-23T23:40:45.230

2 Which features and algorithm could optimize this air-conditioner problem? 2018-01-09T18:26:56.430

2 Input optimization on a supervised learning system 2018-04-23T10:29:22.173

2 If Deep Learning is non convex, then why use convex loss? 2018-05-23T03:57:53.923

2 How can we calculate the gradient of the Boltzmann policy over reward function? 2018-07-14T13:27:12.740

2 How do I compute log-likelihood for training set in supervised learning? 2018-08-08T05:52:01.567

2 Has a calculus or ML approach to varying the learning rate as a function of loss and epoch been investigated? 2018-09-20T21:33:32.960

2 AI that maximizes the storage of rectangular parallelepipeds in a bigger parallelepiped 2018-10-19T14:50:54.090

2 Reinforcement Learning to Grouped Scheduling Optimisation Problem 2018-11-15T16:27:40.773

2 Method to check goodness of combinatorial optimization algorithm implementation 2019-01-17T12:23:23.670

2 Is a very powerful oracle sufficient to trigger the AI singularity? 2019-02-17T21:58:59.010

2 Training an artificial neural network using PSO 2019-03-03T13:36:06.670

2 How can we reach global optimum? 2019-04-09T05:10:55.027

2 Why is a mix of greedy and random usually "best" for stochastic local search? 2019-05-31T06:06:22.240

2 Is a neural network the correct approach to optimising a fitness function in a genetic algorithm? 2019-07-09T18:10:01.303

2 Why will the sigmoid function be 1 or 0 if we use a fully connected layer that produces a big enough positive (resp. negative) output? 2019-09-11T13:14:31.460

2 Metrics of quality of parameter space exploration 2019-09-16T07:58:02.007

2 How can I assign agents to tasks based on time and affinity? 2019-11-14T01:40:36.587

2 Which algorithm can I use to solve a problem with multiple objectives and constraints? 2019-11-19T14:31:30.013

2 Is logistic regression used for unconstrained or constrained optimisation problems? 2019-12-04T05:06:35.627

2 When training a CNN, what are the hyperparameters to tune first? 2020-01-15T09:04:46.550

2 Solving a planning problem if finding the goal state is part of the problem 2020-01-16T18:11:38.077

2 In deep learning, is it possible to use discontinuous activation functions? 2020-01-22T04:40:47.477

2 What are advantages of using meta-heuristic algorithms on optimization problems? 2020-01-27T12:19:58.037

2 Which deep reinforcement learning algorithm is appropriate for my problem? 2020-03-30T08:17:39.780

2 How can I train a neural network if I don't have enough data? 2020-04-03T15:01:23.627

2 Which one is more important in case of different loss optimization algorithms, Speed or the Route? 2020-05-28T08:07:50.917

2 If the normal equation works, why do we need gradient descent? 2020-07-08T14:15:23.210

1 Which algorithm would you use to solve a multiple producer-consumer problem with constraints? 2017-12-11T15:53:38.130

1 Should the mutation be applied with the hill climbing algorithm? 2018-01-19T15:07:30.837

1 Application of AI to task scheduling problems on heterogeneous platforms 2018-01-30T23:02:22.490

1 How to calculate Adaptive gradient? 2018-02-19T16:54:33.217

1 Optimization step in Apprenticeship Learning via Inverse Reinforcement Learning 2018-08-08T05:16:04.270

1 What is the difference between the study of evolutionary algorithms and optimization? 2018-09-24T04:06:26.310

1 How to optimize a function using a genetic algorithm? 2018-09-29T06:36:21.413

1 What are the advantages and disadvantages of using LISP for constraint satisfaction in 3D space? 2018-10-05T19:59:52.837

1 What does it essentially mean if the neural network has convex error surface? 2018-11-23T23:36:44.133

1 Genetic Algorithm vs Particle Swarm Optimization 2018-11-30T10:26:11.017

1 Neural Network Optimizers in non-well-behaved Reinforcement Learning environments 2018-12-01T08:53:47.123

1 Classical Internet routing vs. Swarm routing (such as Ant routing)? 2018-12-13T13:27:52.433

1 How can a specific connectivity pattern be stored in an optimally compact representation? 2018-12-19T20:58:02.760

1 Feature visualization on neural networks which are not for classification 2019-02-14T09:31:25.837

1 Any guidance on learning rate / batch size for noisy data (high Bayes error rate)? 2019-02-22T09:07:45.573

1 Which local minima to choose according to the shape of the error surface? 2019-03-01T21:47:19.977

1 What is the purpose of the new neurons in the constrained neural network? 2019-03-05T20:39:26.673

1 Why isn't the reverse KL divergence commonly used in supervised learning? 2019-04-05T09:08:56.710

1 How to properly optimize shared network between actor and critic? 2019-04-25T12:56:02.830

1 Estimating Baselines using ALS 2019-05-05T00:11:04.380

1 How does NEAT find the most successful generation without gradients? 2019-06-04T19:53:37.460

1 Is an optimization algorithm equivalent to a neural network? 2019-07-23T11:35:34.260

1 Could the Jensen-Shannon divergence and Kullback-Leibler divergence be used as loss functions of non-generation problems? 2019-07-24T13:07:19.477

1 Is convergence to a local minimum more likely with transfer learning? 2019-07-25T09:28:02.890

1 Create optimizer object using the tf.keras.optimizers.get function 2019-08-21T09:34:52.867

1 Grey Wolf Optimization - Issue with Dimension 2019-10-29T17:24:02.443

1 How does the automated temperature adjustment step work in Soft Actor-Critic? 2019-11-02T01:22:26.637

1 How should I weight the factors that affect the choice of an action in a strategy board game with multiple actions? 2019-12-08T13:41:28.590

1 Imposing constraints on sequence of image classifications 2019-12-16T19:53:07.690

1 Optimizer effects on neural network with two outputs 2019-12-17T21:22:35.997

1 Are there optimizers that schedule their learning rate, momentum etc. autonomously? 2020-02-06T18:43:30.983

1 What are the properties of hill climbing? 2020-02-19T15:19:55.500

1 What kind of optimizer is suggested to use for binary classification of similar images? 2020-02-24T08:11:59.917

1 Object Detection and Choice of Optimizer 2020-02-24T23:08:04.607