What are the best known gradient-free training methods for deep learning?



As far as I know, the current state-of-the-art methods for training deep learning networks are variants of gradient descent or stochastic gradient descent.

What are the best known gradient-free training methods for deep learning (mostly in the context of visual tasks)?


Posted 2017-08-24T12:42:23.943

Reputation: 249

Several algorithms are mentioned in these lecture slides from M. Kochenderfer's class: http://adl.stanford.edu/aa222/Lecture_Notes_files/chapter6_gradfree.pdf (P.s. the problem is more general than deep learning.)

– bruziuz – 2018-01-30T13:53:03.703

These days the ML community generally uses gradient-based methods, even when practitioners don't fully understand why, because the methods are robust and fast, as mathematicians have proved. P.s. Non-convex optimization is hard and not well solved; there are many approximate methods for dealing with it - welcome to mathematical optimization. – bruziuz – 2018-01-30T13:59:42.027



Several different algorithms can be used for gradient-free neural network training, including particle swarm optimization, genetic algorithms, simulated annealing, and others. In fact, almost any optimization algorithm can be used to train a neural network. Here is an overview of some of the algorithms I listed:

  • Particle swarm optimization - I would say this is one of the better optimization algorithms for training neural networks other than backpropagation. I am currently using it and have achieved quite good results.
  • Genetic algorithms - I have tried using genetic algorithms to train neural networks in the past and was not able to get them to work. However, I was using deep neural networks with almost a million parameters, and the performance was not very good.
  • Simulated annealing - simulated annealing is inspired by the controlled cooling of metals. I have seen it work fairly well, though perhaps not as well as particle swarm optimization.
  • Derivatives of genetic algorithms - derivatives of genetic algorithms such as NEAT have been shown to work quite well. I have not used them extensively myself, but some of the things people have built with them are pretty cool.
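To make the particle swarm idea concrete, here is a minimal sketch (my own illustrative example, not the answerer's implementation) that flattens all weights of a tiny 2-2-1 network into one vector per particle and minimizes the mean squared error on XOR. All hyperparameters (inertia 0.7, cognitive/social coefficients 1.5, 30 particles) are common textbook defaults, not tuned values:

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0, 1, 1, 0], dtype=float)

DIM = 2 * 2 + 2 + 2 * 1 + 1  # weights and biases of a 2-2-1 network


def forward(w, x):
    # Unpack the flat particle position into layer weights and biases.
    W1, b1 = w[:4].reshape(2, 2), w[4:6]
    W2, b2 = w[6:8].reshape(2, 1), w[8]
    h = np.tanh(x @ W1 + b1)
    return 1.0 / (1.0 + np.exp(-((h @ W2).ravel() + b2)))


def loss(w):
    return np.mean((forward(w, X) - y) ** 2)


# Standard PSO: each particle keeps a velocity and its personal best;
# the swarm shares a global best. No gradients are ever computed.
N = 30
pos = rng.normal(0.0, 1.0, (N, DIM))
vel = np.zeros((N, DIM))
pbest = pos.copy()
pbest_f = np.array([loss(p) for p in pos])
gbest = pbest[pbest_f.argmin()].copy()

for _ in range(300):
    r1, r2 = rng.random((N, DIM)), rng.random((N, DIM))
    vel = 0.7 * vel + 1.5 * r1 * (pbest - pos) + 1.5 * r2 * (gbest - pos)
    pos = pos + vel
    f = np.array([loss(p) for p in pos])
    improved = f < pbest_f
    pbest[improved], pbest_f[improved] = pos[improved], f[improved]
    gbest = pbest[pbest_f.argmin()].copy()

print(loss(gbest))  # final MSE; a constant 0.5 guess would score 0.25
```

The same "flatten the parameters, score by loss, update positions" pattern carries over to simulated annealing and genetic algorithms; only the position-update rule changes. In practice this scales poorly to the million-parameter networks mentioned above, which matches the experience reported in the genetic algorithms bullet.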

Aiden Grossman

Posted 2017-08-24T12:42:23.943

Reputation: 824

I don't think you need to distinguish between "genetic algorithms" and stuff like NEAT, which is not a "derivative" of a genetic algorithm - it's a GA! – nbro – 2020-03-09T15:01:00.133

Do you have maybe some SW recommendations? (If this is beyond this question, please let me know) – rursw1 – 2017-09-24T06:50:58.567

@rursw1 I am currently experimenting with my own custom GPU-accelerated CUDA C/C# PSO implementation, but MATLAB would probably be good for this, and there is a library for NEAT. If you want to use my PSO implementation, just let me know. – Aiden Grossman – 2017-09-24T18:41:57.467

This is very generous, thank you. I'll try to implement a PSO implementation on my own for now. Thanks! – rursw1 – 2017-09-25T06:24:23.337