There are several algorithms that can be used for gradient-free neural network training, including particle swarm optimization, genetic algorithms, and simulated annealing. Almost any general-purpose optimization algorithm can, in principle, train a neural network. Here is an overview of some of the algorithms I listed:

- Particle swarm optimization - I would say this is one of the better gradient-free algorithms for training neural networks, short of backpropagation itself. I am currently using it and have achieved quite good results.
- Genetic algorithms - I have tried genetic algorithms for training neural networks in the past and could not get them to work. However, I was using deep networks with nearly a million parameters, and at that scale the performance was poor.
- Simulated annealing - simulated annealing is inspired by the way metals cool and anneal. I have seen it work fairly well, though perhaps not as well as particle swarm optimization.
- Derivatives of genetic algorithms - variants of genetic algorithms such as NEAT (NeuroEvolution of Augmenting Topologies) have been shown to work quite well. I have not used them extensively myself, but some of the things people have built with them are impressive.
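To make the particle swarm idea concrete, here is a minimal sketch (not my actual code) of PSO training a tiny network on XOR: every weight and bias is flattened into a particle's position vector, and the swarm minimizes the training loss directly, with no gradients at all. The network shape, swarm size, and PSO coefficients below are illustrative choices, not recommendations.

```python
# Hypothetical sketch: PSO training a 2-2-1 network on XOR, no gradients.
import math
import random

random.seed(0)

XOR = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]
DIM = 9  # 4 hidden weights + 2 hidden biases + 2 output weights + 1 output bias

def sigmoid(z):
    # Numerically safe sigmoid (avoids overflow for large |z|).
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def forward(params, x):
    # Unpack the flat parameter vector into a 2-2-1 network.
    w1 = [params[0:2], params[2:4]]   # hidden-layer weights
    b1 = params[4:6]                  # hidden-layer biases
    w2 = params[6:8]                  # output weights
    b2 = params[8]                    # output bias
    h = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
         for row, b in zip(w1, b1)]
    return sigmoid(sum(w * hi for w, hi in zip(w2, h)) + b2)

def loss(params):
    # Sum of squared errors over the four XOR cases.
    return sum((forward(params, x) - y) ** 2 for x, y in XOR)

# Standard PSO update: v <- w*v + c1*r1*(pbest - x) + c2*r2*(gbest - x)
N, ITERS, W, C1, C2 = 30, 300, 0.7, 1.5, 1.5
pos = [[random.uniform(-2, 2) for _ in range(DIM)] for _ in range(N)]
vel = [[0.0] * DIM for _ in range(N)]
pbest = [p[:] for p in pos]           # each particle's best position
pbest_f = [loss(p) for p in pos]      # ...and its loss there
g = min(range(N), key=lambda i: pbest_f[i])
gbest, gbest_f = pbest[g][:], pbest_f[g]

for _ in range(ITERS):
    for i in range(N):
        for d in range(DIM):
            r1, r2 = random.random(), random.random()
            vel[i][d] = (W * vel[i][d]
                         + C1 * r1 * (pbest[i][d] - pos[i][d])
                         + C2 * r2 * (gbest[d] - pos[i][d]))
            pos[i][d] += vel[i][d]
        f = loss(pos[i])
        if f < pbest_f[i]:
            pbest[i], pbest_f[i] = pos[i][:], f
            if f < gbest_f:
                gbest, gbest_f = pos[i][:], f

print("final loss:", round(gbest_f, 4))
```

The only thing PSO needs from the network is a loss value per candidate parameter vector, which is exactly why it drops in so easily where gradients are unavailable.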

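For comparison, simulated annealing works on the same kind of flat parameter vector but moves a single candidate around, occasionally accepting uphill moves so it can escape local minima. The sketch below uses a toy quadratic as a stand-in for a real training loss; the perturbation size and cooling rate are illustrative.

```python
# Hypothetical sketch: simulated annealing on a stand-in objective.
import math
import random

random.seed(1)

def loss(p):
    # Toy quadratic standing in for a network's training loss.
    return sum((x - 1.0) ** 2 for x in p)

p = [random.uniform(-5, 5) for _ in range(10)]
best, best_f = p[:], loss(p)
T = 1.0  # "temperature": high early on, cooled over time
for step in range(5000):
    # Propose a small random perturbation of one parameter.
    q = p[:]
    q[random.randrange(len(q))] += random.gauss(0, 0.5)
    df = loss(q) - loss(p)
    # Always accept downhill moves; accept uphill with prob exp(-df/T).
    if df < 0 or random.random() < math.exp(-df / T):
        p = q
        if loss(p) < best_f:
            best, best_f = p[:], loss(p)
    T *= 0.999  # cooling schedule: the metal-cooling analogy

print("best loss:", round(best_f, 3))
```

As the temperature falls, uphill moves become rare and the search settles into a minimum, which is the "cooling metal" behavior the analogy refers to.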
Several algorithms are covered in this slide deck from M. Kochenderfer's class: http://adl.stanford.edu/aa222/Lecture_Notes_files/chapter6_gradfree.pdf (p.s. the problem is more general than deep learning). – bruziuz – 2018-01-30T13:53:03.703

These days the ML community generally uses gradient-based methods, even when practitioners don't fully understand why, because the methods are robust and fast, as mathematicians have proved. P.s. Non-convex optimization is hard and far from solved; there are many approximate methods for dealing with it - welcome to mathematical optimization. – bruziuz – 2018-01-30T13:59:42.027