## What are the best ways to tune multiple parameters?


When building a machine learning model, it is very common to have several "parameters" to tune (I'm thinking of real-valued parameters such as the gradient descent step size, or choices such as which features to use). We validate these parameters on a validation set.

My question is: what is the best way to tune these multiple parameters? For example, let's say we have 3 parameters A, B and C that each take 3 values:

• A = [ A1, A2, A3 ]
• B = [ B1, B2, B3 ]
• C = [ C1, C2, C3 ]

Two methods come to my mind.

Method 1:

Vary all the parameters at the same time and test different combinations randomly, such as:

• Test1 = [A1,B1,C1]
• Test2 = [A2,B2,C2]
• Test3 = [A3,B3,C3]
• Test4 = [A1,B2,C3]
• Test5= [A3,B1,C2]
• etc..

Method 2:

Fix all the parameters except one:

• TestA1 = [A1,B1,C1]
• TestA2 = [A2,B1,C1]
• TestA3 = [A3,B1,C1]

In that way, we can find the best value for parameter A, then we fix this value and use it to find the best value for B, and finally the best value for C.

Method 2 seems more logical to me because it is more organized. But maybe we will miss a good combination that Method 1 can find and Method 2 never tests, such as [A1,B2,C3].
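To make that concern concrete, here is a minimal sketch in Python. The `validation_score` function is a made-up toy, not a real model: it assumes the parameters interact, so that [A1,B2,C3] is only good when all three values are set together. The integers 0, 1, 2 stand in for A1..A3, B1..B3 and C1..C3.

```python
from itertools import product

# Hypothetical candidate values: index 0, 1, 2 stands for X1, X2, X3.
A = [0, 1, 2]
B = [0, 1, 2]
C = [0, 1, 2]


def validation_score(a, b, c):
    """Toy stand-in for a validation metric (higher is better)."""
    # Assumed interaction: (A1, B2, C3) is the best setting, but only
    # when all three values are chosen together.
    if (a, b, c) == (0, 1, 2):
        return 1.0
    # Everywhere else the score mildly prefers the first value of each parameter.
    return 0.5 - 0.1 * (a + b + c)


def one_at_a_time(grids, score, start):
    """Method 2: tune one parameter while the others stay fixed."""
    current = list(start)
    for i, values in enumerate(grids):
        # Try every value for parameter i, keeping the other parameters as they are.
        current[i] = max(
            values,
            key=lambda v: score(*(current[:i] + [v] + current[i + 1:])),
        )
    return tuple(current)


def exhaustive(grids, score):
    """Test every combination of the candidate values."""
    return max(product(*grids), key=lambda combo: score(*combo))


greedy_best = one_at_a_time([A, B, C], validation_score, start=(0, 0, 0))
full_best = exhaustive([A, B, C], validation_score)
print("one at a time:", greedy_best, validation_score(*greedy_best))
print("exhaustive:   ", full_best, validation_score(*full_best))
```

With this toy score, tuning one parameter at a time never visits [A1,B2,C3], while testing all 27 combinations does. Whether a real model behaves like this depends on how strongly its parameters interact.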

Which method is best? Is there another, more reliable method for tuning multiple parameters?

Regards.

6

Generally, people perform a grid search, which in its simplest "exhaustive" form is similar to Method 1, except that it tests every combination rather than a random subset. However, there are also more "intelligent" ways to choose what to explore, which optimize over the parameter space in a fashion similar to how each individual model is optimized. Greedy optimization in this space can be tricky, because the objective is often strongly non-convex.
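As a rough illustration of the difference between an exhaustive grid and random sampling of combinations, here is a small sketch using only the standard library. The parameter names (`learning_rate`, `depth`, `reg`) and the `toy_score` function are invented for the example; in practice the score would come from training a model and evaluating it on the validation set.

```python
import itertools
import math
import random

# Hypothetical search space: three parameters with three candidate values each.
space = {
    "learning_rate": [0.001, 0.01, 0.1],
    "depth": [2, 4, 8],
    "reg": [0.0, 0.1, 1.0],
}


def toy_score(params):
    """Made-up validation score (higher is better), peaking at lr=0.01, depth=4, reg=0."""
    return (
        -(math.log10(params["learning_rate"]) + 2) ** 2
        - (params["depth"] - 4) ** 2
        - params["reg"]
    )


def grid_search(space, score):
    """Evaluate every combination of candidate values."""
    names = list(space)
    best_params, best_score = None, float("-inf")
    for values in itertools.product(*(space[n] for n in names)):
        params = dict(zip(names, values))
        s = score(params)
        if s > best_score:
            best_params, best_score = params, s
    return best_params, best_score


def random_search(space, score, n_trials, seed=0):
    """Evaluate n_trials combinations drawn uniformly at random."""
    rng = random.Random(seed)
    best_params, best_score = None, float("-inf")
    for _ in range(n_trials):
        params = {name: rng.choice(values) for name, values in space.items()}
        s = score(params)
        if s > best_score:
            best_params, best_score = params, s
    return best_params, best_score


grid_params, grid_score = grid_search(space, toy_score)
rand_params, rand_score = random_search(space, toy_score, n_trials=8)
print("grid:  ", grid_params, grid_score)
print("random:", rand_params, rand_score)
```

The grid costs 27 evaluations here and is guaranteed to find the best combination among the candidates. The random search uses a fixed budget of 8 evaluations and may or may not hit it, which is the trade-off when each evaluation means training a model.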

3

I just want to add more information about these more "intelligent" ways to pick hyperparameters. One in particular is becoming more and more popular: "Practical Bayesian Optimization of Machine Learning Algorithms" by Jasper Snoek, Hugo Larochelle and Ryan P. Adams. It has proven effective for algorithms including latent Dirichlet allocation, structured SVMs and convolutional neural networks.

The code is on GitHub: https://github.com/JasperSnoek/spearmint

This method is particularly useful when the algorithm is complicated (leaning towards a black-box model) and/or the data is massive, so that training a model several times is computationally expensive.

1

In the vein of Bayesian optimization, I prefer Hyperopt, available on GitHub at https://github.com/hyperopt/hyperopt or through pip, homepage of author at https://github.com/hyperopt/hyperopt. The Tree-structured Parzen Estimator (TPE) algorithm behind it is described in the paper at http://papers.nips.cc/paper/4443-algorithms-for-hyper-parameter-optimization.pdf .

You can define an arbitrary nested search space, and then tell it to find the optimum of a black-box function.
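To show what a nested search space means, here is a sketch in plain Python. It is not Hyperopt's API, and it uses uniform random sampling rather than TPE. The space layout, the `sample` and `minimize` helpers and the `black_box` loss are all hypothetical, chosen only to show parameters that exist only under a particular choice of model.

```python
import random

# Hypothetical nested space: which sub-parameters exist depends on the model chosen.
# A tuple (low, high) means a continuous range; a list means discrete choices.
SPACE = {
    "svm": {"C": (0.01, 100.0), "kernel": ["linear", "rbf"]},
    "tree": {"max_depth": [2, 4, 8, 16]},
}


def sample(space, rng):
    """Draw one configuration: pick a model, then only that model's parameters."""
    model = rng.choice(sorted(space))
    params = {"model": model}
    for name, domain in space[model].items():
        if isinstance(domain, tuple):
            params[name] = rng.uniform(*domain)
        else:
            params[name] = rng.choice(domain)
    return params


def black_box(params):
    """Made-up loss (lower is better); a real one would train and validate a model."""
    if params["model"] == "svm":
        kernel_penalty = 0.0 if params["kernel"] == "rbf" else 0.2
        return abs(params["C"] - 1.0) / 100.0 + kernel_penalty
    return 0.3 + abs(params["max_depth"] - 8) / 20.0


def minimize(space, loss, n_trials=200, seed=0):
    """Random-search stand-in for a smarter optimizer such as TPE."""
    rng = random.Random(seed)
    trials = [sample(space, rng) for _ in range(n_trials)]
    return min(trials, key=loss)


best = minimize(SPACE, black_box)
print(best, black_box(best))
```

A Bayesian optimizer would replace the random draws in `minimize` with proposals informed by the losses seen so far. The shape of the problem stays the same: a nested space plus a black-box function to minimize.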

Hyperopt also supports a built-in sklearn search space out of the box, but I have not used that; I've typically defined my own.

There are also a few other Bayesian optimization schemes around. I can't claim to know which of them will come out on top as the "best" in the long run.