I'm reading the following Kaggle post to learn how to incorporate model stacking into ML models:

http://blog.kaggle.com/2016/12/27/a-kagglers-guide-to-model-stacking-in-practice/

The structure behind constructing the 5 folds and creating out-of-sample predictions on the training data makes sense for the purpose of building the meta-model (the model on top of the base models). However, I'm not sure how hyperparameter tuning fits into this, especially for the base models.

The concept of getting out-of-sample predictions makes sense to me: for each of the 5 folds, we train on the other 4 folds and then predict on the fifth. But how do we actually tune the hyperparameters of the base models on this same dataset without adding bias? It seems to me that this is not possible.
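To make the fold mechanics concrete, here is a minimal sketch of the out-of-fold scheme described above, using scikit-learn on toy data (the dataset, the choice of a random forest as base model, and logistic regression as meta-model are my own illustrative assumptions, not from the post):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import KFold
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression

# Toy data standing in for the competition training set (assumption).
X, y = make_classification(n_samples=200, n_features=10, random_state=0)

kf = KFold(n_splits=5, shuffle=True, random_state=0)
oof_preds = np.zeros(len(y))  # out-of-fold predictions from the base model

for train_idx, test_idx in kf.split(X):
    base = RandomForestClassifier(n_estimators=50, random_state=0)
    base.fit(X[train_idx], y[train_idx])     # train on the other 4 folds
    # predict on the held-out 5th fold only
    oof_preds[test_idx] = base.predict_proba(X[test_idx])[:, 1]

# Every training row now has a prediction made by a model that never saw it,
# so the meta-model can be fit on these predictions without direct leakage.
meta = LogisticRegression()
meta.fit(oof_preds.reshape(-1, 1), y)
```

In practice you would stack the out-of-fold columns of several base models side by side as the meta-model's feature matrix; a single column is shown here only to keep the sketch short.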

Note that I'm making the assumption that there is no more data available to use. I'd appreciate any help!

I guess a usual strategy is to try different extreme versions of the hyperparameters, e.g. one model with a very large value in one parameter and a very small value in another, then the same model but with the opposite settings. Tuning the hyperparameters jointly would be a really slow process and I've never heard of that being done, but you can still try optimization-based approaches such as Gaussian processes or Bayesian optimization that work with the parameters of all models simultaneously. A simple grid search would likely be intractable. – anymous.asker – 2018-11-17T18:43:07.437
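If the search cost is acceptable, the standard way to tune base models without biasing the out-of-fold predictions is nested cross-validation: wrap each base model in a search object so that tuning happens entirely inside each outer training split, and the held-out fold never influences the chosen hyperparameters. A minimal scikit-learn sketch (the toy data and the parameter grid are my own illustrative assumptions):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import KFold, GridSearchCV, cross_val_predict
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=200, n_features=10, random_state=0)

# GridSearchCV wraps the base model, so the inner 3-fold search is run
# from scratch on each outer training split of 4 folds -- the outer
# held-out fold never leaks into hyperparameter selection.
param_grid = {"max_depth": [3, None], "n_estimators": [25, 50]}
tuned_base = GridSearchCV(RandomForestClassifier(random_state=0),
                          param_grid, cv=3)

outer = KFold(n_splits=5, shuffle=True, random_state=0)
oof = cross_val_predict(tuned_base, X, y, cv=outer,
                        method="predict_proba")[:, 1]
# `oof` can now feed the meta-model exactly like untuned out-of-fold
# predictions, at the price of (inner folds x grid size) extra fits.
```

This is exactly why a plain grid search over all base models jointly gets intractable: the fit count multiplies across the outer folds, the inner folds, and the grid, which is what makes the sequential optimization approaches mentioned above attractive.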