Has anyone seen a paper in which one trains a network but biases it to produce outputs similar to those of a given model (for example, a model built from expert opinion, or a previously trained network)?

*Formally, I'm looking for a paper doing the following:*

Let $g:\mathbb{R}^d\rightarrow \mathbb{R}^D$ be a model (not necessarily, but possibly, a neural network) trained on some input/output data pairs $\{(x_n,y_n)\}_{n=1}^N$. Then train a neural network $f_{\theta}(\cdot)$ by solving $$ \underset{\theta}{\operatorname{argmin}}\sum_{n=1}^N \left( \left\| f_{\theta}(x_n) - y_n \right\| + \lambda \left\| f_{\theta}(x_n) - g(x_n) \right\| \right), $$ where $\theta$ represents all the trainable weight and bias parameters of the network $f_{\theta}(\cdot)$ and $\lambda$ weights the second term.

Put another way, $f_{\theta}(\cdot)$ is being regularized by the outputs of another model.
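To make the objective concrete, here is a minimal sketch in plain Python. It is only an illustration under stated assumptions, not taken from any paper: the toy data, the one-parameter model $f_w(x) = w x$, the reference model $g(x) = x$, the step-size schedule, and the helper names `regularized_loss` and `fit_slope` are all hypothetical. The loss uses the Euclidean norm, and training uses plain subgradient descent because the unsquared norm is not differentiable at zero.

```python
import math

def norm(u):
    """Euclidean norm of a vector stored as a list of floats."""
    return math.sqrt(sum(c * c for c in u))

def regularized_loss(f, g, xs, ys, lam):
    """Sum over n of ||f(x_n) - y_n|| + lam * ||f(x_n) - g(x_n)||."""
    total = 0.0
    for x, y in zip(xs, ys):
        fx, gx = f(x), g(x)
        total += norm([a - b for a, b in zip(fx, y)])         # data-fit term
        total += lam * norm([a - b for a, b in zip(fx, gx)])  # pull toward g
    return total

def sign(t):
    """Return -1, 0, or 1 (a valid subgradient of |t|)."""
    return (t > 0) - (t < 0)

def fit_slope(xs, ys, g, lam, steps=2000):
    """Subgradient descent on the loss for the toy model f_w(x) = [w * x[0]]."""
    w = 0.0
    for t in range(steps):
        grad = 0.0
        for x, y in zip(xs, ys):
            fx = w * x[0]
            # d/dw of |w*x - y| + lam * |w*x - g(x)| for scalar outputs
            grad += (sign(fx - y[0]) + lam * sign(fx - g(x)[0])) * x[0]
        w -= 0.05 / math.sqrt(t + 1) * grad  # decaying step size (assumed)
    return w

# Hypothetical toy data: labels follow y = 3x, the reference model says g(x) = x.
xs = [[1.0], [2.0], [3.0]]
ys = [[3.0 * x[0]] for x in xs]
g = lambda x: [1.0 * x[0]]

w_weak = fit_slope(xs, ys, g, lam=0.2)    # small lambda: the data term dominates
w_strong = fit_slope(xs, ys, g, lam=5.0)  # large lambda: the pull toward g dominates
```

In this toy setup the fitted slope ends up near 3 (the data) for the small $\lambda$ and near 1 (the reference model $g$) for the large $\lambda$, which is the regularization effect described above. With a real network one would replace the hand-written subgradient with automatic differentiation.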

This reminds me of the ELBO objective function, which is composed of two terms: one is the likelihood and the other is the KL divergence between the variational density and the prior. Why does your question remind me of the ELBO? Because, essentially, the prior here would be your previously trained model, while the variational density is what you are looking for. You might want to have a look at it for inspiration (at least). But what's the problem with your current approach of using some kind of norm to regularize? – nbro – 2020-07-24T01:07:10.040

@nbro I'll have a look at the ELBO objective function, thanks. I also had some type of pre-training idea in mind. To answer your question: basically, I was able to show universality of certain models arising from this type of regularization, so I'm looking to connect it to pre-existing literature. – AnnieLeKatsu – 2020-07-24T06:24:40.800

Well, actually, this also reminds me of the GAN. See https://deepgenerativemodels.github.io/notes/gan/. – nbro – 2020-07-24T19:51:04.400

@nbro I gave this some more thought: is this related to transfer learning, where the second term dictates how far a NN departs from a previously trained model? – AnnieLeKatsu – 2020-07-27T11:58:15.110

This is vaguely related to transfer learning: in transfer learning (at least what people usually mean by that term) you don't jointly optimize one or two models, but you first train a model with dataset 1, then you train it with dataset 2. As I said above, I think you're looking more for something like a GAN or along the lines of a GAN. In the GAN, you will see there are two models and they are conceptually trained jointly, which is what I think you want, if I understand correctly. – nbro – 2020-07-27T13:27:26.750

@nbro Yes, but in the GAN formulations I've seen there's always a minimax problem to solve, whereas here there's only a min involving an exogenous model. Have you seen a GAN formulation where there is no minimax? (This would be perfect.) – AnnieLeKatsu – 2020-07-27T13:56:58.940

No, I have not seen it, but I just wanted to point you to the idea behind the GAN, where there are two models. Of course, you will have to change the objective function to do what you want. – nbro – 2020-07-27T13:57:46.183