## Common Techniques to Generate from a Regression Neural Network Model


I am used to training neural networks that are designed for generation, such as GANs or VAEs.

I am wondering what the common techniques are for generating data that minimizes the target/energy learned by a regression model, following the idea of Deep Dream.

I can think of two ways:

1) Use the trained regression neural network as a loss function (via its gradients) for another neural network that is trained to produce structures achieving a given energy/target, as evaluated by the first network.

2) Use a standard optimization algorithm (not a neural network) to find which inputs minimize the output of the regression model.

Are there any other common methods to do this? What are the best-known / most effective methods?

Any idea/reference would be great!

Regarding "I know it is possible to train a regression model on a dataset and then 'reverse' the process" - could you clarify where you know this from? Because I would say that this is not possible in general. Also, are you hoping (as with a GAN or VAE) to generate input data according to the observed distribution of inputs during training, or just any input that results in the target output, even if it would not represent a valid member of the input population? – Neil Slater – 2018-06-05T15:41:43.033

@NeilSlater For the reverse process, I've seen some thesis work solving the optimization problem of 'which input will make the NN output this real value' after the NN was trained on the dataset; I don't have the reference right now, though. (And I am not sure when and if this is a good idea in general, hence my question!) For your second question, it would be great to generate data that corresponds to a fixed, specified output of the NN, though I don't see how this would be possible if it is trained on a restricted dataset. So I would say: first, generate data according to the distribution. – Tool – 2018-06-05T18:48:16.677

Actually it's the other way around - generating from the distribution is hard (impossible for a normal feed-forward network with no access to the training data). Generating inputs that trigger the correct output is a similar idea to Deep Dream. You may be able to spot something meaningful in the input values, but they probably won't correspond to "realistic" inputs. – Neil Slater – 2018-06-05T19:41:10.293

Sorry, another question if you don't mind: What do you mean by "external" for optimisation procedure? Would using the built-in gradient descent for training an NN be considered "external"? That is basically how Deep Dream works. – Neil Slater – 2018-06-05T20:00:21.760

@NeilSlater I'd consider "external" anything that is not inherent to the NN, so yes, Deep Dream would be external, as it only uses the gradients but the generation is not 'included' in the NN's training nor its architecture. Sorry if I'm being vague, I'm just looking for general recommendations / directions to look in. Thanks! – Tool – 2018-06-05T21:13:43.423


Let $$E_\phi : D\rightarrow \mathbb{R}$$ be your trained differentiable regression model, where $$D$$ is the data space, e.g., images. Let $$G : \mathbb{R}^d\rightarrow D$$ be some generative model mapping a latent space to the data space (e.g., a GAN generator or a VAE decoder).

Suppose we want to find $$x\in D$$ such that $$x = \arg\min_y E_\phi(y)$$. Then there are two obvious ways:

1. Gradient descent in the data space: i.e., solve $$x = \arg\min_y E_\phi(y)$$ by iterating $$x_t = x_{t-1} - \eta\nabla E_\phi(x_{t-1})$$ (see the first sketch below).

2. Gradient descent in the latent space: i.e., solve $$z = \arg\min_u E_\phi(G(u))$$ by iterating $$z_t = z_{t-1} - \eta\nabla_z E_\phi(G(z_{t-1}))$$, then set $$x = G(z)$$ (see the second sketch below).
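
As a concrete illustration of option (1), here is a minimal PyTorch sketch; the function name, learning rate, and step count are illustrative assumptions, not part of the original answer:

```python
import torch

def optimize_in_data_space(E_phi, x_init, steps=200, lr=0.01):
    """Iterate x_t = x_{t-1} - eta * grad E_phi(x_{t-1})."""
    x = x_init.clone().requires_grad_(True)     # optimize the input itself
    optimizer = torch.optim.SGD([x], lr=lr)
    for _ in range(steps):
        optimizer.zero_grad()
        energy = E_phi(x).sum()   # scalar energy predicted by the regressor
        energy.backward()         # gradient w.r.t. the input, not the weights
        optimizer.step()
    return x.detach()
```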

Both are very common. The main benefits of (2) are that (a) you are more likely to get a "reasonable" $$x=G(z)$$, because the generator $$G$$ was trained to produce samples that look like they came from the data distribution, which (1) does not guarantee; (b) the latent dimension $$d$$ is often much smaller than the dimension of the data space $$D$$, making the optimization easier; and (c) in some settings (1) is not feasible at all: in molecule generation, for instance, the data space is discrete, so we can do (2) but not (1). Nevertheless, there is a large drawback: the need for a trained generator $$G$$, which is itself non-trivial to obtain. (This is why, for generating adversarial examples for instance, (1) is more popular.)
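
And a matching sketch of option (2), optimizing a latent code through a frozen generator `G`; again, all names and hyperparameters are assumptions for illustration:

```python
import torch

def optimize_in_latent_space(E_phi, G, d, steps=200, lr=0.01):
    """Iterate z_t = z_{t-1} - eta * grad_z E_phi(G(z_{t-1})); return x = G(z)."""
    z = torch.randn(1, d, requires_grad=True)   # random starting latent code
    optimizer = torch.optim.Adam([z], lr=lr)
    for _ in range(steps):
        optimizer.zero_grad()
        energy = E_phi(G(z)).sum()   # compose generator and regressor
        energy.backward()            # only z is updated; G and E_phi stay fixed
        optimizer.step()
    with torch.no_grad():
        return G(z)                  # the generated sample x = G(z)
```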

One might also want to avoid gradient-based optimization in some cases (e.g., when the model or the input space is not differentiable), in which case I often see Bayesian optimization approaches being used; a sketch follows.
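
For completeness, a gradient-free sketch using Bayesian optimization via scikit-optimize's `gp_minimize`; the box bounds and dimensionality are assumptions, and this only scales to fairly low-dimensional inputs:

```python
import numpy as np
import torch
from skopt import gp_minimize  # pip install scikit-optimize

def bayes_opt_minimize(E_phi, d, bounds=(-3.0, 3.0), n_calls=100):
    """Minimize E_phi over a box in R^d without using gradients."""
    def objective(x_list):
        with torch.no_grad():
            x = torch.tensor([x_list], dtype=torch.float32)
            return float(E_phi(x).sum())
    result = gp_minimize(objective, dimensions=[bounds] * d, n_calls=n_calls)
    return np.asarray(result.x)   # best input found
```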