Conditional Variational Autoencoder - NON Image Data


First, I would like to describe an issue I've been dealing with for way too long: creating a conditional Variational Autoencoder (CVAE) with continuous conditioning variables for non-image data (more specifically, time series). Given a time series, I want to condition it on a continuous number, not a discrete label.

Yes, I know that it is theoretically possible, and I do understand the ELBO formulation as written in "Is there a Continuous Conditional Variational Autoencoder?".

However, I believe there is a gap between the ELBO formulation and an actual implementation, since my results aren't great. I managed to build the autoencoder, but the decoder ignores the additional information and produces the whole spectrum of the additional information in the generated data (that is, it works just like a plain VAE).
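One way to put a number on "the decoder ignores the additional information" is to hold the latent samples fixed, sweep the condition, and measure how much the output moves. The sketch below is only an illustration under assumptions: the helper name `condition_sensitivity`, the `decoder(z, c)` call signature, and the two toy decoders are hypothetical and not taken from my actual code.

```python
import torch


def condition_sensitivity(decoder, latent_dim, c_values, n_samples=64):
    """Mean absolute change in decoder output when only c changes.

    `decoder` is any callable decoder(z, c) -> tensor (assumed signature).
    The latent samples z are drawn once and reused for every condition.
    """
    with torch.no_grad():
        z = torch.randn(n_samples, latent_dim)
        outs = [decoder(z, torch.full((n_samples, 1), float(c))) for c in c_values]
        outs = torch.stack(outs)  # (n_conditions, n_samples, seq_len)
        # Compare every condition against the first one.
        return (outs - outs[0]).abs().mean().item()


# Toy decoders (hypothetical) showing the two extremes.
def ignores_c(z, c):
    return z.sum(dim=-1, keepdim=True).repeat(1, 16)


def uses_c(z, c):
    return z.sum(dim=-1, keepdim=True).repeat(1, 16) + c


s_ignore = condition_sensitivity(ignores_c, 4, [0.0, 0.5, 1.0])
s_use = condition_sensitivity(uses_c, 4, [0.0, 0.5, 1.0])
print(s_ignore, s_use)
```

A value near zero for a trained model would suggest the decoder is effectively behaving like an unconditional VAE; what counts as "near zero" depends on the scale of the data.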

For images with discrete labels, there are tutorials pointing out how to concatenate the additional information.

I tried two approaches for injecting the additional information, each in the architectures listed below. The first was to concatenate the single continuous number directly. The second was to first expand the additional information with a linear/dense layer (for example, from 1D to 10D). My reasoning for trying the second was: given an intermediate layer with, let's say, 300 hidden neurons, if I concatenate ONE additional number to it, is the network REALLY going to notice the difference and properly backpropagate towards its correction?

  1. A vanilla CVAE: I concatenate the additional information after one pass of compressing the time series into an intermediate lower dimension. In the decoder, I concatenate the additional information to the sampled latent vector before it is expanded to the intermediate hidden dimension and later to the final, original dimension.

  2. A GRU CVAE: I concatenate the additional information after one pass of a first GRU layer.

  3. CNN-CVAE: I concatenate additional information after all convolution passes, just before it is processed by linear layers.
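To make the first (vanilla) variant concrete, here is a minimal PyTorch sketch of the concatenation scheme described above, with the 1D to 10D expansion applied to the condition. Everything in it is an assumption for illustration: the class name `ContinuousCVAE`, the layer sizes, the series length of 128, and the choice to share one expansion layer between encoder and decoder.

```python
import torch
import torch.nn as nn


class ContinuousCVAE(nn.Module):
    """Dense CVAE conditioned on one continuous scalar c (illustrative sizes)."""

    def __init__(self, seq_len=128, hidden=300, latent=8, cond_embed=10):
        super().__init__()
        # Expand the scalar condition (1 -> cond_embed) so it is not a single
        # unit sitting next to `hidden` activations.
        self.cond_net = nn.Sequential(nn.Linear(1, cond_embed), nn.ReLU())
        # First compression pass of the time series.
        self.enc_in = nn.Sequential(nn.Linear(seq_len, hidden), nn.ReLU())
        # Maps [hidden features, expanded condition] to mu and logvar.
        self.enc_out = nn.Linear(hidden + cond_embed, 2 * latent)
        # Decoder sees [sampled z, expanded condition].
        self.dec = nn.Sequential(
            nn.Linear(latent + cond_embed, hidden), nn.ReLU(),
            nn.Linear(hidden, seq_len),
        )

    def encode(self, x, c):
        h = torch.cat([self.enc_in(x), self.cond_net(c)], dim=-1)
        mu, logvar = self.enc_out(h).chunk(2, dim=-1)
        return mu, logvar

    def decode(self, z, c):
        return self.dec(torch.cat([z, self.cond_net(c)], dim=-1))

    def forward(self, x, c):
        mu, logvar = self.encode(x, c)
        # Reparameterization trick: z = mu + eps * std.
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)
        return self.decode(z, c), mu, logvar


model = ContinuousCVAE()
x = torch.randn(4, 128)  # batch of 4 series, length 128
c = torch.rand(4, 1)     # one continuous condition per series
x_hat, mu, logvar = model(x, c)
print(x_hat.shape, mu.shape, logvar.shape)
```

Replacing `cond_net` with `nn.Identity()` and setting `cond_embed=1` would give the plain "concatenate the raw number" variant, so both approaches can be compared with the same code.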

In all of these, the CVAE does manage to generate output (as a VAE would), but it ignores the conditional vector. I also suspected posterior collapse and tried to follow what is recommended in Fixing a Broken ELBO, by adding a sigma term to force/control the rate term in the loss function.

Training/validation loss

Distortion loss: here I used a simple mean squared error as the reconstruction error.

Rate loss: a simple KL divergence loss from a normal distribution.
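For reference, the two terms and the sigma term can be sketched roughly as below. This is a simplified illustration, not my exact training code: the function name `rate_distortion_loss` is hypothetical, the prior is assumed to be a standard normal, and the penalty `|rate - sigma|` is one possible reading of "adding a sigma term to control the rate".

```python
import torch
import torch.nn.functional as F


def rate_distortion_loss(x_hat, x, mu, logvar, sigma=None):
    """Return (loss, distortion, rate) for a Gaussian-posterior VAE.

    sigma=None gives the plain ELBO-style sum; otherwise the rate is pulled
    towards the target value sigma (an assumed form of the sigma term).
    """
    # Distortion: squared error summed over time steps, averaged over the batch.
    # Averaging over time steps instead would shrink this term by a factor of
    # seq_len relative to the rate.
    distortion = F.mse_loss(x_hat, x, reduction="none").sum(dim=-1).mean()
    # Rate: closed-form KL( N(mu, diag(exp(logvar))) || N(0, I) ), in nats.
    rate = (-0.5 * (1 + logvar - mu.pow(2) - logvar.exp())).sum(dim=-1).mean()
    if sigma is None:
        loss = distortion + rate
    else:
        loss = distortion + (rate - sigma).abs()
    return loss, distortion, rate


# Tiny check: perfect reconstruction, posterior equal to the prior.
x = torch.randn(4, 128)
mu = torch.zeros(4, 8)
logvar = torch.zeros(4, 8)
loss, distortion, rate = rate_distortion_loss(x, x, mu, logvar, sigma=2.0)
print(loss.item(), distortion.item(), rate.item())
```

With a nonzero target the optimizer is penalized for letting the rate fall to zero, which is the situation associated with posterior collapse.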

Because I am getting a little frustrated after spending so much time on this, I was wondering if anyone could help by pointing out directions or papers, or by explaining how you actually dealt, in past or current experience, with conditioning a VAE on additional continuous information for time series (implemented in code). I would appreciate it!


Posted 2020-06-03T08:57:50.683
