1D CNN Variational Autoencoder Conv1D Size


I am trying to build a 1D convolutional variational autoencoder that takes a 931x1 vector as input, but I have been having trouble with two things:

  1. Getting the output size back to 931, since max pooling and upsampling produce even sizes
  2. Getting the layer sizes to line up properly

This is what I have so far. I zero-padded both ends of my input array before training (which is why the input length is h+2: 931 + 2 = 933), and then cropped the decoder output back down to 933. Using the raw 931 input gives a 928 output, and I am not sure of the best way to get back to 931 from there without cropping. A rough sketch of the padding step is below, followed by the model code.
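
The padding itself is just symmetric zero-padding with NumPy (sig here is a stand-in for my actual 931x1 signal; the commented lengths follow Keras's floor-division pooling):

import numpy as np

sig = np.random.rand(931, 1)            # placeholder for the real 931x1 signal
padded = np.pad(sig, ((1, 1), (0, 0)))  # one zero on each end -> 933x1

# Without padding: 931 -> 465 -> 232 -> 116 through the pooling layers,
# and 116 * 2 * 2 * 2 = 928 after three UpSampling1D(2) layers (the 928 above).
# With padding:    933 -> 466 -> 233 -> 116, which matches the summaries below.
print(padded.shape)                     # (933, 1)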


import tensorflow as tf
from tensorflow import keras
from tensorflow.keras.layers import (Input, Conv1D, MaxPooling1D, AveragePooling1D,
                                     Flatten, Dense, Reshape, UpSampling1D,
                                     Cropping1D, ZeroPadding1D)
from tensorflow.keras.models import Model

w, h = 1, 931        # batch size and original signal length
latent_dim = 2

# Encoder: (933, 1) signal -> latent distribution parameters
input_sig = Input(batch_shape=(w, h + 2, 1))   # 931 + 2 zero-pad = 933
x = Conv1D(8, 3, activation='relu', padding='same', dilation_rate=2)(input_sig)
# x = ZeroPadding1D((2, 1))(x)
x1 = MaxPooling1D(2)(x)                        # 933 -> 466
x2 = Conv1D(4, 3, activation='relu', padding='same', dilation_rate=2)(x1)
x3 = MaxPooling1D(2)(x2)                       # 466 -> 233
x4 = AveragePooling1D()(x3)                    # 233 -> 116
flat = Flatten()(x4)                           # 116 * 4 = 464
x = Dense(2)(flat)
z_mean = Dense(latent_dim, name="z_mean")(x)
z_log_var = Dense(latent_dim, name="z_log_var")(x)
z = Sampling()([z_mean, z_log_var])
encoder = Model(input_sig, [z_mean, z_log_var, z], name="encoder")
encoder.summary()

# Decoder: latent vector -> reconstructed signal, cropped back to 933
latent_inputs = keras.Input(shape=(latent_dim,))
d1 = Dense(468)(latent_inputs)                 # 468 = 117 * 4 (I also tried 464)
d2 = Reshape((117, 4))(d1)
d3 = Conv1D(4, 1, strides=1, activation='relu', padding='same')(d2)
d4 = UpSampling1D(2)(d3)                       # 117 -> 234
d5 = Conv1D(8, 1, strides=1, activation='relu', padding='same')(d4)
d6 = UpSampling1D(2)(d5)                       # 234 -> 468
d7 = UpSampling1D(2)(d6)                       # 468 -> 936
d8 = Conv1D(1, 1, strides=1, activation='sigmoid', padding='same')(d7)
decoded = Cropping1D(cropping=(1, 2))(d8)      # this is the added step: 936 -> 933

decoder = Model(latent_inputs, decoded, name="decoder")
decoder.summary()
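
Sampling is the reparameterization layer copied from the Keras VAE example; from memory it is roughly this (it relies on the same tensorflow/keras imports as above):

class Sampling(keras.layers.Layer):
    """Draw z from N(z_mean, exp(z_log_var)) via the reparameterization trick."""
    def call(self, inputs):
        z_mean, z_log_var = inputs
        batch = tf.shape(z_mean)[0]
        dim = tf.shape(z_mean)[1]
        epsilon = tf.random.normal(shape=(batch, dim))
        return z_mean + tf.exp(0.5 * z_log_var) * epsilon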

This is the summary printed:

Model: "encoder"
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
==================================================================================================
input_99 (InputLayer)           [(1, 933, 1)]        0                                            
__________________________________________________________________________________________________
conv1d_209 (Conv1D)             (1, 933, 8)          32          input_99[0][0]                   
__________________________________________________________________________________________________
max_pooling1d_90 (MaxPooling1D) (1, 466, 8)          0           conv1d_209[0][0]                 
__________________________________________________________________________________________________
conv1d_210 (Conv1D)             (1, 466, 4)          100         max_pooling1d_90[0][0]           
__________________________________________________________________________________________________
max_pooling1d_91 (MaxPooling1D) (1, 233, 4)          0           conv1d_210[0][0]                 
__________________________________________________________________________________________________
average_pooling1d_45 (AveragePo (1, 116, 4)          0           max_pooling1d_91[0][0]           
__________________________________________________________________________________________________
flatten_45 (Flatten)            (1, 464)             0           average_pooling1d_45[0][0]       
__________________________________________________________________________________________________
dense_89 (Dense)                (1, 2)               930         flatten_45[0][0]                 
__________________________________________________________________________________________________
z_mean (Dense)                  (1, 2)               6           dense_89[0][0]                   
__________________________________________________________________________________________________
z_log_var (Dense)               (1, 2)               6           dense_89[0][0]                   
__________________________________________________________________________________________________
sampling_45 (Sampling)          (1, 2)               0           z_mean[0][0]                     
                                                                 z_log_var[0][0]                  
==================================================================================================
Total params: 1,074
Trainable params: 1,074
Non-trainable params: 0
__________________________________________________________________________________________________
Model: "decoder"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
input_100 (InputLayer)       [(None, 2)]               0         
_________________________________________________________________
dense_90 (Dense)             (None, 468)               1404      
_________________________________________________________________
reshape_44 (Reshape)         (None, 117, 4)            0         
_________________________________________________________________
conv1d_211 (Conv1D)          (None, 117, 4)            20        
_________________________________________________________________
up_sampling1d_117 (UpSamplin (None, 234, 4)            0         
_________________________________________________________________
conv1d_212 (Conv1D)          (None, 234, 8)            40        
_________________________________________________________________
up_sampling1d_118 (UpSamplin (None, 468, 8)            0         
_________________________________________________________________
up_sampling1d_119 (UpSamplin (None, 936, 8)            0         
_________________________________________________________________
conv1d_213 (Conv1D)          (None, 936, 1)            9         
_________________________________________________________________
cropping1d_18 (Cropping1D)   (None, 933, 1)            0         
=================================================================
Total params: 1,473
Trainable params: 1,473
Non-trainable params: 0
_________________________________________________________________

However, when I try to fit the model I get the following exception:

ValueError: Invalid reduction dimension 2 for input with 2 dimensions. for '{{node Sum}} = Sum[T=DT_FLOAT, Tidx=DT_INT32, keep_dims=false](Mean, Sum/reduction_indices)' with input shapes: [1,933], [2] and with computed input tensors: input[1] = <1 2>.

Has anyone seen this error before, or can you spot what I am doing wrong in the model construction? I am new to this and can't work out where the mistake is.

Note that I adapted this from the working 28x28 MNIST VAE example in the Keras documentation.
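
I kept that example's custom train_step and loss unchanged; paraphrasing it from memory, it looks roughly like this, and the error message above seems to point at the axis=(1, 2) reduction:

def train_step(self, data):
    with tf.GradientTape() as tape:
        z_mean, z_log_var, z = self.encoder(data)
        reconstruction = self.decoder(z)
        # binary_crossentropy already reduces the channel axis, so for my
        # (batch, 933, 1) data its output is only 2-D, yet the sum below
        # asks for dimension 2 -- which looks like what the ValueError reports.
        reconstruction_loss = tf.reduce_mean(
            tf.reduce_sum(
                keras.losses.binary_crossentropy(data, reconstruction),
                axis=(1, 2),
            )
        )
        kl_loss = -0.5 * (1 + z_log_var - tf.square(z_mean) - tf.exp(z_log_var))
        kl_loss = tf.reduce_mean(tf.reduce_sum(kl_loss, axis=1))
        total_loss = reconstruction_loss + kl_loss
    grads = tape.gradient(total_loss, self.trainable_weights)
    self.optimizer.apply_gradients(zip(grads, self.trainable_weights))
    return {"loss": total_loss}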

Thanks in advance

Celeste Manu

Posted 2021-02-21T20:48:41.713

Reputation: 141

No answers