Reconstructing an audio signal from its Mel-scale spectrogram using an autoencoder


I'm looking for papers or other references that attempt to reconstruct audio signals from their Mel-scale spectrograms using an autoencoder or another type of neural network.

I am thinking of training the autoencoder as follows: compute the Mel-scale spectrogram of the input audio signal, pass that spectrogram through the autoencoder, whose output layer has exactly the same dimensions as the input audio signal, and then update the autoencoder's parameters using the MSE loss between the output layer and the input audio signal.
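
A minimal NumPy sketch of the training setup described above. It rests on several assumptions that are not in the question: fixed-length clips (2048 samples at 8 kHz), an HTK-style mel filterbank with a log-mel front end, a single tanh hidden layer standing in for the network, toy sinusoid data, and plain full-batch gradient descent. Strictly speaking, the network maps mel features to a waveform rather than reproducing its own input, so it plays the role of a decoder. All names (`mel_filterbank`, `log_mel`, `make_clips`, `train`, `N_FFT`, `HOP`) are hypothetical; in practice one would more likely use librosa or torchaudio for the features and a framework such as PyTorch for the model.

```python
import numpy as np

N_FFT, HOP = 256, 128  # assumed STFT frame size and hop (in samples)


def hz_to_mel(f):
    # HTK-style mel scale; librosa defaults to the Slaney variant instead.
    return 2595.0 * np.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank(sr, n_fft, n_mels):
    """Triangular mel filters with shape (n_mels, n_fft // 2 + 1)."""
    fft_freqs = np.linspace(0.0, sr / 2.0, n_fft // 2 + 1)
    hz_points = mel_to_hz(np.linspace(0.0, hz_to_mel(sr / 2.0), n_mels + 2))
    fb = np.zeros((n_mels, fft_freqs.size))
    for i in range(n_mels):
        lo, mid, hi = hz_points[i : i + 3]
        rising = (fft_freqs - lo) / (mid - lo)
        falling = (hi - fft_freqs) / (hi - mid)
        fb[i] = np.maximum(0.0, np.minimum(rising, falling))
    return fb


def log_mel(x, fb):
    """Hann-windowed frames -> power spectrum -> mel projection -> log."""
    window = np.hanning(N_FFT)
    n_frames = 1 + (len(x) - N_FFT) // HOP
    frames = np.stack([x[t * HOP : t * HOP + N_FFT] * window for t in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2  # (n_frames, N_FFT // 2 + 1)
    return np.log(power @ fb.T + 1e-6)                 # (n_frames, n_mels)


def make_clips(rng, n_clips, n_samples, sr):
    """Hypothetical toy data: sinusoids with random frequency and phase."""
    t = np.arange(n_samples) / sr
    freqs = rng.uniform(100.0, 1000.0, size=(n_clips, 1))
    phases = rng.uniform(0.0, 2.0 * np.pi, size=(n_clips, 1))
    return 0.5 * np.sin(2.0 * np.pi * freqs * t + phases)


def train(n_clips=32, n_samples=2048, sr=8000, n_mels=20, hidden=128,
          lr=0.2, steps=200, seed=0):
    rng = np.random.default_rng(seed)
    x = make_clips(rng, n_clips, n_samples, sr)                  # targets: raw audio
    fb = mel_filterbank(sr, N_FFT, n_mels)
    feats = np.stack([log_mel(clip, fb).ravel() for clip in x])  # flattened log-mel input
    feats = (feats - feats.mean(axis=0)) / (feats.std(axis=0) + 1e-8)

    # One tanh hidden layer; the output layer has exactly n_samples units,
    # so its shape matches the input audio clip.
    n_in = feats.shape[1]
    w1 = rng.normal(0.0, 1.0 / np.sqrt(n_in), (n_in, hidden))
    b1 = np.zeros(hidden)
    w2 = rng.normal(0.0, 0.01, (hidden, n_samples))
    b2 = np.zeros(n_samples)

    losses = []
    for _ in range(steps):
        h = np.tanh(feats @ w1 + b1)             # (n_clips, hidden)
        y = h @ w2 + b2                          # (n_clips, n_samples)
        err = y - x
        losses.append(float(np.mean(err ** 2)))  # MSE against the input audio
        # Manual backpropagation of the MSE through both layers.
        dy = 2.0 * err / err.size
        dw2, db2 = h.T @ dy, dy.sum(axis=0)
        dz = (dy @ w2.T) * (1.0 - h ** 2)
        dw1, db1 = feats.T @ dz, dz.sum(axis=0)
        w1 -= lr * dw1
        b1 -= lr * db1
        w2 -= lr * dw2
        b2 -= lr * db2
    return losses
```

For example, `losses = train()` returns the per-step training MSE, and comparing `losses[0]` with `losses[-1]` shows whether the loss is falling. One caveat applies to the design itself: the mel spectrogram largely discards phase, so a sample-wise MSE against the raw waveform penalizes outputs that would sound the same but are phase-shifted. With the random-phase toy clips used here, that limits how low the loss can go on clips the network has not seen.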

I would also appreciate any suggestions.


Posted 2021-01-21T20:20:32.130

