Is it possible to compress a sequence of numbers through an autoencoder?


Specifically: I would like to compress a set of coordinates, which map to the locations of 1s in a binary image, and then decode it back to the original set. For instance, for a 16x16 image, the input might be something like the following:

[5, 4], [12, 5], [8, 7], ...

I am not looking to recognize any spatial patterns, nor is this a time series problem, because the input corresponds to just one static image. The trained autoencoder should be able to handle any array of arbitrary "coordinates", under the assumption that the data is scaled to lie between 0 and 1, so the resolution (the actual range of the numbers) is inconsequential. Is this doable? What would be a good way to train it?


Posted 2020-07-12T20:29:07.987

Reputation: 175



This is doable, but the caveat is that a neural network might not give you a better result than a lossless compression algorithm (depending on what you want to use the compressed version for).

The most sensible way to do it, in my opinion, is to have a function at the start that converts the coordinates into an image and a function at the end that converts the recovered image back into coordinates.
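As a rough illustration of those two conversion functions, here is a minimal NumPy sketch. The names `coords_to_image` and `image_to_coords`, the (row, col) convention, and the 0.5 threshold are my own assumptions, not something from the question.

```python
import numpy as np

def coords_to_image(coords, size=16):
    """Rasterise integer (row, col) coordinates into a size x size binary image."""
    image = np.zeros((size, size), dtype=np.float32)
    if len(coords) > 0:
        rows, cols = np.asarray(coords, dtype=int).T
        image[rows, cols] = 1.0
    return image

def image_to_coords(image, threshold=0.5):
    """Return the (row, col) positions whose value exceeds the threshold, in row-major order."""
    return [tuple(int(v) for v in rc) for rc in np.argwhere(image > threshold)]

coords = [(5, 4), (12, 5), (8, 7)]          # example values from the question
image = coords_to_image(coords)
recovered = image_to_coords(image)
```

Note that the image treats the coordinates as an unordered set: the round trip returns them sorted by row and drops duplicates, so the original ordering is not preserved.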

To train it, gather a large number of training examples and pass them to your autoencoder network as 16×16 tensors containing 1s and 0s. You then extract the coordinates from the reconstructed image wherever the value is 1 (in practice, wherever it exceeds a threshold such as 0.5, since the network's output is continuous).
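To make the training loop concrete, here is a small sketch of a single-hidden-layer autoencoder written directly in NumPy, with the backward pass done by hand. Everything specific in it is an assumption for illustration: the 32-unit bottleneck, the learning rate, the number of steps, and above all the random placeholder data. In a real project you would more likely use a deep learning framework and your own images.

```python
import numpy as np

rng = np.random.default_rng(0)
size, n_hidden = 16, 32
n_pixels = size * size

# Placeholder data (an assumption): 512 random binary images, about 5% of pixels set.
X = (rng.random((512, n_pixels)) < 0.05).astype(np.float64)

# One hidden layer: 256 -> 32 -> 256.
W1 = rng.normal(0.0, 0.1, (n_pixels, n_hidden))
b1 = np.zeros(n_hidden)
W2 = rng.normal(0.0, 0.1, (n_hidden, n_pixels))
b2 = np.zeros(n_pixels)

def forward(x):
    code = np.tanh(x @ W1 + b1)                        # encoder: the compressed representation
    recon = 1.0 / (1.0 + np.exp(-(code @ W2 + b2)))    # decoder: per-pixel probability of a 1
    return code, recon

def bce(x, recon, eps=1e-9):
    # Binary cross-entropy, summed over pixels and averaged over images.
    per_pixel = x * np.log(recon + eps) + (1 - x) * np.log(1 - recon + eps)
    return -per_pixel.sum(axis=1).mean()

lr = 0.1
losses = []
for step in range(200):
    code, recon = forward(X)
    losses.append(bce(X, recon))
    d_logits = (recon - X) / len(X)                    # loss gradient w.r.t. decoder pre-activations
    d_pre = (d_logits @ W2.T) * (1.0 - code ** 2)      # back-propagate through tanh
    W2 -= lr * (code.T @ d_logits)
    b2 -= lr * d_logits.sum(axis=0)
    W1 -= lr * (X.T @ d_pre)
    b1 -= lr * d_pre.sum(axis=0)

# Decode one image and read off the coordinates where the output exceeds 0.5.
_, recon = forward(X[:1])
decoded_coords = np.argwhere(recon[0].reshape(size, size) > 0.5)
```

Random images like these have almost no structure to exploit, so do not expect a 32-dimensional code to reconstruct them well. The sketch only shows the mechanics; how good the reconstruction is depends on how much regularity your real data has.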

There’s a good article/tutorial here to get you started.

If you’re determined not to explicitly give your network a 16×16 tensor, you could pass your coordinates as an ordered set of two-dimensional vectors and treat the problem as a sequence transduction task. The network would need to learn an embedding for each coordinate pair and then predict each embedding based on the surrounding embeddings. I expect you won’t get much joy from a transformer model, as they are notoriously hard to train, but a recurrent model with attention like the ones in the article I’ve linked to might give interesting results.

Nicholas James Bailey

Posted 2020-07-12T20:29:07.987

Reputation: 1 442

I have come across some image-related examples like that, but I am specifically interested in a version where the input is directly a set of coordinates, instead of passing in 1s and 0s at specific indices of a matrix. Is there any additional challenge in handling, say, encoding and decoding back to a set of floating point numbers? – HighVoltage – 2020-07-12T22:55:20.973

Updated my answer! – Nicholas James Bailey – 2020-07-13T05:58:35.977


If your images have only a few 1s, you can achieve lossless compression without any deep learning techniques by using a sparse matrix storage format such as Compressed Sparse Row (CSR).
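For a binary image the CSR layout is particularly simple, because every stored value is 1 and only the row pointers and column indices are needed. A minimal sketch in plain NumPy follows (the function names are hypothetical; a library such as SciPy provides a full CSR implementation):

```python
import numpy as np

def binary_to_csr(image):
    """Encode a binary image as CSR row pointers plus the column indices of the 1s."""
    image = np.asarray(image)
    rows, cols = np.nonzero(image)                     # row-major order, so rows is sorted
    counts = np.bincount(rows, minlength=image.shape[0])
    indptr = np.concatenate(([0], np.cumsum(counts)))  # row i owns cols[indptr[i]:indptr[i+1]]
    return indptr, cols, image.shape

def csr_to_binary(indptr, indices, shape):
    """Rebuild the binary image from its CSR representation."""
    image = np.zeros(shape, dtype=np.uint8)
    rows = np.repeat(np.arange(shape[0]), np.diff(indptr))
    image[rows, indices] = 1
    return image

original = np.zeros((16, 16), dtype=np.uint8)
original[5, 4] = original[12, 5] = original[8, 7] = 1
indptr, indices, shape = binary_to_csr(original)
restored = csr_to_binary(indptr, indices, shape)
```

Here the 256-pixel image is stored as 17 row pointers and 3 column indices. The saving shrinks as the number of 1s grows, which is why this only pays off for sparse images.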


Posted 2020-07-12T20:29:07.987

Reputation: 448

Unfortunately that's not an assumption I can make. And also, I am interested in seeing what kind of 'features' an autoencoder would learn during the encoding as well, hence the neural network. – HighVoltage – 2020-07-12T22:51:42.153