Compressing text using AI by sending only prediction rank of next word

1

Has any effort been made to compress text (and maybe other media) by predicting the next word and sending only the rank of that word/token in the prediction list, so the client can reconstruct it from its own predictions? I.e.:
Server text: This is an example of a long text example, custom word flerfom inserted to confuse, that may appear on somewhere
Compressed Text transmitted : This [choice no 3] [choice no 4] [choice no 1] [choice no 6] [choice no 1] [choice no 3] [choice no 1], custom word flerfom [choice no 4] inserted [choice no 4] confuse [choice no 5] [choice no 4] [choice no 6] [choice no 5] on somewhere

(Note: of course [choice no 3] would be shortened to [3] to save bytes, and in some cases we could perhaps do much better by also sending the first letter of the word.)

Of course, this means the client-side neural network has to be static, or updated only in a predictable fashion, so the server knows for certain that the client network's predictions will follow the given choice order. I tried an example with https://demo.allennlp.org/next-token-lm, but the predictions are not that good. Maybe GPT-3 can do better, but it is too heavy for use on a normal PC or mobile device.

In more detail, the process is:

Deploy the same model on both sides.
Predict the next word after the starting word.
Keep a prediction limit of, say, 100.
For any word with more than 2 characters, run the prediction.
If the current word appears within the model's top 100 predictions, replace it with a number between 0-99 (inclusive), so a 5-character word, say, becomes a 2-character numeral.
If the word is not in the top 100, send the word as-is.
The better the model predicts, the better the compression.
And in no scenario will it perform worse than the existing method.
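The steps above can be sketched in code. The predictor here is a deliberately toy, deterministic stand-in for the shared language model (the `ToyPredictor` class and its fixed vocabulary are assumptions for illustration, not part of the question); both sides must hold identical copies of it.

```python
TOP_K = 100  # prediction limit, as in the question

class ToyPredictor:
    """Deterministic toy predictor: ranks a fixed vocabulary.
    A real system would use a shared language model with frozen weights."""
    def __init__(self, vocab_by_frequency):
        self.ranked = list(vocab_by_frequency)

    def top_k(self, context, k=TOP_K):
        # A real model would condition on `context`; this toy one ignores it.
        return self.ranked[:k]

def compress(words, predictor):
    out, context = [], []
    for w in words:
        preds = predictor.top_k(context)
        if len(w) > 2 and w in preds:
            out.append(f"[{preds.index(w)}]")  # send the rank instead
        else:
            out.append(w)                      # send the word verbatim
        context.append(w)
    return out

def decompress(tokens, predictor):
    out, context = [], []
    for t in tokens:
        if t.startswith("[") and t.endswith("]"):
            w = predictor.top_k(context)[int(t[1:-1])]  # look the rank up
        else:
            w = t
        out.append(w)
        context.append(w)
    return out

predictor = ToyPredictor(["the", "example", "text", "long", "word"])
msg = "this is an example of a long text".split()
packed = compress(msg, predictor)
assert decompress(packed, predictor) == msg  # lossless round trip
```

Note that a real implementation would also need an escaping rule so that literal text of the form `[3]` in the input cannot be confused with a rank token.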

sktguha

Posted 2020-08-26T12:11:58.467

Reputation: 11

So, essentially, the twin models have complete information on the decision-making process of the other? It's an interesting idea, and, in theory, should work. (One issue could be transmission noise, but that can be mitigated with redundant signals per information theory.)

– DukeZhou – 2020-08-27T00:07:49.587

@DukeZhou well yes. I mean, if you have the same weights and everything else the same, then the predictions should always be the same, right? – sktguha – 2020-08-27T09:38:40.967

Answers

1

If you have a fixed predictor, then yes. If the predictor is not fixed but deterministic, the feasibility depends on the effort needed to update the predictor, and messages must include a timestamp to ensure the correct version of the predictor is used for compression and decompression.

You get a really nice property if the prediction order matches the probability of each word occurring, and if you use fewer bits for lower ranks. You would get something pretty close to Shannon coding, which is not optimal but is still valid.
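One concrete way to spend fewer bits on lower ranks (my choice of code here, not something the answer specifies) is Elias gamma coding, a prefix-free code where rank 1 costs a single bit and the cost grows roughly logarithmically:

```python
def elias_gamma(n):
    """Elias gamma code for a positive integer n (use 1-indexed ranks):
    the binary form of n preceded by len-1 zeros, making it prefix-free."""
    binary = bin(n)[2:]
    return "0" * (len(binary) - 1) + binary

# The most probable word (rank 1) costs 1 bit; rank 100 costs 13 bits.
assert elias_gamma(1) == "1"
assert elias_gamma(2) == "010"
assert len(elias_gamma(100)) == 13
```

If the predictor's rank ordering tracks the true word probabilities, frequent words land at low ranks and receive the short codewords, which is the Shannon-coding-like behaviour described above.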

Robby Goetschalckx

Posted 2020-08-26T12:11:58.467

Reputation: 286

Thanks for the reply. I guess the main challenge is to make it fast yet effective. I guess having a limit of 100 predictions could work. I wonder if there is any research paper or program/demo etc. done for this concept. – sktguha – 2020-08-26T13:43:32.750
