Predict output sequence one at a time with feedback



I would like to solve the following classification problem: Given an input sequence and zero or more initial terms of the true output, predict the next term in the output sequence.

For example, my training set is a large corpus of question and answer pairs. Now I would like to predict the response to "How are you?" one word at a time.

Some reasonable responses might be "I'm fine.", "OK.", "Very well, thanks." etc. So the predicted distribution over possible first words would include the first words in those examples, among others.

But now if I see the first word is "Very", I would like my prediction for the second word to reflect that "OK" is now less likely, because "Very OK" is not a common response.

And I would like to repeat this process of predicting the next word given the input sentence and every word so far in the output sentence.
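The behavior I want can be illustrated with a toy example. Here is a minimal sketch (hypothetical corpus, pure counting rather than a learned model) of how the distribution over the next word should shift once a prefix is observed:

```python
from collections import Counter, defaultdict

# Hypothetical toy corpus of observed responses to "How are you?"
responses = [
    ["I'm", "fine."],
    ["OK."],
    ["Very", "well,", "thanks."],
    ["Very", "good."],
]

# Count next-word frequencies conditioned on the output prefix so far.
next_word_counts = defaultdict(Counter)
for resp in responses:
    for i in range(len(resp)):
        prefix = tuple(resp[:i])
        next_word_counts[prefix][resp[i]] += 1

def next_word_distribution(prefix):
    """Empirical P(next word | output prefix) for this toy corpus."""
    counts = next_word_counts[tuple(prefix)]
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

# Before any output word: "I'm", "OK." and "Very" are all candidates.
print(next_word_distribution([]))
# After observing "Very": "OK." drops out entirely.
print(next_word_distribution(["Very"]))
```

A real model would of course condition on the input sentence as well and generalize beyond exact prefix matches, but the conditioning structure is the same.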

One approach might be to train a stateful RNN on examples like How are you<END> I am fine<END>. But if I understand correctly this would also learn to predict words in the "How are you" part, which I think might detract from the goal.

Maybe separate layers for the input and partial output that are then merged into a decoder layer?


Posted 2018-08-20T06:34:06.920




After some more research, I believe this can be solved with a straightforward application of encoder-decoder networks.

In this tutorial, we can simply replace sampled_token_index and sampled_char with actual_token_index and actual_char, computed from the actual observed output rather than sampled from the model's predictions. And of course in our case it's actual_word rather than actual_char.

To summarize: we divide our training set into input/output pairs, where output examples begin with a <START> token and end with <STOP>, and train a sequence-to-sequence model on these pairs, as described in the tutorial.
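The preprocessing step can be sketched as follows. This is a minimal illustration, assuming whitespace tokenization and the token names <START>/<STOP> (a real pipeline would use a proper tokenizer and vocabulary indices):

```python
# Sketch: split (question, answer) pairs into encoder/decoder sequences.
# Decoder input starts with <START>; the target is shifted by one word
# and ends with <STOP>, as is standard for seq2seq training.
def make_seq2seq_examples(pairs):
    examples = []
    for question, answer in pairs:
        encoder_input = question.split()
        answer_tokens = answer.split()
        decoder_input = ["<START>"] + answer_tokens
        decoder_target = answer_tokens + ["<STOP>"]
        examples.append((encoder_input, decoder_input, decoder_target))
    return examples

pairs = [
    ("How are you?", "I'm fine."),
    ("How are you?", "Very well, thanks."),
]
for enc, dec_in, dec_tgt in make_seq2seq_examples(pairs):
    print(enc, dec_in, dec_tgt)
```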

At inference time we feed the <START> token to the model to predict the first word. Then, after we receive the actual next word, we feed the actual (observed) output so far into the model to predict the following word, and so on.
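The inference loop looks like this. Note this is only a sketch: predict_next is a stand-in for the trained encoder-decoder (here just a fixed lookup table), and the function names are mine, not from the tutorial:

```python
# Placeholder for the trained decoder. A real implementation would run
# the encoder on input_sentence and the decoder on output_so_far; this
# stub just returns words from a fixed table for illustration.
def predict_next(input_sentence, output_so_far):
    table = {(): "Very", ("Very",): "well,", ("Very", "well,"): "thanks."}
    return table.get(tuple(output_so_far), "<STOP>")

def predict_with_feedback(input_sentence, observed_words):
    """Predict each next word, then feed back the *observed* word
    (not the model's own prediction) before predicting the next one."""
    predictions = []
    output_so_far = []
    for actual_word in observed_words + ["<STOP>"]:
        predictions.append(predict_next(input_sentence, output_so_far))
        if actual_word == "<STOP>":
            break
        output_so_far.append(actual_word)
    return predictions

print(predict_with_feedback("How are you?", ["Very", "well,", "thanks."]))
```

The key difference from ordinary seq2seq decoding is that the feedback word comes from the observed output rather than from the model's own sample, so the prediction for each position is always conditioned on the true prefix.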

The other answers had some interesting ideas, but unfortunately they didn't really address how to deal with variable-length inputs and outputs. I believe RNN-based seq2seq models are the best way to address this.




I assume your data looks something like this (pairs of question and answer):

How are you?,I'm fine.
How are you?,OK.
How are you?,Very well, thanks.

I'd suggest transforming it into three features: Question (remains the same), InputSeq (the prefix of the answer, generated at every possible length from zero up to the number of words minus one), and finally NextWord, which is the expected result.

So for your initial data you end up with the following setup.

Question, InputSeq, NextWord
How are you,,I'm
How are you,I'm,fine
How are you,,OK
How are you,,Very
How are you,Very,well
How are you,Very well,thanks

The model takes Question and InputSeq as features and predicts the NextWord.
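The expansion described above can be sketched as a small helper (function name is mine; punctuation handling is simplified to whitespace splitting):

```python
def expand_pairs(pairs):
    """Expand (question, answer) pairs into (Question, InputSeq, NextWord)
    rows: one row per answer prefix, from the empty prefix up to all
    words but the last."""
    rows = []
    for question, answer in pairs:
        words = answer.split()
        for i, word in enumerate(words):
            rows.append((question, " ".join(words[:i]), word))
    return rows

pairs = [("How are you", "I'm fine"), ("How are you", "Very well thanks")]
for row in expand_pairs(pairs):
    print(row)
```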

Marmite Bomber


Thanks. This looks like a promising approach. I thought about this already a bit, and I was struggling with how best to tell the network where the input sequence ends and where the prefix sequence begins. For example, "How are you today?" should not be treated the same as "How are you?" followed by a response that starts with "Today", e.g. "Today I am fine." Is this as simple as having a special token to indicate the end of my input sequence? – Imran – 2018-08-27T15:53:29.003

Can you also elaborate on what type of model you have in mind? I was struggling to decide whether this approach would make sense with an RNN, because I thought it wasn't standard to give RNNs successively longer prefixes like this. – Imran – 2018-08-27T15:57:18.313

No, you don't need a special token to indicate the end of the input sequence; the question and the prefix of the answer are defined as two distinct features. – Marmite Bomber – 2018-08-27T17:27:34.067

OK, but how do you have two features of variable length then? What type of model would take these? – Imran – 2018-08-27T17:32:52.367

Frankly, I have no idea which model should be used for your setup. Intuitively I'd try the simplest model and over-fit to the maximum. How do you expect the answer to the question "What time is it?" to be generalized? – Marmite Bomber – 2018-08-28T17:51:04.163

Unfortunately I can't think of any model that will fit the feature encoding you have suggested. An RNN is the only model I know of that takes a variable-length input sequence - but it takes only one input sequence, not two. – Imran – 2018-08-29T03:29:09.367


How would a random forest with high density (many trees) and low tree depth perform on this? I think it should be a very good (if not the best) approach for this task. You can simply use parts of your possible answers as attributes during the training process as well.



Interesting idea. Can you elaborate on how you want to encode the inputs and feed them to the random forest? – Imran – 2018-08-30T18:22:14.897