ngram and RNN prediction rate wrt word index

12

I plotted the rate of correct predictions (for the top-1 shortlist) against the word's position in the sentence: [figure: prediction rate vs. word position in sentence]

I was expecting to see a plateau sooner for the n-gram setup, since it needs less context. What I wasn't expecting was for the prediction rate to drop. In my understanding, once we already have a context of 3 words, the rate should converge asymptotically to its highest value and stay there. But both the recurrent network and the n-gram models show this drop, and I have no idea why.

(Note: RNNLM is the name of the framework used to build the recurrent neural net; it uses 500 neurons and 100M direct connections. RNN25 is the same setup, but with the training set divided by four.)

Here is the sentence size distribution: [figure: sentence size distribution]

Thanks in advance.

Arkantus

Posted 2015-10-27T09:55:31.540

Reputation: 157

3I think it would help if you would add a paragraph clarifying the objective and approach of your model. – Sledge – 2019-04-24T01:31:51.573

2This question (even though it was upvoted a lot) could be improved: the question is unclear, and the situation/models could benefit from some elaboration. – S van Balen – 2019-08-09T15:17:21.540

Can you add error bars to get a sense of variation in the predictions? The drop in performance may be a function of noise? – Brian Spiering – 2019-08-18T23:00:28.060

What is the x axis on your first graph? Position of word in sentence? – Neil Slater – 2015-10-27T13:49:30.910

Yes, it is! Same as the second graph. – Arkantus – 2015-10-27T14:19:50.060

Answers

0

A recurrent neural network (RNN) builds up a single state vector over time, so that curve is to be expected. Initially, the state vector does not have enough information to make a quality prediction. It then quickly reaches asymptotic performance. Overall, the predictions are between 15% and 22% correct.

The shape of the graph might be a function of sentence length in the training corpus. Possibly the sentences are mostly between 3 and 7 words long. The drop could then be because there is less training data for the later positions, which only occur in longer sentences.
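One way to check whether the drop is real or just noise from thin data is to report, for each word position, how many predictions were scored there along with the accuracy. Below is a minimal sketch, not the asker's actual evaluation code: the function name `accuracy_by_position`, the input layout (one list of True/False flags per sentence), and the toy data are all assumptions made for illustration. The error estimate is a plain binomial standard error, which assumes predictions are independent.

```python
import math
from collections import defaultdict


def accuracy_by_position(sentences):
    """Top-1 accuracy per word position, with sample counts.

    `sentences` is a hypothetical input format: one list per sentence,
    holding one boolean per word (True if the top-1 prediction was correct).
    Returns a list of (position, n, accuracy, standard_error) tuples.
    """
    hits = defaultdict(int)
    totals = defaultdict(int)
    for flags in sentences:
        # Positions are 1-based, matching "word index in sentence".
        for pos, correct in enumerate(flags, start=1):
            totals[pos] += 1
            hits[pos] += int(correct)

    rows = []
    for pos in sorted(totals):
        n = totals[pos]
        p = hits[pos] / n
        # Binomial standard error; assumes independent predictions.
        stderr = math.sqrt(p * (1 - p) / n)
        rows.append((pos, n, p, stderr))
    return rows


# Toy data only: three sentences of length 3, 5 and 7.
toy = [
    [False, True, True],
    [False, False, True, True, False],
    [True, True, False, True, True, False, False],
]

for pos, n, p, se in accuracy_by_position(toy):
    print(f"position {pos}: n={n} accuracy={p:.2f} +/- {se:.2f}")
```

If `n` shrinks quickly past the typical sentence length, the accuracy estimates at the later positions rest on few samples, and they come only from longer sentences, which may be harder to predict in the first place.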

Brian Spiering

Posted 2015-10-27T09:55:31.540

Reputation: 10 864

If I understand this answer correctly, it would be best to also reference the transient and steady-state responses from systems theory in order to interpret the graphs. – Nikos M. – 2021-01-04T14:00:39.783