I was expecting to see a plateau sooner in the n-gram setup, since it needs less context. What I wasn't expecting is that the prediction rate actually drops. My understanding is that, with a fixed context of 3 words, the curve should converge asymptotically to its highest value as the training data grows. Yet both the recurrent network and the n-gram models show this drop, and I have no idea why.
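For reference, here is the kind of measurement I mean, as a minimal sketch (the corpus, vocabulary, and training sizes are made up for illustration, not my actual data): a count-based trigram model whose top-1 next-word prediction rate is evaluated on held-out text for increasing amounts of training data. Under my reasoning, the rates should rise and then level off, never fall.

```python
import random
from collections import Counter, defaultdict

def trigram_prediction_rate(train, test):
    """Top-1 next-word prediction rate of a count-based trigram model."""
    counts = defaultdict(Counter)
    for i in range(len(train) - 3):
        # context = 3 preceding words, target = the next word
        counts[tuple(train[i:i + 3])][train[i + 3]] += 1
    hits = total = 0
    for i in range(len(test) - 3):
        ctx = tuple(test[i:i + 3])
        total += 1
        # predict the most frequent continuation seen in training
        if ctx in counts and counts[ctx].most_common(1)[0][0] == test[i + 3]:
            hits += 1
    return hits / total

# Hypothetical corpus: a synthetic word stream with weak local dependence,
# standing in for real text.
random.seed(0)
vocab = [f"w{k}" for k in range(50)]
corpus = [random.choice(vocab)]
for _ in range(200_000):
    prev = corpus[-1]
    corpus.append(vocab[(vocab.index(prev) + random.choice([0, 1, 2])) % 50])

test = corpus[150_000:]
rates = []
for n in (10_000, 50_000, 150_000):
    rates.append(trigram_prediction_rate(corpus[:n], test))
    print(f"train={n:>7}: prediction rate = {rates[-1]:.3f}")
```

On this synthetic stream the rate climbs as context coverage improves and then flattens, which is the asymptotic behaviour I expected from the real models.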
(Note: RNNLM is the name of the toolkit used to build the recurrent neural net; it uses 500 hidden neurons and 100M direct connections. RNN25 is the same setup but with the training set divided by four.)
Thanks in advance.