This is a theoretical question. I am a newbie to artificial intelligence and machine learning, and the more I read, the more I like it. So far, I have been reading about the evaluation of language models (I am focused on ASR), but I still don't get the concept of a development test set. The clearest explanation I have come across is the following:

"Sometimes we use a particular test set so often that we implicitly tune to its 
characteristics. We then need a fresh test set that is truly unseen. In such cases, 
we call the initial test set the development test set or, devset"

Nevertheless, I still don't see why an additional test set has to be used. In other words, why aren't training and test sets enough?

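In machine learning you normally split your data into three parts (a common split is 80/10/10%). The first part (80%) is used to train your ML model. The second part (10%) is the development set (also called the validation set). You use it to measure performance while trying out different hyperparameters (e.g. in neural networks: layer size). Once you have found your best hyperparameters, you train the model again on the training set (optionally combined with the development set) and then evaluate it on your test set (the remaining 10%), which the model has never seen before. Your result on the test data is then a good indicator of how well your model will predict in the real world, because nothing was ever optimized for this test data. That is also why training and test sets alone are not enough: if you picked hyperparameters by looking at test scores, you would be implicitly tuning to the test set, and its score would come out too optimistic.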
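To make the workflow concrete, here is a minimal sketch in plain Python. Everything in it is an assumption chosen for illustration: the toy 1-D data with noisy labels, the k-nearest-neighbour classifier, the candidate values of k, and the helper names (`split_indices`, `knn_predict`, `accuracy`). The point is only the order of operations: k is chosen on the dev set, and the test set is used once at the very end.

```python
import random

def split_indices(n, train_pct=80, dev_pct=10, seed=0):
    """Shuffle indices 0..n-1 and cut them into train/dev/test parts."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_train = n * train_pct // 100
    n_dev = n * dev_pct // 100
    return idx[:n_train], idx[n_train:n_train + n_dev], idx[n_train + n_dev:]

def knn_predict(train, x, k):
    """Majority vote among the k training points closest to x (1-D inputs)."""
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    votes = sum(label for _, label in nearest)
    return 1 if 2 * votes > k else 0

def accuracy(train, eval_set, k):
    """Fraction of eval_set points whose label the k-NN model gets right."""
    correct = sum(knn_predict(train, x, k) == y for x, y in eval_set)
    return correct / len(eval_set)

# Toy data (assumption): label is 1 when x > 0.5, with ~20% of labels flipped.
rng = random.Random(42)
data = []
for _ in range(500):
    x = rng.random()
    y = int(x > 0.5)
    if rng.random() < 0.2:
        y = 1 - y
    data.append((x, y))

train_idx, dev_idx, test_idx = split_indices(len(data))
train = [data[i] for i in train_idx]
dev = [data[i] for i in dev_idx]
test = [data[i] for i in test_idx]

# Hyperparameter search: only the dev set is consulted here.
candidates = [1, 3, 5, 11, 21]  # odd values, so votes cannot tie
dev_scores = {k: accuracy(train, dev, k) for k in candidates}
best_k = max(dev_scores, key=dev_scores.get)

# The test set is touched exactly once, after best_k has been fixed.
test_score = accuracy(train, test, best_k)
print("dev scores:", dev_scores)
print("best k:", best_k, "test accuracy:", test_score)
```

Note that the dev score of the winning k tends to be slightly flattering, because that k was picked precisely for doing well on the dev set. The test score does not have this bias, which is why it is the number to report.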


