Assume I have a model which predicts the number of ice creams sold in a store.
The model is trained on data from the last 5 years, keeping the last year as a validation set, and it has produced very good results.
We now put the model into production so that the CFO can create an estimate for the upcoming year's budget. The CFO now looks at the prediction for May, say 2000 ice creams, and thinks "Ooh... I was hoping for more sales in May. I'll go for 4000." He therefore orders more advertising, introduces new flavours, etc., and reaches the 4000 ice creams sold by the end of May, as he was hoping.
On the first of June we talk to the CFO to evaluate the model after the first months in production, and we see that our prediction for May is off by 100%!
This spike can be explained by the increased advertising etc., and on all the other days the model has done really well. But if the CFO starts tweaking the advertising, flavours, etc. each day to hit the budget, how will we ever be able to test whether our model is indeed good in production/the real world? And how will we be able to re-train the model? The first 5 years of sales are without any "human influence", whereas after a year the sales have been influenced by advertising etc. Thus the spike in May is not "natural" but is due to an exogenous variable we are not able to incorporate (e.g. we don't know the CFO's budget).
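To make the problem concrete, here is a minimal toy simulation (entirely synthetic numbers, not real sales data; the "model" is just a historical monthly mean standing in for any forecaster trained on pre-intervention data). It shows how a single unobserved intervention in May makes the model look broken in that month even though the forecast itself was fine:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data-generating process: monthly sales follow a
# seasonal curve plus noise (peaking in summer).
months = np.arange(12)
seasonal = 1000 + 1000 * np.sin(np.pi * months / 11)

# 5 years of "natural" history, no human interventions.
history = np.array([seasonal + rng.normal(0, 50, 12) for _ in range(5)])

# Stand-in forecaster: predict each month's historical mean.
forecast = history.mean(axis=0)

# Production year: the CFO intervenes in May (index 4), roughly
# doubling sales via advertising -- an exogenous action the model
# never observes and cannot account for.
actual = seasonal + rng.normal(0, 50, 12)
actual[4] *= 2

# Evaluate: absolute percentage error per month.
ape = np.abs(actual - forecast) / forecast
print(f"May error:           {ape[4]:.0%}")
print(f"Median other months: {np.median(np.delete(ape, 4)):.0%}")
```

The May error comes out near 100% while every other month stays small, even though the model captured the natural demand perfectly well. Worse, if this production year were fed back into training, the inflated May value would be learned as "natural" seasonality.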