I have a complex algorithm that decides whether to show customers of an online shop an ad on our website after they log in, in the hope that they will buy what is in the ad. We have no control over what is in the ad, because another department of our company handles that; our model only chooses whom the ad is shown to. We would like to keep the number of people who see the ad as low as possible, because otherwise we will just offend customers with it. In other words: don't show the ad to people who log in with the intent to buy something anyway, and only show it to those who would not buy anything unless they were presented with an ad. The model was trained on data where we randomly showed some people the ad and some not, and recorded their respective responses.
An initial implementation of the model is running live and works decently well at classifying whom it makes sense to show the ad to. Each day new data is acquired, and every few days the model is retrained on the whole dataset (which grows with each retraining). But there is the worry that its performance might decrease if we keep doing this, because the current model instance influences the new data that is used to train future model instances: the fraction of customers who are randomly not shown ads will shrink, and more and more of the data will come from customers who were shown an ad only because the model believed they would buy what is in it.
Are we right to worry that our data will become biased in the future?
The [unofficial Google data science blog], section "Using randomization in training", might contain useful information about this problem, but unfortunately it is too technical for me to make much sense of all of it. The message I did get from it was: "Yes, worry about this, but keep sending out a small part of the ads randomly and you will be fine." Other than that, what else could I do to reduce this problem?
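To make the question concrete, here is a minimal sketch of what I understand the "keep a small random slice" advice to mean. All names, the threshold, and the epsilon value are my own illustrative assumptions, not from the blog post: a small fraction epsilon of customers gets a coin-flip decision regardless of the model, and the probability that each customer was shown the ad (the propensity) is logged so that later retraining could, in principle, reweight the data.

```python
import random

def assign_ad(model_score, epsilon=0.05, threshold=0.5):
    """Decide whether to show the ad to one customer.

    model_score: the model's predicted probability that showing the ad helps
                 (hypothetical interface, assumed for illustration).
    epsilon:     fraction of traffic kept fully randomized so future
                 training data retains an unbiased slice.
    Returns (show_ad, propensity), where propensity is the overall
    probability that this customer was shown the ad; logging it allows
    inverse-propensity reweighting when retraining.
    """
    model_shows = model_score >= threshold
    # P(show) = P(in random slice) * 0.5 + P(model-driven) * 1{model says show}
    propensity = epsilon * 0.5 + (1 - epsilon) * (1.0 if model_shows else 0.0)
    if random.random() < epsilon:
        show = random.random() < 0.5   # exploration slice: ignore the model
    else:
        show = model_shows             # normal model-driven decision
    return show, propensity
```

Is logging the propensity like this (and reweighting by it when retraining) the right way to counteract the bias, or is there more to it?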