Shifting training data


I want to create a neural network and train it on some data. However, I want to be able to create each new model without retraining it from scratch.

For example, say I have 1000 data points in my training data:

  1. model - trained on 0-99
  2. model - trained on 1-100
  3. model - trained on 2-101
  4. and so forth

So I'm wondering if I can use the first model to train the second model, essentially forgetting the first data point.

You can view it as a sliding window over the 1000 data points, sliding one data point to the right for each new model.

Does this make sense? Is there an easy way to solve this problem?
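The sliding window described above can be sketched as follows. This is only an illustration of the indexing: the helper name `sliding_windows` and the placeholder data are assumptions, not part of any library.

```python
from typing import Iterator, Sequence, Tuple


def sliding_windows(
    data: Sequence[int], window_size: int
) -> Iterator[Tuple[int, Sequence[int]]]:
    """Yield (start_index, window) pairs, moving one item to the right each time."""
    for start in range(len(data) - window_size + 1):
        yield start, data[start:start + window_size]


# Placeholder data: 1000 points labelled 0..999 (hypothetical stand-in values).
data = list(range(1000))
windows = list(sliding_windows(data, window_size=100))

first_start, first_window = windows[0]    # covers items 0-99
second_start, second_window = windows[1]  # covers items 1-100
print(len(windows), first_window[0], first_window[-1],
      second_window[0], second_window[-1])
```

Each window would correspond to one model in the list above; consecutive windows share all but one data point.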


Posted 2018-04-10T21:30:08.717


Since the main goal of training an NN is usually to generalise a function from example data, could you clarify what you mean by "forgetting the first data point"? Are you wanting to purposefully overfit so that predictions against the first data point are no better than random? If not, do you have any criteria for when the training for any specific model, e.g. "model 2 trained on 1-100", is complete, or whether its results are acceptable to you? It is clearly possible to take model 1 and re-train it, but what is missing in the question is the goal for doing so . . . and that affects the answer here – Neil Slater – 2018-04-10T22:00:12.823

The idea is that the first data point is not relevant any more. I don't want to overfit, so imagine continuous addition of data points. For a given situation, only the 100 most recent data points are relevant. When a data point is added, the most recent model has been trained on a data point that is (in theory) not relevant any more. – norflow – 2018-04-10T22:15:39.107

Adding to my comment: essentially I'm asking whether it is possible for a model to consider only the most recent data points without retraining it from scratch on the most recent data, only partially training it in some way so that it "forgets" the no-longer-relevant data from the previous training set. – norflow – 2018-04-10T23:16:35.173
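One possible reading of "partially training" is warm starting: keep the previous model's weights and run a small amount of further training on the newest window. The sketch below uses a tiny linear model and synthetic data in plain Python as a stand-in for a neural network; the data, the `sgd_epoch` helper, the learning rate, and the one-epoch-per-slide schedule are all assumptions for illustration. It does not explicitly remove the influence of old points; that influence simply fades as training continues on newer data.

```python
from typing import List, Sequence, Tuple

Weights = Tuple[float, float]  # (slope w, intercept b) of y = w * x + b


def sgd_epoch(weights: Weights, xs: Sequence[float], ys: Sequence[float],
              lr: float = 0.1) -> Weights:
    """One pass of plain SGD on squared error, starting from the given weights."""
    w, b = weights
    for x, y in zip(xs, ys):
        err = (w * x + b) - y
        w -= lr * err * x
        b -= lr * err
    return w, b


# Synthetic, hypothetical data: the true slope changes from 1 to 3 halfway through.
xs = [(t % 10) / 10 for t in range(1000)]
ys = [(1.0 if t < 500 else 3.0) * x for t, x in enumerate(xs)]

WINDOW = 100
weights: Weights = (0.0, 0.0)
models: List[Weights] = []
for start in range(len(xs) - WINDOW + 1):
    # Warm start: continue from the previous window's weights, not from scratch.
    weights = sgd_epoch(weights, xs[start:start + WINDOW], ys[start:start + WINDOW])
    models.append(weights)

print("model for window 300-399:", models[300])
print("model for window 900-999:", models[-1])
```

In this toy setting the model for window 300-399 ends up with a slope close to 1 and the final model with a slope close to 3, so the old regime has effectively been forgotten. Whether this carries over to a real network depends on the learning rate, the number of update steps per slide, and how quickly the data actually changes.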

If you're afraid of overfitting, there are much better ways, like dropout, L2 regularisation and just stochastic batching. – Andreas Storvik Strauman – 2018-04-11T07:03:53.463

Actually, your method looks a lot like cross-validation (not the SE site, but the actual algorithm), which excludes some of the data for each iteration. – Andreas Storvik Strauman – 2018-04-11T07:06:04.390

@norflow: Maybe you just want online training, but this is still not clear from your question or comments. How are you testing your network? How will its predictions be used? For instance, how will you know training for items 0-99 is complete and useful for its task? After you have trained it on items 0-99, what is the task you will use it for - predicting item 100? – Neil Slater – 2018-04-11T07:23:31.513

Thank you so much for all of your time. I should clarify by describing the problem I want to solve instead of wasting your time, sorry. I want to predict some future event. My hypothesis is that only the last 6 hours are relevant, so as time passes, the model will have been trained on "old" data. The idea is that there will be some pattern present in the data that will help predict the next event. So I need an adaptive model that "forgets" the old (non-relevant) data without retraining it from scratch every time a time unit passes, because it has to run in real time (every 5 minutes). – norflow – 2018-04-11T08:08:29.913
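For the six-hour window with updates every 5 minutes, the set of "still relevant" points can be kept in a fixed-size buffer. A minimal sketch, assuming exactly one data point arrives per 5-minute step (an assumption, not something stated above) and using integers as placeholder observations:

```python
from collections import deque

WINDOW_HOURS = 6
STEP_MINUTES = 5
WINDOW_SIZE = WINDOW_HOURS * 60 // STEP_MINUTES  # 72 points cover six hours

# A deque with maxlen drops the oldest item automatically when a new one arrives.
recent = deque(maxlen=WINDOW_SIZE)

for observation in range(100):  # placeholder stream of 100 observations
    recent.append(observation)

print(len(recent), recent[0], recent[-1])
```

The buffer only decides which data is available for training at each step; how the model is updated from it is a separate choice.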

@norflow: It is still not clear from your comment. Are you saying items 0-99 are being used together to predict something about item 100? As you describe the training data, I am expecting each item (e.g. item 25) to be a separate valid input to the network, with a fixed training target. So if you input data about item 25 into the network, what is the output? Is the output associated only with item 25, with the rules for association changing over time? Or would it be more accurate to say that the output depends on items 0-24 as well, so this is more about predicting the outcome of a sequence? – Neil Slater – 2018-04-11T08:21:37.697

I think some simplified examples will help. Please make it clear how older items become less relevant. In a practical sense, there is a difference between a non-stationary problem (where the rules of induction keep changing) and a sequence prediction problem (where the history of the data has a predictable consequence and is part of those rules). – Neil Slater – 2018-04-11T08:23:54.437
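To make the sequence prediction reading concrete: in that framing each training example pairs a stretch of history with the item that follows it, and the same model can be reused as time moves on. A minimal sketch, where the helper name `make_sequence_examples` and the toy series are assumptions for illustration:

```python
from typing import List, Sequence, Tuple


def make_sequence_examples(
    series: Sequence[float], history: int
) -> List[Tuple[List[float], float]]:
    """Pair each run of `history` consecutive items with the item that follows it."""
    return [
        (list(series[i:i + history]), series[i + history])
        for i in range(len(series) - history)
    ]


toy_series = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
examples = make_sequence_examples(toy_series, history=3)
for inputs, target in examples:
    print(inputs, "->", target)
```

In the non-stationary reading, by contrast, each item is its own input with its own target, and it is the mapping from input to target that drifts over time.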

No answers