Owen Zhang's slides: what does the "time" mean?


Here is one of Owen Zhang's SlideShare decks. On page 12, what does "time" mean?


Posted 2017-03-12T02:16:29.307

Reputation: 405



Here's my understanding:

I think page 12 elaborates on page 10, and "time" means the same thing as on page 10: the real-world time axis.

Page 10 says that:

  • Hold-out data should be out-of-time. We cannot use a random shuffle, because it would not represent the real situation our model will face.

  • The exception is when the data set is extremely small.
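To make the "out-of-time" idea concrete, here is a minimal sketch in Python. It is not from the slides: the toy rows, the `day` field, and the `out_of_time_split` helper are all hypothetical names made up for illustration. The point is only that the hold-out set is cut off by date, so every hold-out row is later than every training row, instead of being drawn by a random shuffle.

```python
from datetime import date, timedelta


def out_of_time_split(rows, cutoff):
    """Split rows by date: training rows are before `cutoff`, hold-out rows on or after it."""
    train = [r for r in rows if r["day"] < cutoff]
    holdout = [r for r in rows if r["day"] >= cutoff]
    return train, holdout


# Hypothetical toy data: one row per day for 10 days, starting 2017-01-01.
start = date(2017, 1, 1)
rows = [{"day": start + timedelta(days=i), "y": i} for i in range(10)]

train, holdout = out_of_time_split(rows, cutoff=date(2017, 1, 8))
print(len(train), len(holdout))  # 7 3
```

With a random shuffle, some hold-out rows would be older than some training rows, so the model would be scored on a situation (predicting the past from the future) that it never faces in production.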

As the slides said earlier:

When you looked at your validation result for the Nth time, you are training models on it
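A small simulation can illustrate why that is. This is only a sketch under made-up assumptions, not something from the slides: the labels are coin flips, each "model" is a pure random guesser, and the sizes (`n_val`, `n_test`, `n_models`) are arbitrary. No model has any real skill, yet picking the one with the best validation score yields a validation accuracy above chance, while the same model's accuracy on fresh data is expected to stay near 0.5.

```python
import random


def accuracy(preds, labels):
    """Fraction of predictions that match the labels."""
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


rng = random.Random(0)  # fixed seed so the run is repeatable
n_val, n_test, n_models = 50, 5000, 200  # arbitrary illustrative sizes

# Labels are coin flips, so no model can truly beat 50% accuracy.
val_labels = [rng.randint(0, 1) for _ in range(n_val)]
test_labels = [rng.randint(0, 1) for _ in range(n_test)]

val_scores, test_scores = [], []
for _ in range(n_models):
    # Each "model" is a random guesser, scored on validation and on fresh test data.
    val_preds = [rng.randint(0, 1) for _ in range(n_val)]
    test_preds = [rng.randint(0, 1) for _ in range(n_test)]
    val_scores.append(accuracy(val_preds, val_labels))
    test_scores.append(accuracy(test_preds, test_labels))

# "Looking at the validation result N times": keep whichever model scored best.
best = max(range(n_models), key=lambda i: val_scores[i])
print("validation accuracy of chosen model:", val_scores[best])
print("test accuracy of chosen model:", test_scores[best])
```

The gap between the two printed numbers is the selection effect: the validation score of the chosen model is optimistic simply because it was used to choose.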

In this case: validation data = hold-out data = Public LB (leaderboard), if my understanding is correct.

The first figure on page 12 is the normal case: we shouldn't use the Public LB to tune our model (otherwise we are effectively training on it). So the answer is no.

The second figure is the small-data case: we have to tune the model using feedback from the Public LB, because the data is so limited. So the answer is yes.

The third figure shows Public Data and Private Data covering the same time period. They share the same "reality", so we can use the feedback from the Public LB.

The fourth figure shows that Public Data is reality, and Private Data is more "realistic" than Public Data. This is the "hold-out data should be out-of-time" case, so the answer is yes.


Posted 2017-03-12T02:16:29.307

Reputation: 3 556