What is the difference between public test and private test on Kaggle


I am trying to use the datasets from a competition held on Kaggle in which the dataset contains fer2013.csv file having columns emotion, pixels, and usage where usage contains 3 values i.e "training", "privatetest" and "publictest". So I am not getting what is publictest and privatetest. It would be appreciated someone could explain to me what it is?


Posted 2018-12-21T09:59:43.823

Reputation: 159



"publictest" data is used to calculate the score for the public leaderboard while competition is running.

After the competition is finished, competitors will be ranked on the data which was marked as "privatedataset".

Antonio Jurić

Posted 2018-12-21T09:59:43.823

Reputation: 939


The privatetest data is needed because competitors can in principle overfit the publictest data.

On Kaggle you are allowed to submit your scores multiple times and see your performance. By this constant feedback one could, on purpuse or subconsciously, overfit.

It is a good practice to assess model's performance on completely new and unseen data.


Posted 2018-12-21T09:59:43.823

Reputation: 161