This is the starter challenge, Titanic. The original question I posted on Kaggle is here. However, nobody there gave any insightful advice, so I am turning to the powerful Stack Overflow community.
Based on this Notebook, we can download the ground truth for this challenge and get a perfect score.
I tested it, and it does give me 100% on the LB, which confirms it is the ground truth as claimed. (Side question: how do I remove this perfect submission? The leaderboard now shows 100% for me on this challenge, but I want to show my real score, which is roughly 80% and which I will keep improving.)
Submissions on Kaggle sometimes take several minutes to return a score, so to save time I score my different models locally against the ground truth. However, the local scores never match what the LB reports. See the following:
This is the code I use. What's wrong with it? You can use it to score your own submission — do you see the same problem?
import os
import pandas as pd

def mark(pred):
    # dirname is set earlier in my notebook
    solution = os.path.join(dirname, './output/solution.csv')
    submission = os.path.join(dirname, './output/' + pred)
    solution = pd.read_csv(solution)
    submission = pd.read_csv(submission)
    solution.columns = ['PassengerId', 'Sol']
    submission.columns = ['PassengerId', 'Pred']
    df = pd.concat([solution[['Sol']], submission[['Pred']]], axis=1)
    num_row = df.shape[0]
    print(pred[:-4], '==', df[df['Sol'] == df['Pred']].shape[0] / num_row)

if __name__ == "__main__":
    mark('achieve_99_dtree_rfe.csv')
    mark('advanced_feature_with_stacking_5_fold.csv')
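For reference, this is what I expect the scoring to do: compare each passenger's predicted label against the ground-truth label and report the fraction that agree. A minimal sketch of that intent, aligning rows explicitly on PassengerId via a merge rather than by position (the function name `score_merged` is just my placeholder here):

```python
import pandas as pd

def score_merged(solution_path, submission_path):
    """Return the fraction of predictions that match the solution,
    matching rows by PassengerId rather than by row order."""
    solution = pd.read_csv(solution_path)
    submission = pd.read_csv(submission_path)
    solution.columns = ['PassengerId', 'Sol']
    submission.columns = ['PassengerId', 'Pred']
    # merge pairs each prediction with the correct passenger's label
    df = solution.merge(submission, on='PassengerId')
    return (df['Sol'] == df['Pred']).mean()
```

Unlike a positional concat, this gives the same answer even if the two CSVs list the passengers in different orders.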