I have two files:
Test_data - contains the features of a dataset to find predictions for.
Submission_data - contains two columns: the index column for the test data and another column for its corresponding predicted value.
So, I have to make predictions on the test data and store the predicted values in the submission file.
During preprocessing of the test data, I drop every row that has values (non-NaN) in fewer than 50% of the features (columns):
import math
test_data = test_data.dropna(thresh=math.ceil(test_data.shape[1] / 2))
Now, how do I remove the corresponding rows in the submissions dataframe? If I drop some rows from the test data, I cannot make a prediction for the corresponding rows in the submissions dataframe/file.
The problem is that the Index column does NOT have unique values (in both the test data and the submissions data), so I cannot simply match rows by their Index value.
So, how do I drop the rows in the submissions data that were also dropped from the test data?
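To make the goal concrete, here is a minimal sketch on made-up data of the behaviour I am after. It assumes that both dataframes list their rows in the same order with the same number of rows (I have not verified this for the real files), so rows can be matched by position rather than by the non-unique Index value. The column names f1..f4 and Prediction are hypothetical.

```python
import math

import numpy as np
import pandas as pd

# Toy stand-ins for the two files; column names and values are made up.
test_data = pd.DataFrame(
    {
        "Index": [1, 1, 2, 2],  # deliberately non-unique
        "f1": [0.5, np.nan, 1.5, np.nan],
        "f2": [1.0, np.nan, np.nan, 2.0],
        "f3": [3.0, np.nan, 4.0, np.nan],
        "f4": [2.0, 7.0, np.nan, np.nan],
    }
)
submission_data = pd.DataFrame(
    {"Index": [1, 1, 2, 2], "Prediction": [0.0, 0.0, 0.0, 0.0]}
)

# Same threshold as dropna(thresh=...): minimum number of non-NaN values per row.
thresh = math.ceil(test_data.shape[1] / 2)

# Positional boolean mask: True for rows that dropna(thresh=thresh) would keep.
# Converting to a NumPy array makes the selection ignore index labels entirely.
keep_mask = (test_data.notna().sum(axis=1) >= thresh).to_numpy()

# ASSUMPTION: row i of submission_data corresponds to row i of test_data.
test_kept = test_data[keep_mask]
submission_kept = submission_data[keep_mask]

print(test_kept)
print(submission_kept)
```

Because the mask is applied by position, the duplicate Index values do not matter here. If the two files are not in the same row order, this sketch would not be correct.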
I am new to ML challenges and am finding this difficult.