I was wondering if anyone has advice on where to start digging into this problem. I have a model that has gone through development and now scores above 95% for both accuracy and F-score on the train, CV and test sets. The full development data set is around 60k samples, with a 2/3 split between positive and negative samples. These samples are based on extracts covering January to November of last year. Final test results were:
Precision: 0.9751, Recall: 0.9320, Accuracy: 0.9693, F-score: 0.9531
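The reported F-score matches the F1 score, the harmonic mean of precision and recall, F1 = 2PR / (P + R). A minimal Python sketch to check this, assuming "F-score" here means F1 (the helper name f_score is illustrative, not from any library):

```python
def f_score(precision: float, recall: float) -> float:
    """F1 score: the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Development test figures quoted above.
print(f"{f_score(0.9751, 0.9320):.4f}")  # prints 0.9531
```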
However, the first production runs showed high precision (95%+) but very low recall (~50%): Accuracy = 48%, F-score = 68%.
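One quick sanity check is to work out what class balance the reported numbers imply. Assuming each set of metrics comes from a single labelled batch with "positive" as the target class, and writing P, R and A for precision, recall and accuracy and pi for the share of positives, the false negatives per sample are (1 - R)·pi and the false positives are R·pi·(1 - P)/P, so 1 - A = pi·((1 - R) + R(1 - P)/P). The hypothetical helper below solves that equation for pi:

```python
def implied_positive_share(precision: float, recall: float, accuracy: float) -> float:
    """Return the fraction of positive samples implied by P, R and accuracy.

    Per sample, with positive share pi:
      false negatives = (1 - R) * pi
      false positives = R * pi * (1 - P) / P
    and 1 - accuracy = FN + FP, which is linear in pi, so solve for pi.
    """
    errors_per_positive = (1 - recall) + recall * (1 - precision) / precision
    return (1 - accuracy) / errors_per_positive


# Development test results quoted above.
dev = implied_positive_share(precision=0.9751, recall=0.9320, accuracy=0.9693)
# Approximate production figures (95%+ precision, ~50% recall, 48% accuracy).
prod = implied_positive_share(precision=0.95, recall=0.50, accuracy=0.48)

print(f"test set positive share:   {dev:.2f}")
print(f"production positive share: {prod:.2f}")
```

Taken at face value, the test results imply a test set that is about one-third positive (0.33), while 95% precision, 50% recall and 48% accuracy imply a production batch that is about 99% positive. The production figures are approximate and the result is sensitive to them: with recall even slightly above 50%, the formula returns a value above 1, which would mean the quoted numbers cannot all hold exactly.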
Does anyone in the group have thoughts on where to look or what the potential causes might be? We will keep running the model over the next couple of months, since the Christmas period may be causing exceptional variation, but we were surprised by the drop. Any help appreciated. Thanks