What does a predicted probability really mean, without considering the accuracy of the underlying model?


Say I've built a (completely unrealistic) classification model in Keras that gives me 1.00 accuracy.

And next, I would like to use my model on some new, unseen data, and use model.predict_proba to get the probability that the observation belongs to class "A". Say this returns 0.75.

Am I interpreting this correctly in English: "100 percent of the time, the model is confident that this new observation is 75 percent likely to be class A" ?

If this is correct, then suppose my model were not totally perfect, as in real life, and instead gave me 0.40 accuracy. Say my predict_proba is still 0.75. Then, is this correct:

"40 percent of the time, the model is confident that this new observation is 75 percent likely to be class A." ?

If so...this makes it seem like predict_proba() is not telling the complete story.

I could mislead someone (say a journalist...or a judge, whoever) by saying, "There's a 75 percent chance this unseen observation belongs to class A"...and that might sound great, if I fail to reveal that this statement was based on a model with an accuracy as low as 0.40.

Am I stating this correctly, and does my apprehension have validity?

Monica Heddneck

Posted 2017-08-08T08:06:09.933

Reputation: 605

Answers


Accuracy is measured in a classification model by comparing the predicted labels to the actual, known labels.
The predicted labels are a function of both the predicted probabilities for each class and a predefined threshold (for binary classification, usually 0.5).
So if sample A gets a predict_proba of {0: 0.2, 1: 0.8}, it will be labeled 1 (since 0.8 > 0.5).
Accuracy is a measure of classification correctness, while predict_proba is a direct output of the model's underlying function.
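A minimal sketch of the relationship described above (plain Python, with made-up probabilities and labels rather than a real Keras model): the threshold turns predicted probabilities into hard labels, and accuracy then scores those labels against the known ones.

```python
def labels_from_probs(probs, threshold=0.5):
    """Map each predicted probability of class 1 to a hard label."""
    return [1 if p > threshold else 0 for p in probs]

def accuracy(predicted, actual):
    """Fraction of predicted labels that match the actual labels."""
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)

# Hypothetical predict_proba outputs for class 1, and known labels:
probs = [0.8, 0.75, 0.3, 0.6]
actual = [1, 0, 0, 1]

preds = labels_from_probs(probs)   # [1, 1, 0, 1]
print(accuracy(preds, actual))     # 0.75
```

Note that the 0.75 probability for the second sample contributes a wrong label here: the probability and the accuracy answer different questions.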

yoav_aaa

Posted 2017-08-08T08:06:09.933

Reputation: 878

makes sense.... – Monica Heddneck – 2017-08-08T09:47:24.290