Prediction on the training sample with randomForest in R


I'm using a Random Forest algorithm to build a classification model, and I HAVE to check the accuracy of my RF model on the training sample. But, as you can see in this answer:

you can't evaluate the accuracy on the training sample like this:

predict(model, data=train)

I'm not comfortable with the idea of using OOB to get the accuracy on the training sample, because the OOB samples were not used to build the model, so how could this be right? I don't know what I should do to predict on the training sample and get its accuracy. Is it possible, and does it even make sense? When I check the AUC of the predictions on my training sample I get something near 0.98, but the AUC on the test sample is about 0.7. Is this due to the limitations of predicting on the training sample, or due to overfitting?
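For reference, a minimal sketch of the distinction at issue, assuming the `randomForest` package and using the built-in `iris` data as a stand-in for the real data (the 100/50 split and all object names are hypothetical). As far as I understand the package, `predict()` called without `newdata` returns the out-of-bag predictions, while `newdata = train` returns the resubstitution predictions:

```r
library(randomForest)

set.seed(1)
idx   <- sample(nrow(iris), 100)   # hypothetical 100/50 train/test split
train <- iris[idx, ]
test  <- iris[-idx, ]

model <- randomForest(Species ~ ., data = train, ntree = 500)

# No newdata: the out-of-bag (OOB) prediction for each training row, i.e.
# a vote among only those trees whose bootstrap sample left that row out.
oob_pred <- predict(model)

# `data` is not an argument of predict.randomForest, so it is absorbed by
# `...` and the call below returns the OOB predictions again.
same_as_oob <- predict(model, data = train)

# newdata = train: every tree votes, including trees grown on that row.
resub_pred <- predict(model, newdata = train)
test_pred  <- predict(model, newdata = test)

# Share of rows whose predicted class matches the observed class.
acc <- function(pred, truth) mean(pred == truth)

c(oob   = acc(oob_pred,   train$Species),
  resub = acc(resub_pred, train$Species),
  test  = acc(test_pred,  test$Species))
```

The resubstitution figure is usually close to 1 for a random forest because each row was seen by most of the trees, so a large gap between it and the test figure does not by itself show overfitting. The OOB figure is the one that is expected to be comparable to the test figure.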

Michael Elma

Posted 2016-01-09T14:43:38.303

Reputation: 83

OOB can be used to get an idea of the test error. OOB measures the performance of the model by taking a data point from the training set and making a prediction using only the trees that were not trained on that specific data point. If accuracy is still what you need, you can calculate it from the confusion matrix. – Harpal – 2016-07-21T17:07:58.177

What are the sensitivity and specificity of your random forest results? Do you have a low number of data points to train your model? It sounds like you are overfitting your training data, so look at adjusting the cutoff value inside your randomForest model; this is described in the documentation. If you are unsure how to get the sensitivity and specificity, use a confusion matrix. – jnic – 2016-10-26T11:15:34.603
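A rough sketch of what this comment suggests, assuming the `randomForest` package and a hypothetical two-class problem carved out of `iris` (versicolor vs. virginica, with virginica treated as the positive class purely for illustration). The cutoff rule is applied by hand to the OOB vote fractions in `model$votes`; to my understanding this is the same "largest ratio of votes to cutoff wins" rule that the `cutoff` argument of `randomForest()` uses, but treat that as an assumption and check the documentation:

```r
library(randomForest)

set.seed(1)
# Hypothetical two-class data: drop setosa, keep versicolor and virginica.
d     <- droplevels(iris[iris$Species != "setosa", ])
model <- randomForest(Species ~ ., data = d, ntree = 500)

# OOB confusion matrix: rows are observed classes, columns are predictions.
cm <- table(observed = d$Species, predicted = predict(model))

# Virginica is taken as the "positive" class (an arbitrary choice here).
sensitivity <- cm["virginica",  "virginica"]  / sum(cm["virginica", ])
specificity <- cm["versicolor", "versicolor"] / sum(cm["versicolor", ])

# Illustrative cutoffs c(versicolor = 0.75, virginica = 0.25): a row is
# called virginica when its vote share divided by 0.25 exceeds the
# versicolor vote share divided by 0.75.
votes    <- model$votes   # OOB vote fractions, one column per class
pred_cut <- factor(
  ifelse(votes[, "virginica"] / 0.25 > votes[, "versicolor"] / 0.75,
         "virginica", "versicolor"),
  levels = levels(d$Species)
)
cm_cut <- table(observed = d$Species, predicted = pred_cut)

sensitivity_cut <- cm_cut["virginica",  "virginica"]  / sum(cm_cut["virginica", ])
specificity_cut <- cm_cut["versicolor", "versicolor"] / sum(cm_cut["versicolor", ])
```

Lowering the cutoff for one class can only move predictions toward that class, so sensitivity for it cannot go down while specificity cannot go up. Which trade-off is acceptable depends on the problem.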

It's not clear what your question actually is. Are you trying to evaluate models during training (i.e. cross-validation) or are you trying to evaluate a single model? – cdeterman – 2016-01-27T13:46:48.177

No answers