Overfitting results with Random Forest Regression

2

I have one image in which each pixel carries 4 different values. I used a random forest (RF) to see whether I can predict the 4th value from the other 3 values of each pixel, using Python and scikit-learn. First I fit the model, and after validating it I used it to predict this image. I was both very happy and scared to see that I got a very high accuracy for my model: 99.95%! But when I looked at the resulting image, it was absolutely not 99.95% accurate:

original image:

enter image description here

result image:

enter image description here

(I have marked the biggest and most visible difference.)

My question is: why would I get such a high accuracy when the visualization clearly shows that the real accuracy is much lower? I understand it might come from overfitting, but then why is this difference not detected?

Edit:

Mean Absolute Error: 0.048246606512422616
Mean Squared Error: 0.00670919112477127
Root Mean Squared Error: 0.0819096522076078
Accuracy: 99.95175339348758
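A quick arithmetic check on the numbers above may help explain the mismatch. The post does not show how "Accuracy" was computed, so the following is only an observation: the reported value coincides numerically with 100 minus the Mean Absolute Error. If it was computed that way (an assumption), it is not a percentage of correctly predicted pixels, and it would stay close to 100 for any target whose values are small compared to 100.

```python
import math

# Values copied from the edit in the question.
mae = 0.048246606512422616
mse = 0.00670919112477127
rmse = 0.0819096522076078
reported_accuracy = 99.95175339348758

# RMSE is the square root of MSE, so these two lines agree with each other.
rmse_from_mse = math.sqrt(mse)

# Assumption: "Accuracy" was computed as 100 - MAE. The formula is not shown
# in the post; this only checks that the numbers are consistent with it.
implied_accuracy = 100 - mae

print(f"sqrt(MSE)      = {rmse_from_mse:.10f} (reported RMSE {rmse:.10f})")
print(f"100 - MAE      = {implied_accuracy:.10f}")
print(f"reported value = {reported_accuracy:.10f}")
```

Whether an MAE of about 0.048 is small or large depends on the range of the 4th value, which the post does not state.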

Reut

Posted 2020-06-21T13:10:01.493

Reputation: 253

MAE/MSE/RMSE are metrics for regression, while Accuracy is for classification. What are you measuring with Accuracy? – Carlos Mougan – 2020-06-21T13:52:06.207

I would like to get some measurement of how good the classification was – Reut – 2020-06-21T14:14:46.393

But you are running a regression model; the post is called random forest regressor. Accuracy is not a great metric to evaluate this. What is your train target like? – Carlos Mougan – 2020-06-21T15:52:55.263
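To illustrate why an accuracy-style number can mislead for regression, here is a minimal sketch on made-up data. It assumes a score defined as 100 minus the MAE (which is what the figures in the question's edit are numerically consistent with) and a hypothetical target uniformly distributed in [0, 1]; neither is confirmed by the post. A baseline that always predicts the mean still scores above 99 on that measure, while R² correctly reports that it explains nothing.

```python
import numpy as np
from sklearn.dummy import DummyRegressor
from sklearn.metrics import mean_absolute_error, r2_score

rng = np.random.default_rng(0)

# Hypothetical target: 1000 "pixel" values in [0, 1] (not the poster's data).
y_true = rng.uniform(0.0, 1.0, size=1000)

# DummyRegressor ignores the features, so a placeholder column is enough.
X_placeholder = np.zeros((y_true.size, 1))

# Baseline that always predicts the mean of the target.
baseline = DummyRegressor(strategy="mean").fit(X_placeholder, y_true)
y_baseline = baseline.predict(X_placeholder)

mae_baseline = mean_absolute_error(y_true, y_baseline)
pseudo_accuracy = 100 - mae_baseline  # assumed "accuracy" formula
r2_baseline = r2_score(y_true, y_baseline)

print(f"baseline MAE       : {mae_baseline:.4f}")
print(f"baseline 100 - MAE : {pseudo_accuracy:.4f}")
print(f"baseline R^2       : {r2_baseline:.4f}")
```

Comparing the model's MAE or R² against such a baseline, rather than against 100, gives a number that is easier to relate to what the predicted image looks like.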

Answers

1

Where are you evaluating the performance of your algorithm?

Are you making a train/test split and evaluating on the test split? It might be that you overfitted the training set and are only measuring the accuracy there.

If you have done the train/test split and the evaluation correctly, it could be that the images you are predicting do not have the same properties/configuration/topology as the ones you are training with.
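The check described above can be sketched as follows. The data here is synthetic (three feature columns and one noisy target, standing in for the per-pixel table), and the hyperparameters are arbitrary choices for illustration, not the poster's setup. The point is only to report the same regression metrics on both splits so that a gap between them becomes visible.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error, r2_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)

# Synthetic stand-in for the pixel table: 3 predictor values per row,
# and a 4th value that depends on them plus noise.
X = rng.uniform(0.0, 1.0, size=(2000, 3))
y = X[:, 0] + 0.5 * X[:, 1] * X[:, 2] + rng.normal(0.0, 0.2, size=2000)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

model = RandomForestRegressor(n_estimators=50, random_state=0)
model.fit(X_train, y_train)

# Evaluate the same metrics on both splits; a large gap suggests overfitting.
scores = {}
for name, X_part, y_part in [("train", X_train, y_train), ("test", X_test, y_test)]:
    pred = model.predict(X_part)
    scores[name] = {
        "mae": mean_absolute_error(y_part, pred),
        "r2": r2_score(y_part, pred),
    }
    print(f"{name:5s}  MAE={scores[name]['mae']:.4f}  R^2={scores[name]['r2']:.4f}")
```

If the train and test scores are close but the predicted image still looks wrong, the second explanation above (the predicted image differing from the training data) becomes the more likely one.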

Carlos Mougan

Posted 2020-06-21T13:10:01.493

Reputation: 4 420

I do use the test split: I fit the model and then predict with it. I do all these calculations in a pandas table, so each pixel is a row, and there is not supposed to be any influence of topology or coordinates – Reut – 2020-06-21T13:32:05.413

What's your accuracy on test? Can you provide this output for train and test? https://scikit-learn.org/stable/modules/generated/sklearn.metrics.classification_report.html – Carlos Mougan – 2020-06-21T13:39:57.290

By eye I would guess that the accuracy is more or less 90% on test – Carlos Mougan – 2020-06-21T13:41:10.430

I have added this data to the original post just now :) – Reut – 2020-06-21T13:48:02.270