Accuracy after self-training didn't change


I used a Decision Tree Classifier, which I trained with 50,000 labeled samples. I also have a set of 10,000 unlabeled samples, so I decided to use a self-training algorithm. Is it normal that, after retraining the model with these 10,000 unlabeled samples, the accuracy didn't change and the confusion matrix has exactly the same values? I expected some change (better or worse predictions). Thank you in advance.


Posted 2019-03-23T13:10:47.660

Reputation: 21



Well, this is a bit of a letdown, but your model has limitations.

If the 50,000 samples form a complete set for your problem, then more data won't be needed or helpful.

What I mean by a complete set is that there are enough samples to form a full-rank correlation matrix in your feature space. In other words, your samples contain a subset from which all other samples in your feature space can be generated by linear combination.
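A rough way to check the full-rank condition is sketched below with NumPy on synthetic data. The helper name `correlation_rank`, the tolerance `1e-8`, and the random arrays are assumptions made for illustration; with real data you would pass your own feature matrix instead.

```python
import numpy as np

def correlation_rank(X, tol=1e-8):
    """Return (rank, n_features) for the feature correlation matrix of X."""
    # rowvar=False: each column of X is one feature, each row one sample.
    corr = np.corrcoef(X, rowvar=False)
    # An explicit tolerance keeps tiny round-off singular values from counting.
    return int(np.linalg.matrix_rank(corr, tol=tol)), X.shape[1]

rng = np.random.default_rng(0)

# Four independent features: the correlation matrix should be full rank.
X_full = rng.normal(size=(500, 4))

# Same data, but the last feature is a linear combination of the first two,
# so the correlation matrix loses one rank.
X_redundant = X_full.copy()
X_redundant[:, 3] = 2.0 * X_redundant[:, 0] - X_redundant[:, 1]

rank_full, n_full = correlation_rank(X_full)
rank_red, n_red = correlation_rank(X_redundant)

print(f"independent features: rank {rank_full} of {n_full}")
print(f"redundant feature:    rank {rank_red} of {n_red}")
```

If the rank equals the number of features, the labeled samples already span the feature space in the sense described above.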

Also, while your data might already represent everything a decision tree needs to know to classify your data in the generated feature space, there may be other feature spaces that benefit from the extra data (such as deeper trees or other models).
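To see whether a different tree depth or another model gets more out of the unlabeled samples, you could compare a purely supervised fit against a self-trained one for a few candidates. The following is only a sketch: it assumes scikit-learn 0.24 or newer (for `SelfTrainingClassifier`), uses synthetic data from `make_classification` as a stand-in for the real set, and the candidate models and the 30% unlabeled share are arbitrary choices for illustration.

```python
import numpy as np
from sklearn.base import clone
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.semi_supervised import SelfTrainingClassifier
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the real data (assumption: numeric features).
X, y = make_classification(n_samples=3000, n_features=10,
                           n_informative=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

# Hide about 30% of the training labels; -1 marks a sample as unlabeled.
rng = np.random.default_rng(0)
unlabeled = rng.random(len(y_train)) < 0.3
y_partial = y_train.copy()
y_partial[unlabeled] = -1

candidates = {
    "tree, max_depth=3": DecisionTreeClassifier(max_depth=3, random_state=0),
    "tree, unrestricted": DecisionTreeClassifier(random_state=0),
    "random forest": RandomForestClassifier(n_estimators=50, random_state=0),
}

results = {}
for name, model in candidates.items():
    # Baseline: supervised fit on the labeled rows only.
    baseline = clone(model).fit(X_train[~unlabeled], y_train[~unlabeled])
    # Self-training: the wrapper pseudo-labels unlabeled rows and refits.
    # The estimator is passed positionally because its keyword name
    # differs between scikit-learn versions.
    self_trained = SelfTrainingClassifier(clone(model)).fit(X_train, y_partial)
    results[name] = (
        accuracy_score(y_test, baseline.predict(X_test)),
        accuracy_score(y_test, self_trained.predict(X_test)),
    )

for name, (acc_base, acc_self) in results.items():
    print(f"{name:20s} supervised={acc_base:.3f} self-trained={acc_self:.3f}")
```

If the two accuracies stay essentially the same for every candidate, that supports the idea that the labeled set already covers what these models can use.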

You might try helping your decision tree by applying a few normalizations to the data and doing some feature engineering.

Pedro Henrique Monforte

Posted 2019-03-23T13:10:47.660

Reputation: 1 366