Generally, a model gets biased towards the data samples/targets whose frequency is high in the training dataset. Is it possible that, during training, a model gets biased towards the low-frequency data instead?

Could you please elaborate on your question and problem? – Shubham Panchal – 2019-05-24T04:46:03.657

We have a dataset for a binary classifier in which class 1 has a huge amount of data whereas class 0 has very little, i.e. the data is skewed. During model training, it's quite possible that the model becomes biased towards class 1, and that's expected. I want to know whether it is also possible for the model to get biased towards class 0. – vipin bansal – 2019-05-24T04:48:01.587



With structured data, you generally face four challenges:

(1) Missing data

(2) Outliers

(3) Cardinality

(4) Rare values (as a rule of thumb, categories that appear in < 5% of the observations)
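As a rough illustration of the < 5% rule of thumb, here is a minimal sketch that flags the rare categories of a single categorical column. The function name `find_rare_labels`, its `threshold` parameter, and the example `city` column are hypothetical, chosen only for illustration; it simply computes each category's relative frequency and keeps those strictly below the threshold.

```python
from collections import Counter


def find_rare_labels(values, threshold=0.05):
    """Return the set of categories whose relative frequency is below `threshold`.

    Hypothetical helper; 0.05 mirrors the "< 5%" rule of thumb.
    """
    counts = Counter(values)  # occurrences of each category
    total = len(values)
    # A category is "rare" if its share of all rows is strictly below the threshold.
    return {label for label, n in counts.items() if n / total < threshold}


# Hypothetical example column: 100 rows of a categorical feature.
city = ["London"] * 60 + ["Paris"] * 36 + ["Oslo"] * 3 + ["Lima"] * 1
print(sorted(find_rare_labels(city)))  # ['Lima', 'Oslo']
```

Only detection is shown here; what to do with the flagged categories is a separate modelling decision.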

Rare values in categorical variables tend to cause overfitting, particularly in tree-based methods. Ph.D. Data Scientist Soledad Galli has an amazing course on the subject (Udemy: "Feature Engineering"). Below is a screenshot from her course, but to be fair to her, I'm not going to post the solution.

Thanks for the reference. – vipin bansal – 2019-06-05T12:43:36.933