Skewed distributions in predictive models



What issues arise when dealing with highly skewed variables in a supervised learning problem? Which machine learning algorithms suffer most from skewness in the data, and what are the solutions to this problem?

David Masip

Posted 2018-05-01T07:00:47.673

Reputation: 5 101

Question was closed 2020-10-03T21:04:43.410


Are you asking about dependent or independent variables? As it stands, the question is far too general, and you can already find possible duplicates. If none of them matches your question, please reframe it and I will be happy to discuss further and share my input. Skewness is most often discussed for the target variable in classification problems.

– TwinPenguins – 2018-05-01T11:06:29.750

No, the linked question is about skewness of classes. I am asking about skewness of a continuous distribution. – David Masip – 2018-05-01T12:25:48.220

I see, and I had guessed you were talking about a continuous distribution, but your question still does not make clear whether it concerns the independent or the dependent variable. If it is the dependent variable (the target), almost the same principle applies as for classes, and it is strongly recommended to transform it toward a normal distribution, especially in linear regression models, where the assumption of normally distributed residuals has to hold. If it is about the independent variables, the approach could be very different. – TwinPenguins – 2018-05-01T14:05:22.613

You are right, I was thinking about both cases – David Masip – 2018-05-01T15:57:54.663
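As a rough illustration of the target transformation suggested in the comments above, here is a minimal sketch using only the Python standard library. The log-normal sample, the `sample_skewness` helper, and the choice of `log1p` are assumptions made for illustration; none of them come from the thread, and other transforms (Box-Cox, square root) may suit real data better.

```python
import math
import random
import statistics


def sample_skewness(values):
    """Return the biased Fisher-Pearson skewness coefficient (m3 / m2**1.5)."""
    mean = statistics.fmean(values)
    n = len(values)
    m2 = sum((v - mean) ** 2 for v in values) / n  # second central moment
    m3 = sum((v - mean) ** 3 for v in values) / n  # third central moment
    return m3 / m2 ** 1.5


rng = random.Random(0)  # fixed seed so the sketch is reproducible

# Hypothetical right-skewed target: log-normal draws (illustrative assumption).
y = [rng.lognormvariate(0.0, 1.0) for _ in range(10_000)]

# log(1 + y) compresses the long right tail and is defined for y >= 0.
y_log = [math.log1p(v) for v in y]

skew_raw = sample_skewness(y)
skew_log = sample_skewness(y_log)
print(f"skewness before: {skew_raw:.2f}, after log1p: {skew_log:.2f}")

# Predictions made on the transformed scale map back with the inverse, expm1.
y_back = [math.expm1(v) for v in y_log]
```

A model would then be fitted on `y_log` and its predictions passed through `math.expm1` to return to the original scale. Skewed independent variables can be transformed the same way, although tree-based models are generally insensitive to monotonic transformations of their inputs.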

No answers