“It is said that the classic linear regression assumes there is no correlation between its independent variables”

Depending on your goal of doing the regression, this common statement is false.

Even with multicollinearity, you get that $\hat{\beta}=(X^TX)^{-1}X^Ty$ is the solution to the OLS optimization.

Even with multicollinearity, you get that $\hat{\beta}=(X^TX)^{-1}X^Ty$ is minimum variance linear unbiased estimator from the Gauss-Markov theorem.

What the Gauss-Markov theorem requires is that the error terms not be correlated. This is commonly confused for saying that the predictors are not correlated, but that is indeed a mistake.

There can be numerical instability when you do math on a computer, particularly as you approach perfect multicollinearity ($X^TX$ is close to singular, singular in the extreme case of perfect multicollinearity or correlation of $1$ between variables), but there is no inherent problem with multicollinearity if your goal is to predict.

Where multicollinearity can hurt is when you want to do inference on the parameters, which is rarely a goal in machine learning. When you have multicollinearity, the parameter standard errors are inflated, sapping away your power to tell that they are not zero. Philosophically, it also makes it hard to attribute an effect to a particular predictor if it is correlated with others. (Imagine a hospital wanting to know if it pays its neurosurgeons as much as its heart surgeons and sees that the heart surgeons make way more but also sees that the heart surgeons have much more experience. Do they make more because of their specialty or because of their experience?)

Multicollinearity also can mean that you might be able to use a smaller amount of variables to get almost as much information as the entire set of variables. For instance, if two predictors are highly correlated, it might not be worth including both; you might be better off leaving out one for the sake of model parsimony and having fewer parameters in your regression, but that is an empirical problem and up to the model-designer’s judgment.

Getting to the full GLM framework, the Gauss-Markov theorem does not apply, but the idea remains that there is no inherent issue with multicollinearity when your goal is to predict instead of to do parameter inference, which is the typical goal in machine learning.

Better to use a GLM than to use what? Guessing the outcome? Modeling with a neural network? Support vector regression? – Dave – 2020-06-20T15:03:16.857