I am currently working on the Forest Cover Type Prediction competition from Kaggle, using classification models from scikit-learn. My main goal is to learn about the different models, so I am not trying to discuss which one is better.

When working with logistic regression, I wonder whether I need the 'penalty' parameter (where I can choose L1 or L2 regularization). Based on what I found, these regularization terms help to avoid overfitting, especially when the parameter values are extreme. By extreme, I understand that the range of some parameter values is very large compared to that of other parameters; correct me if I am wrong. In that case, wouldn't it be enough to apply log-scaling or normalization to these values?
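To make concrete what I mean by log-scaling and normalization, here is a minimal sketch in plain Python. The column values and the helper names `standardize` and `log_scale` are made up for illustration; as far as I understand, scikit-learn's `StandardScaler` performs the same standardization step on whole feature matrices.

```python
import math
import statistics


def standardize(values):
    """Rescale values to zero mean and unit (population) standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        # A constant column carries no spread; map it to zeros.
        return [0.0 for _ in values]
    return [(v - mean) / std for v in values]


def log_scale(values):
    """Compress a wide, non-negative range with log(1 + x)."""
    return [math.log1p(v) for v in values]


# Hypothetical feature column whose range is much larger than the others.
elevation = [1800.0, 2500.0, 3100.0, 3900.0]

scaled = standardize(elevation)  # centered on 0 with unit spread
logged = log_scale(elevation)    # same ordering, compressed range
```

Both transforms only change the scale of the inputs, which is why I am unsure whether they can replace the penalty term or merely complement it.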

My main questions are: since the number of parameters is large, are there visualization techniques or tools in scikit-learn that can help me find the parameters with extreme values? And is there a statistical function or tool that reports how extreme the values of the parameters are?
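As a rough sketch of the kind of check I am after, the following plain-Python snippet summarizes each column and flags the ones whose range is much wider than the rest. The column names and values, the helpers `range_summary` and `flag_wide_features`, and the threshold `factor=10.0` are all assumptions made for illustration, not an established rule. With the data in a pandas DataFrame, I believe `DataFrame.describe()` gives a comparable per-column summary and `DataFrame.boxplot()` a visual one.

```python
import statistics


def range_summary(columns):
    """Return {name: (min, max, range, std)} for each feature column."""
    summary = {}
    for name, values in columns.items():
        lo, hi = min(values), max(values)
        summary[name] = (lo, hi, hi - lo, statistics.pstdev(values))
    return summary


def flag_wide_features(columns, factor=10.0):
    """Names of features whose range exceeds `factor` times the median range."""
    summary = range_summary(columns)
    median_range = statistics.median(r for _, _, r, _ in summary.values())
    return sorted(
        name
        for name, (_, _, r, _) in summary.items()
        if r > factor * median_range
    )


# Made-up columns standing in for a few features of the data set.
data = {
    "elevation": [1800.0, 2500.0, 3100.0, 3900.0],
    "slope": [3.0, 12.0, 20.0, 35.0],
    "hillshade_noon": [200.0, 220.0, 235.0, 250.0],
}

wide = flag_wide_features(data)
```

Comparing each range against the median range is just one possible heuristic; I would be glad to hear about more principled statistics for this.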