Why don't you consider Gradient Boosting Decision Trees (GBDT) for Regression which you will find many Python implementation for (XGboost, LightGBM and CatBoost).
The good things about GBDTs (more relevant to your problem) are:
- They have an intrinsic way to calculate feature importance (due to the way trees splits work .e.g Gini score and so on).
- They can deal with categorical variables that you have (sex, smoke, region)
- Also account for any possible correlations among your variables. Simple linear models fail to capture any correlations which could lead to overfitting.
- There are many ways to regularize GBDTs, which may come very handy!
With GBDTs you only have to be careful with continuous variable, in your case the bmi variable, not to be artificially overruling your trees (trees have hard time dealing very continuous data). You can easily overcome this challenge by rounding up/down or binning your continuous variable or other methods.
If you have strong reasons to stick to linear regressions, maybe you could use LASSO which is a regularized linear regression that harshly penalizes (=0) the less important variables. People actually use LASSO for feature selection as well.