## Evaluating new features

How should I evaluate whether new features are effective? Should I build a new model with the new features and compare it with the old one, using the same hyperparameters?

You may want to consider adding more information to your question so we can help you more effectively. Relevant information would include: the type of model, the incremental impact of improvements to the model (i.e. how much does a small incremental improvement matter in real terms?), the consumption rate of the model (i.e. how many times is it consumed, over what period?), how often new features are generated, the general problem space of the model (i.e. is it a marketing targeting model with new demographic features?), and any other illuminating details. – Thomas Cleberg – 2018-01-08T17:05:01.403

### Answer (score: 2)

Actually, in these cases I usually plot the data two features at a time, which is much like inspecting the covariance matrix. This shows whether the newly added features are correlated with, or vary linearly with, each of the previous features. Suppose you already have one feature for a classification task and want to add another. Plot the data with each feature on one axis and check whether they are linearly correlated. If they have no correlation, or their correlation is near zero, add the feature: it provides knowledge that the previous features did not. If the new feature is correlated with a previous feature, adding it does not give you any new knowledge or perception of the concept. Although this is debated, I prefer not to add correlated features, because of the computational cost: it would be very time consuming.
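A minimal sketch of this pairwise check (plain Python; the helper function and the data are made up for illustration):

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Made-up data: an existing feature, a redundant candidate, an unrelated one.
old = [1.0, 2.0, 3.0, 4.0, 5.0]
redundant = [2.1, 4.0, 6.2, 7.9, 10.1]    # roughly 2 * old
independent = [0.3, -1.2, 0.8, -0.5, 0.1]

print(round(pearson(old, redundant), 3))    # near 1 -> adds little new information
print(round(pearson(old, independent), 3))  # near 0 -> worth adding
```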

To illustrate further, suppose you have a data set A = {X1, X2, y}, in which X1 and X2 are the features, y is the label, and all are binary valued. Also suppose the covariance matrix has entries d = Cov(X1, X2), e = Cov(X1, y), and f = Cov(X2, y), with the variances on the diagonal. Then:

• If e = 0, the label y is uncorrelated with X1, and since both are binary valued this means y is independent of X1. We can therefore ignore X1 in classification without loss of accuracy, regardless of the other values.

• If |d| >> 0, there is high correlation between X1 and X2 and hence information redundancy, so we can ignore either one of the features without loss of accuracy.

• If d = 0 and neither e nor f is zero, the features are independent of each other and both are influential on the label, so we cannot ignore either X1 or X2.
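As a sketch of these three cases, the entries d, e, and f can be read off `np.cov` directly (the binary data here is made up, with X2 a deliberate duplicate of X1):

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up binary data: x1 is a noisy copy of the label, x2 duplicates x1.
y = rng.integers(0, 2, size=1000)
x1 = (y + (rng.random(1000) < 0.1)) % 2   # matches y about 90% of the time
x2 = x1.copy()                            # fully redundant with x1

cov = np.cov(np.stack([x1, x2, y]))       # 3x3; rows/cols are X1, X2, y
d, e, f = cov[0, 1], cov[0, 2], cov[1, 2]

print(d)     # large relative to the variances -> X1 and X2 are redundant
print(e, f)  # both nonzero -> each feature carries information about y
```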

### Answer (score: 2)

You should check whether there are any correlations among the features after adding the new ones. After that, check whether each feature is informative with respect to the target variable of the problem. For the latter you can perform a statistical test and check the p-value for statistical significance, or you could compute the mutual information between each new feature and the target variable.
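A minimal sketch of the mutual-information check for discrete features (plain Python, made-up data; for real datasets, scikit-learn's `mutual_info_classif` serves the same purpose):

```python
from collections import Counter
from math import log2

def mutual_information(xs, ys):
    """I(X; Y) in bits for two discrete, equal-length sequences."""
    n = len(xs)
    px, py = Counter(xs), Counter(ys)
    pxy = Counter(zip(xs, ys))
    return sum(
        (c / n) * log2((c / n) / ((px[x] / n) * (py[y] / n)))
        for (x, y), c in pxy.items()
    )

# Made-up data: the first feature determines the label, the second is noise.
y = [0, 0, 0, 0, 1, 1, 1, 1]
informative = [0, 0, 0, 0, 1, 1, 1, 1]  # identical to y -> I = H(y) = 1 bit
noise = [0, 1, 0, 1, 0, 1, 0, 1]        # independent of y -> I = 0

print(mutual_information(informative, y))  # 1.0
print(mutual_information(noise, y))        # 0.0
```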

### Answer (score: 0)

Some algorithms have feature-importance calculations integrated into their models. In addition to performing statistical checks with p-values and covariance matrices, you can rank features by importance. With xgboost, a trained model exposes importance scores directly (a full explanation is on Machine Learning Mastery).

In addition to checking the importance of each feature individually, with xgboost and other tree-based algorithms you can also check whether features become important through interactions with other features.

With this in mind, you can rank the old features first, then evaluate whether the new features outrank some of them, and identify which new features do not provide enough information or, even worse, add noise.
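One model-agnostic way to make this old-versus-new comparison is permutation importance: shuffle one feature at a time and measure how much accuracy drops. A sketch with a toy nearest-centroid model (all names and data here are made up):

```python
import numpy as np

def fit_centroids(X, y):
    """Mean feature vector per class."""
    return {c: X[y == c].mean(axis=0) for c in np.unique(y)}

def predict(centroids, X):
    """Assign each row to the class with the nearest centroid."""
    classes = np.array(sorted(centroids))
    dists = np.stack([np.linalg.norm(X - centroids[c], axis=1) for c in classes])
    return classes[np.argmin(dists, axis=0)]

def permutation_importance(X, y, centroids, rng):
    """Accuracy drop when each column is shuffled; near zero means the feature adds nothing."""
    base = (predict(centroids, X) == y).mean()
    drops = []
    for j in range(X.shape[1]):
        Xp = X.copy()
        Xp[:, j] = rng.permutation(Xp[:, j])
        drops.append(base - (predict(centroids, Xp) == y).mean())
    return np.array(drops)

rng = np.random.default_rng(1)
y = rng.integers(0, 2, size=300)
old_feature = y + 0.3 * rng.standard_normal(300)  # informative
new_feature = rng.standard_normal(300)            # pure noise
X = np.column_stack([old_feature, new_feature])

imp = permutation_importance(X, y, fit_centroids(X, y), rng)
print(imp)  # large drop for the old feature, near zero for the noisy new one
```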

You can also check other approaches in Interpretable Machine Learning and on Stats Stack Exchange.