What feature engineering is necessary with tree based algorithms?



I understand data hygiene, which is probably the most basic form of feature engineering: making sure all your data is loaded properly, that N/As are treated as a special value rather than as a number between -1 and 1, and that your categorical variables are tagged properly.

In the past I've done plenty of linear regression analysis, so feature engineering was mainly concerned with:

  • Getting features onto the correct scale using log, exponential, and power transformations
  • Multiplying features: if you have height and width, multiply them to make area
  • Selecting features: removing features based on p-value

But for LightGBM (and random forests) it seems the scale of the features doesn't matter, because orderable values are sorted and then split, so only their order is used. Interactions of features shouldn't matter, because one of the weak learners should find any interaction that is important. And feature selection isn't important, because classifiers built on weak features will simply be attenuated.
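The order-invariance claim can be checked directly with a toy regression stump (a hand-rolled sketch, not LightGBM itself): because a monotone transformation such as log preserves the sort order of a feature, the best variance-reducing split partitions the samples identically on raw and transformed values.

```python
import math

def best_split_partition(xs, ys):
    """Return the left-group indices of the single best variance-reducing split."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])

    def sse(idx):
        m = sum(ys[i] for i in idx) / len(idx)
        return sum((ys[i] - m) ** 2 for i in idx)

    best_score, best_left = None, None
    for k in range(1, len(xs)):
        left, right = order[:k], order[k:]
        score = sse(left) + sse(right)
        if best_score is None or score < best_score:
            best_score, best_left = score, frozenset(left)
    return best_left

xs = [1.0, 2.0, 4.0, 8.0, 16.0]
ys = [0.0, 0.1, 0.2, 1.0, 1.1]
raw = best_split_partition(xs, ys)
logged = best_split_partition([math.log(x) for x in xs], ys)
assert raw == logged  # same partition: the split depends only on feature order
```

Note this only holds for monotone transformations; non-monotone ones (e.g. `abs`) change the ordering and therefore can change the splits.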

So, assuming you can't find more data to bring in, what feature engineering should be done with decision tree models?

William Entriken

Posted 2017-08-08T15:00:47.583

Reputation: 333

I'm new here, please provide advice rather than just downvoting. – William Entriken – 2017-08-12T15:04:45.263



Feature engineering that I would consider essential even for tree-based algorithms includes:

  • Modular arithmetic calculations: e.g. converting a timestamp into day of the week or time of day. If your model needs to know that something happens on the third Monday of every month, it will be nearly impossible to determine this from raw timestamps.

  • In a similar vein, creating new features from the data you already have can drastically improve your predictive power. This is where domain knowledge is extremely important: if you know of, or suspect, a relationship, you can include variables that describe it. This matters because tree-based methods can only create axis-aligned splits (i.e. splits perpendicular to a single feature's axis), so relationships that cut diagonally across features are expensive to approximate.

  • Dimensionality reduction is typically performed by either feature selection or feature transformation. Reducing the dimension through feature selection will likely not help much with the models you mention, but an algorithm may or may not benefit from feature transformation (for example, principal component analysis), depending on how much information is lost in the process. The only way to know for sure is to test whether feature transformation improves performance.
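The first bullet can be sketched with the standard library alone (the feature names and the `week_of_month` formula are illustrative choices, not a fixed recipe):

```python
from datetime import datetime, timezone

def calendar_features(ts: float) -> dict:
    """Expand a Unix timestamp into calendar features a tree can split on."""
    dt = datetime.fromtimestamp(ts, tz=timezone.utc)
    return {
        "hour": dt.hour,                         # time of day
        "day_of_week": dt.weekday(),             # 0 = Monday
        "day_of_month": dt.day,
        "month": dt.month,
        "week_of_month": (dt.day - 1) // 7 + 1,  # 3 + day_of_week == 0 => third Monday
    }

# 2017-08-21 was the third Monday of August 2017
feats = calendar_features(datetime(2017, 8, 21, 14, 30, tzinfo=timezone.utc).timestamp())
```

A tree can then learn "third Monday" as two splits (`week_of_month == 3` and `day_of_week == 0`), which it could never express on the raw timestamp.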


Posted 2017-08-08T15:00:47.583

Reputation: 401

Thank you! This is making me think differently about my data. – William Entriken – 2017-08-11T21:23:11.900

One approach I am trying is to look at lat and long features. To transform these I will select 10 points randomly in the space and calculate the distance from each (lat, long) to each point. These will be 10 new features that will hopefully be more useful. – William Entriken – 2017-08-12T15:08:00.633
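The anchor-point transformation described in that comment could be sketched as follows (a hypothetical helper using the haversine great-circle distance; the 10 anchor points are drawn at random, as proposed):

```python
import math
import random

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points."""
    r = 6371.0  # mean Earth radius in km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlam / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def anchor_distance_features(lat, lon, anchors):
    """Replace a (lat, lon) pair with its distances to fixed anchor points."""
    return [haversine_km(lat, lon, a_lat, a_lon) for a_lat, a_lon in anchors]

random.seed(0)  # fix the anchors so train and test sets get the same features
anchors = [(random.uniform(-90, 90), random.uniform(-180, 180)) for _ in range(10)]
features = anchor_distance_features(40.7128, -74.0060, anchors)  # e.g. New York City
```

The anchors must be sampled once and reused for every row, otherwise the feature columns are not comparable across samples.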