Automatic Feature Engineering



I came across a machine learning product, which I won't name, that claims to completely automate the feature engineering process for any data source without domain knowledge, which pretty much implies that data scientists become unnecessary for this step. I find that claim a bit suspicious, since it goes against what I learned feature engineering consists of.

But considering that this might be a lack of knowledge on my part: to what point can we automate the feature engineering process? I can think of some general approaches, like statistical summaries (median, etc.) or several types of coding (binary, polynomial, etc.) for categorical variables, but could anyone give me an overview of the state-of-the-art approaches to this subject or point me in the right direction?
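For comparison, here is a minimal sketch of the hand-made baseline the question mentions, assuming the input is a numeric sensor signal plus a categorical column: summary statistics (including the median) over fixed-size windows, and binary encoding of categories. The function names and window parameters are illustrative choices, not taken from any particular tool.

```python
from statistics import mean, median, pstdev


def window_stats(signal, size, step):
    """Summarise a 1-D signal with statistics over fixed-size windows."""
    rows = []
    for start in range(0, len(signal) - size + 1, step):
        window = signal[start:start + size]
        rows.append({
            "mean": mean(window),
            "median": median(window),
            "std": pstdev(window),  # population standard deviation
            "min": min(window),
            "max": max(window),
        })
    return rows


def binary_encode(values):
    """Binary-encode a categorical column: each category's index written in bits."""
    categories = sorted(set(values))
    width = max(1, (len(categories) - 1).bit_length())
    index = {c: i for i, c in enumerate(categories)}
    rows = [[(index[v] >> bit) & 1 for bit in reversed(range(width))] for v in values]
    return rows, categories


print(window_stats([1, 2, 3, 4, 5, 6], size=3, step=3))
print(binary_encode(["red", "green", "blue", "green"]))
# → ([[1, 0], [0, 1], [0, 0], [0, 1]], ['blue', 'green', 'red'])
```

Every feature here is chosen by a person (window length, which statistics, which encoding), which is exactly the part the product claims to automate.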


Posted 2016-05-24T09:03:28.020

Reputation: 249

Which base feature scales (nominal, ordinal, interval, ratio) does the software claim to be able to take? For which tasks does it claim to produce good results (classification, regression, RL)? Does it claim to work for sequence to sequence (then I would be sure it would simply not work)? – Martin Thoma – 2016-05-24T22:30:22.653

It says it connects to sensors, so I assume the input would be signals. It's for classification; I don't think they claimed that it worked for sequence to sequence. – user697110 – 2016-05-25T12:56:51.153

Ok, I guess you are talking about software which is meant for very specific sensors (e.g. image / sound). Otherwise, this is too general to work currently. And your question is far too broad - I don't want to guess what you're actually asking. You have to either give us more specific details, or you will only get this kind of answer. – Martin Thoma – 2016-05-25T13:12:13.420



In my experience, when people claim to have an automated approach to feature engineering, they really mean "feature generation", and what they're actually talking about is a deep neural network of some sort that they've built. To be fair, in a limited sense, this could be a true claim: properly trained deep neural networks can handle any number of pairwise correlations between individual features or groups of features. That said, none of this would be possible without a great deal of up-front data pre-processing, using tools that know how to intelligently handle different types of input data (e.g., free text, images). Bottom line, it takes a great deal of manual effort to do something automatically.
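As the answer notes, a network only does its part after raw inputs have been turned into numbers. One such pre-processing step for free text is the hashing trick, sketched minimally below: any text becomes a fixed-length count vector a network could consume. The bucket count and the simple regex tokenizer are illustrative assumptions; real pipelines usually do considerably more (stop words, n-grams, signed hashing).

```python
import re
import zlib


def hash_text_features(text, n_buckets=16):
    """Map free text to a fixed-length count vector via the hashing trick."""
    vec = [0] * n_buckets
    for token in re.findall(r"[a-z0-9]+", text.lower()):
        # crc32 is stable across runs, unlike Python's salted built-in hash()
        bucket = zlib.crc32(token.encode("utf-8")) % n_buckets
        vec[bucket] += 1
    return vec


print(hash_text_features("Sensor 3 reports HIGH temperature, high load"))
```

Someone still had to decide how to tokenize, how many buckets to use, and that case and punctuation should be ignored.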



Reputation: 1 453

If I could upvote twice, I'd upvote again for the phrase 'it takes a great deal of manual effort to do something automatically', which I now intend to steal. – Robert de Graaf – 2016-05-25T11:40:10.263


There are several methodologies for doing that. The tool you have been talking about (whose name I will not mention, out of respect for you) runs all sorts of pre-coded functions at once; it's basically a vast rule engine.

1) The easiest method is to run mini-trees on random combinations of variables (sounds like a random forest). Each tree that has significant predictive power for the classification becomes a variable, and its nodes are that variable's categories.

2) You can build auto-encoders. They are easy to build but hard to interpret; basically, what deep learning does is automatic feature engineering, which is why it takes so much time to compute ;)

3) You can do symbolic regression, possibly using a genetic optimization algorithm to select variables and mathematical operators, to come up with a formula that has some classification power. So when you have data from a company balance sheet, it produces many formulas like EBITDA.
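Method 1 can be sketched roughly as follows. This is a simplified reading of the idea, not the tool's algorithm: each "mini-tree" is assumed to be two fixed median splits on a random pair of columns rather than a learned tree, labels are assumed binary (0/1), and predictive power is measured as training accuracy above the base rate. The function name and the `min_gain` threshold are hypothetical.

```python
import numpy as np


def random_minitree_features(X, y, n_trees=200, min_gain=0.05, seed=0):
    """Crude 'random mini-tree' feature generator for binary 0/1 labels.

    Each mini-tree picks two random columns and splits each at its median,
    giving four leaves. The leaf id becomes a new categorical feature if
    predicting the majority class in each leaf beats the base rate by at
    least min_gain (measured on the training data itself).
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    base_rate = max(y.mean(), 1 - y.mean())
    features, pairs, seen = [], [], set()
    for _ in range(n_trees):
        i, j = sorted(int(k) for k in rng.choice(d, size=2, replace=False))
        if (i, j) in seen:
            continue  # this pair of columns was already tried
        seen.add((i, j))
        # leaf id in {0, 1, 2, 3}: which side of each median the row falls on
        leaf = 2 * (X[:, i] > np.median(X[:, i])) + (X[:, j] > np.median(X[:, j]))
        correct = sum(max(y[leaf == k].sum(), (1 - y[leaf == k]).sum())
                      for k in np.unique(leaf))
        if correct / n - base_rate >= min_gain:
            features.append(leaf)
            pairs.append((i, j))
    new_X = np.column_stack(features) if features else np.empty((n, 0), dtype=int)
    return new_X, pairs


rng = np.random.default_rng(1)
X = rng.normal(size=(2000, 3))
y = ((X[:, 0] > 0) ^ (X[:, 1] > 0)).astype(int)  # XOR: invisible to each column alone
new_X, pairs = random_minitree_features(X, y)
print(pairs, new_X.shape)
```

On real data you would score each mini-tree on held-out rows; training accuracy, as used here, rewards overfitting.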

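Method 3 can be approximated, very crudely, by brute force instead of a genetic algorithm: the sketch below enumerates every one-operator formula over pairs of columns and ranks them by absolute Pearson correlation with the target. Real symbolic regression searches much deeper expression trees, typically with genetic programming; the column names and the `revenue - costs` target are made-up illustrations, not real balance-sheet data.

```python
import itertools

import numpy as np

OPS = {"+": np.add, "-": np.subtract, "*": np.multiply, "/": np.divide}


def search_formulas(X, y, names, top=3):
    """Rank every expression 'a op b' by |Pearson correlation| with y."""
    scored = []
    for (i, a), (j, b) in itertools.permutations(enumerate(names), 2):
        for sym, op in OPS.items():
            if sym in "+-*" and i > j:
                continue  # b+a, b*a, b-a add nothing once a+b, a*b, a-b are scored
            with np.errstate(divide="ignore", invalid="ignore"):
                values = op(X[:, i], X[:, j])
            if not np.all(np.isfinite(values)) or values.std() == 0:
                continue  # skip formulas that blow up or are constant
            r = np.corrcoef(values, y)[0, 1]
            scored.append((abs(r), f"{a} {sym} {b}", r))
    scored.sort(reverse=True)
    return [(expr, round(float(r), 4)) for _, expr, r in scored[:top]]


rng = np.random.default_rng(0)
names = ["revenue", "costs", "assets"]
X = np.column_stack([rng.uniform(50, 100, 500),
                     rng.uniform(20, 60, 500),
                     rng.uniform(100, 200, 500)])
y = X[:, 0] - X[:, 1] + rng.normal(0, 1, 500)  # hypothetical target
print(search_formulas(X, y, names))
```

The search space grows explosively with expression depth, which is why practical tools use a guided search such as a genetic algorithm rather than enumeration.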
Altan Atabarezz


Reputation: 41


Well, there is some serious research going on in this direction under the label of "feature learning". But as far as I know, it's not yet mature enough to be packaged into a software tool that renders manual feature engineering superfluous.

But there are important successes being achieved in that direction. Modern image recognition with deep neural networks often relies on features that the deeper layers of the network compute themselves instead of hand-crafted features developed by humans. So it's not entirely science fiction. If you want to learn more about the image recognition side of the topic, you may want to watch this beautiful video lecture by Andrew Ng, one of the leading researchers in this field.
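Feature learning can be illustrated on a much smaller scale than image models with an autoencoder (also mentioned earlier in this thread): a network trained only to reconstruct its input, whose hidden layer then serves as a learned feature set. The sketch assumes plain NumPy, full-batch gradient descent, a tanh encoder with a linear decoder, and synthetic data generated from two hidden factors; the layer size, learning rate and epoch count are arbitrary illustrative choices.

```python
import numpy as np


def train_autoencoder(X, n_hidden=2, lr=0.1, epochs=3000, seed=0):
    """One-hidden-layer autoencoder (tanh encoder, linear decoder) trained by
    full-batch gradient descent on mean squared reconstruction error."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W1 = rng.normal(scale=0.1, size=(d, n_hidden))
    b1 = np.zeros(n_hidden)
    W2 = rng.normal(scale=0.1, size=(n_hidden, d))
    b2 = np.zeros(d)
    losses = []
    for _ in range(epochs):
        H = np.tanh(X @ W1 + b1)      # hidden codes: the learned features
        err = H @ W2 + b2 - X         # reconstruction error
        losses.append(float(np.mean(err ** 2)))
        g_out = 2 * err / err.size    # d(loss)/d(reconstruction)
        g_hid = (g_out @ W2.T) * (1 - H ** 2)  # back through tanh (uses old W2)
        W2 -= lr * (H.T @ g_out)
        b2 -= lr * g_out.sum(axis=0)
        W1 -= lr * (X.T @ g_hid)
        b1 -= lr * g_hid.sum(axis=0)

    def encode(Z):
        return np.tanh(Z @ W1 + b1)

    return encode, losses


rng = np.random.default_rng(0)
latent = rng.normal(size=(500, 2))  # two hidden factors
X = latent @ rng.normal(size=(2, 6)) + 0.05 * rng.normal(size=(500, 6))
X = (X - X.mean(axis=0)) / X.std(axis=0)
encode, losses = train_autoencoder(X)
codes = encode(X)
print(codes.shape, losses[0], losses[-1])
```

Nobody specified what the two code columns mean, which matches the point above that such features are easy to obtain but hard to interpret.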



Reputation: 411


Have a look at h2o's Driverless AI or …; I don't believe either uses deep learning. Rather, they apply an array of mathematical functions to your data set and then perform feature synthesis (merging features). The auto-generated feature set is then iteratively tested against your model, and features are discarded or improved.

So yes, while you need domain knowledge/experts to realise that adding or creating new data could make a huge impact, for exploratory purposes you could definitely use auto-features.
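The loop described in this answer (generate features with mathematical functions, test them against a model, keep or discard them) can be sketched as a greedy search. This is not Driverless AI's, or any product's, actual algorithm: the transformations (square, log1p of the absolute value, pairwise products), the linear least-squares model, the hold-out R² score and the `min_gain` threshold are all assumptions chosen to keep the example small.

```python
from itertools import combinations

import numpy as np


def synthesize(X):
    """Generate candidate features by applying fixed functions to columns."""
    feats = {}
    for i in range(X.shape[1]):
        feats[f"x{i}^2"] = X[:, i] ** 2
        feats[f"log1p|x{i}|"] = np.log1p(np.abs(X[:, i]))
    for i, j in combinations(range(X.shape[1]), 2):
        feats[f"x{i}*x{j}"] = X[:, i] * X[:, j]
    return feats


def val_r2(cols, y, n_train):
    """Fit least squares on the first n_train rows, return R^2 on the rest."""
    A = np.column_stack([np.ones(len(y))] + cols)
    coef, *_ = np.linalg.lstsq(A[:n_train], y[:n_train], rcond=None)
    y_val = y[n_train:]
    resid = y_val - A[n_train:] @ coef
    return 1 - np.sum(resid ** 2) / np.sum((y_val - y_val.mean()) ** 2)


def greedy_feature_search(X, y, n_train, min_gain=0.01, max_new=5):
    """Repeatedly add the synthesized feature that most improves hold-out R^2."""
    candidates = synthesize(X)
    cols = [X[:, i] for i in range(X.shape[1])]
    chosen, best = [], val_r2(cols, y, n_train)
    while len(chosen) < max_new:
        scores = {name: val_r2(cols + [col], y, n_train)
                  for name, col in candidates.items() if name not in chosen}
        if not scores:
            break
        name = max(scores, key=scores.get)
        if scores[name] - best < min_gain:
            break  # no remaining candidate helps enough: stop
        chosen.append(name)
        cols.append(candidates[name])
        best = scores[name]
    return chosen, best


rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 3))
y = 3 * X[:, 0] * X[:, 1] + X[:, 2] + 0.1 * rng.normal(size=1000)
print(greedy_feature_search(X, y, n_train=700))
```

The search can only find the interaction because `x0*x1` was in the predefined menu of transformations; deciding what belongs on that menu is where the domain knowledge mentioned above still comes in.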



Reputation: 11


Almost anything can be automated, but that doesn't mean it makes sense theoretically.

Developing features takes insight and content knowledge from the field you are studying in order to make sense. This is why even the best data scientists need to work with content experts to build theoretically sound models.

Darrin Thomas


Reputation: 191

Yeah, that's my opinion too, which is why I found that software odd. – user697110 – 2016-05-24T10:58:58.910

It's probably aimed more at analysts who are not trained in ML/statistics but want to build models. It makes the software sound more powerful and easier for them to use. – TBSRounder – 2016-05-24T11:14:41.300

SPSS Modeler has an automatic model tuning option which I've long suspected is used primarily in sales demos; this sounds like a similar principle. – Robert de Graaf – 2016-05-25T11:47:45.633


Maybe relevant for this question:

The method is based on Bourgain embedding and works whenever one has a distance between data points. But to get a "good" distance for the job at hand, one needs domain knowledge, as there are many possible distances for the same data type.
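A minimal sketch of a Bourgain-style embedding, roughly following the standard randomized construction: for each scale i = 1..⌈log₂ n⌉, draw random subsets that contain each point with probability 2⁻ⁱ, and use the distance from a point to its nearest subset member as one coordinate. It needs nothing but a distance function, which is what makes it applicable "whenever one has a distance". The Hamming distance on equal-length words and the default number of subsets per scale are illustrative assumptions.

```python
import numpy as np


def bourgain_embedding(points, dist, repeats=None, seed=0):
    """Embed arbitrary objects into R^k using only a distance function.

    For scale i = 1..ceil(log2 n), draw random subsets S that contain each
    point with probability 2**-i; the coordinate of x for S is d(x, S),
    the distance from x to its nearest member of S.
    """
    rng = np.random.default_rng(seed)
    n = len(points)
    scales = max(1, int(np.ceil(np.log2(n))))
    repeats = repeats or scales
    coords = []
    for i in range(1, scales + 1):
        for _ in range(repeats):
            mask = rng.random(n) < 2.0 ** -i
            if not mask.any():
                mask[rng.integers(n)] = True  # never use an empty subset
            subset = [points[k] for k in np.flatnonzero(mask)]
            coords.append([min(dist(p, s) for s in subset) for p in points])
    return np.array(coords, dtype=float).T  # shape (n, scales * repeats)


def hamming(a, b):
    return sum(ca != cb for ca, cb in zip(a, b))


words = ["cold", "cord", "card", "ward", "warm", "worm"]
emb = bourgain_embedding(words, hamming)
print(emb.shape)  # → (6, 9)
```

When `dist` is a metric, each coordinate changes by at most d(x, y) between two points (triangle inequality), so the features inherit whatever notion of similarity the chosen distance encodes. Swapping Hamming for another distance gives different features, which is where the domain knowledge comes in.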

