

I received a dataset for analysis that had ~100 numeric columns with anonymous column names ($X1$, $X2$, $X3$, etc.) and was asked to do binary classification. My resulting SVM classifier had good accuracy (> 95%), but I was unable to do much in the way of feature engineering or feature generation beyond standard scaling, null-value replacement, etc., since I had no intuition about the columns.

Is there any standard logic for doing some sort of automated feature generation, i.e. simple mathematical combinations of various columns to create new, useful features? Does this sort of thing have any mathematical basis for linear or tree-based models? Or is feature engineering only really meaningful when one has intuition based on the column names?
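One common, fully automated approach to "simple mathematical combinations of columns" is to generate all pairwise interaction terms and let the downstream model (or a feature-selection step) decide which ones matter. A minimal sketch using scikit-learn's `PolynomialFeatures` (the column counts here are illustrative, not from your dataset):

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))  # stand-in for anonymous columns X1..X4

# degree=2 with interaction_only=True appends every pairwise product Xi*Xj
# to the original columns; include_bias=False drops the constant column.
poly = PolynomialFeatures(degree=2, interaction_only=True, include_bias=False)
X_new = poly.fit_transform(X)

# 4 original columns + C(4,2) = 6 interactions -> 10 columns total
print(X_new.shape)  # (100, 10)
```

With ~100 columns this blows up to ~5,000 interaction features, so in practice you would pair it with regularization (e.g. L1) or a feature-importance filter rather than feeding everything to the SVM directly.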

You can try analyzing the feature importance of the columns and then use statistics to decipher the meaning of the underlying columns – Aditya – 2019-07-30T05:35:12.227
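For a model like an SVM that has no built-in importances, the comment above can be realized with permutation importance, which works for any fitted estimator. A hedged sketch on synthetic data (the dataset and parameters are placeholders):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import SVC
from sklearn.inspection import permutation_importance

# Synthetic stand-in for the anonymous-column dataset
X, y = make_classification(n_samples=200, n_features=6,
                           n_informative=2, random_state=0)
clf = SVC().fit(X, y)

# Shuffle each column in turn and measure the drop in score;
# a large drop means the model relies on that column.
result = permutation_importance(clf, X, y, n_repeats=5, random_state=0)
ranking = np.argsort(result.importances_mean)[::-1]  # most important first
print(ranking)
```

The highest-ranked anonymous columns are then the natural candidates for closer statistical inspection (distributions, correlations) or for generating interaction features.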


Maybe relevant to your question: http://www.orges-leka.de/automatic_feature_engineering.html The method is based on Bourgain embedding.
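For intuition only: the classical Bourgain construction embeds points via their distances to random subsets, taking one coordinate $f_S(x) = \min_{s \in S} d(x, s)$ per subset $S$. A rough sketch of that idea (this is a generic illustration of the construction, not the implementation from the linked page, whose details may differ):

```python
import numpy as np

def bourgain_style_embedding(X, n_subsets=8, seed=None):
    """Map each row of X to its distances to n_subsets random subsets:
    coordinate k of x is min over s in S_k of ||x - s||."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    coords = []
    for _ in range(n_subsets):
        size = int(rng.integers(1, n // 2 + 1))
        S = rng.choice(n, size=size, replace=False)
        # distance from every point to its nearest point in the subset S
        d = np.linalg.norm(X[:, None, :] - X[S][None, :, :], axis=2).min(axis=1)
        coords.append(d)
    return np.column_stack(coords)

X = np.random.default_rng(0).normal(size=(50, 3))
E = bourgain_style_embedding(X, n_subsets=6, seed=1)
print(E.shape)  # (50, 6)
```

The embedded coordinates can then be appended as automatically generated features alongside the original columns.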

– None – 2019-08-31T15:42:06.497

95% accuracy sounds good, but what's your class size distribution? – Itamar Mushkin – 2020-06-30T14:23:18.570