Tag: feature-engineering

145 When to use One Hot Encoding vs LabelEncoder vs DictVectorizer? 2015-12-19T19:30:35.527

38 Encoding features like month and hour as categorical or numeric? 2017-03-22T07:43:57.223

26 Ways to deal with longitude/latitude feature 2016-08-20T06:51:26.563

25 Encoding categorical variables using likelihood estimation 2016-04-04T09:31:14.030

25 Should one hot vectors be scaled with numerical attributes 2018-05-14T17:54:58.557

24 Why do we convert skewed data into a normal distribution 2017-07-07T11:35:05.640

19 How to perform feature engineering on unknown features? 2016-03-10T19:39:16.190

14 What is the difference between one hot encoding and leave one out encoding? 2016-03-23T03:25:53.170

14 List of feature engineering techniques 2016-07-25T18:55:53.813

12 Why does frequency encoding work? 2019-11-25T15:36:36.253

11 Is feature engineering still useful when using XGBoost? 2017-03-20T13:58:09.653

10 Is this a good practice of feature engineering? 2018-06-13T22:07:27.770

9 Adding feature leads to worse results 2017-12-07T06:46:01.720

9 Does "feature importance" depend on the model type? 2020-08-24T14:19:01.947

8 Dismissing features based on correlation with target variable 2016-03-12T15:21:23.430

8 Using time series data from a sensor for ML 2017-06-01T13:52:10.327

8 Model for Differing Number of Rows per Observation 2019-04-17T16:47:56.343

7 What is the meaning of hand crafted features in computer vision problems? 2017-09-02T00:18:25.147

6 Improve a regression model and feature selection 2015-12-24T17:21:26.850

6 Do Clustering algorithms need feature scaling in the pre-processing stage? 2017-09-03T14:55:47.560

6 Image segmentation - handcrafted features vs DNN? 2018-02-24T03:22:37.157

6 Instead of one-hot encoding a categorical variable, could I profile the data and use the percentile value from its cumulative density distribution? 2018-04-04T00:31:09.753

6 Why would a fake feature with random numbers get selected in feature importance? 2018-11-14T11:49:16.150

6 Regression vs Random Forest - Combination of features 2019-03-31T14:28:26.237

5 Automatic Feature Engineering 2016-05-24T09:03:28.020

5 Array of categorical variables vs one-hot encoding 2017-05-23T22:33:11.657

5 Why is duplicating inputs bad? 2017-07-21T21:15:44.587

5 What feature engineering is necessary with tree based algorithms? 2017-08-08T15:00:47.583

5 Feature Selection in Linear Regression 2018-04-30T10:19:41.153

5 Should I rescale tfidf features? 2018-06-27T16:30:43.720

5 How to handle large number of features in machine learning? 2018-09-08T06:09:48.977

5 Do I need to engineer lagged features when creating an LSTM for time series forecasting? 2019-04-05T21:29:58.780

5 Categorical vs continuous feature selection/engineering 2019-04-12T10:17:40.903

5 Should features be correlated or uncorrelated for classification? 2019-11-21T17:31:50.663

5 How can we convert time series data to supervised learning problem? 2019-12-02T19:16:35.017

5 What can be done with highly correlated variables (>.95 and <-.95) 2020-02-07T12:56:03.360

4 How to reduce dimensionality of audio data that comes in form of matrices and vectors? 2016-03-14T00:37:25.940

4 How to create vectors from text for address matching using binary classification? 2016-12-20T13:41:54.690

4 What is representation in optical character recognition? 2017-06-06T18:12:53.580

4 Evaluating new features 2018-01-08T16:28:12.343

4 How to use hours of the day as a continuous feature? 2018-01-14T19:51:30.243

4 2D matrix for LabelBinarizer 2018-01-28T03:00:32.660

4 Can feature importance change a lot between models? 2018-03-08T18:31:31.410

4 Categorical data for sklearn's Isolation Forest 2018-07-25T14:29:33.800

4 How to use one hot encoding of string categorical features in keras? 2019-01-07T20:11:28.000

4 Feature engineering suggestion required 2019-04-11T01:26:24.393

4 How to automate the encoding process? 2019-06-12T08:57:30.523

4 Feature selection is not that useful? 2019-09-18T14:13:58.033

4 Overfitting due to features correlating with training set generation rules 2019-12-04T12:07:45.953

4 Mean estimation for nested location data 2020-05-19T09:48:20.567

4 How to perform data scaling/standardization on dataset containing grouped values? 2020-05-21T12:10:20.760

4 Should I use keras or sklearn for PCA? 2020-06-19T05:34:08.053

4 KNN Regression: Distance function and/or vector representation for datetime features 2020-08-11T16:11:07.113

4 Machine Learning with intended missing values 2021-01-20T15:56:22.030

3 Handling categorical features in Factorization Machines algorithm - Feature Hashing vs. One-Hot encoding 2015-12-15T10:12:37.867

3 Is it a good idea to train with a feature which value will be fixed in future predictions? 2016-04-24T09:07:15.927

3 numerical or categorical data 2017-02-23T03:51:58.727

3 Effect of Skewness and data range in machine learning 2017-02-23T04:01:44.687

3 Feature extraction of accelerometer data for machine learning 2017-08-03T10:49:30.910

3 Find points on a map close to given points 2017-08-14T03:30:46.653

3 stable set PCA while adding features 2017-09-11T04:26:18.217

3 How to use neural network's hidden layer output for feature engineering? 2017-10-23T10:52:06.303

3 The automatic construction of new features from raw data 2017-10-31T15:26:11.990

3 What are best practices for collaborative feature engineering? 2018-02-20T21:50:46.160

3 Removing Categorical Features in Linear Regression 2018-03-05T16:05:31.800

3 Skewed distributions in predictive models 2018-05-01T07:00:47.673

3 Time series feature extraction from raw sensor data for classification? 2018-05-28T20:38:42.057

3 Too many inputs = overfitting? 2018-06-24T23:09:22.277

3 Time series binary classification with labelling issues 2018-07-03T05:33:05.863

3 Extract features from a survey 2018-07-13T10:06:12.393

3 Using historical label as a feature in my ML model? 2018-08-13T17:40:12.617

3 Metrics to evaluate features' importance in classification problem (with random forest) 2018-08-30T14:21:10.043

3 Creating similarity metric with Doc2Vec and additional features 2018-09-27T20:52:25.110

3 Predicting a cyclic target 2018-10-09T19:20:51.507

3 Handling missing values to optimize polynomial features 2018-10-21T08:47:24.117

3 "help" decision tree by tying 2 features together 2018-10-31T08:31:04.947

3 Layman's explanation of when to use which smoother algorithm/technique: FFT, loess, Savitzky-Golay, etc 2018-11-27T15:42:22.273

3 How to evaluate feature quality for decision tree model 2019-01-03T17:55:45.450

3 Feature engineering from date, mean and standard deviation 2019-01-05T11:14:38.500

3 How to understand features impact in a non linear case? 2019-01-08T21:51:06.690

3 Feature Engineering Lists\Vectors as values in dataframe 2019-02-26T12:12:37.107

3 Why is Reward Engineering considered "bad practice" in RL? 2019-03-10T22:55:04.887

3 Target Encoding: missing value imputation before or after encoding 2019-03-16T10:57:11.730

3 Manual feature engineering based on the output 2019-03-19T14:08:18.553

3 Aggregating target-encoded array-like categorical features? 2019-04-09T18:41:03.810

3 Combining Latitude/Longitude position into single feature 2019-04-18T21:57:04.737

3 How to handle associated features in machine learning 2019-07-10T15:54:23.353

3 How to automate ANOVA in Python 2019-07-14T14:43:53.253

3 Blind feature engineering 2019-07-30T03:15:46.577

3 Why does removal of some features improve the performance of random forests on some occasions? 2019-10-19T11:39:35.093

3 Prediction vs causation in a ML project 2019-12-26T06:32:27.160

3 How can I handle a column with list data? 2020-04-11T08:09:03.127

3 Is there an encoder which can automatically detect the intrinsic order of an ordinal variable and assign values accordingly? 2020-07-15T22:51:05.527

3 Treating missing data in categorical features 2020-08-21T08:35:22.900

3 Dealing with highly variable feature set size 2020-10-21T07:18:16.507

3 How data are prepared during training, testing and in production? 2020-12-16T15:08:15.560

3 How best to use the resale transaction year in predicting housing prices? 2020-12-29T00:34:46.480

2 What is a good explanation of Non Negative Matrix Factorization? 2016-02-18T04:25:38.627

2 Detecting redundancy with Pearson correlation in continuous features 2016-03-12T17:44:12.350