Tag: categorical-data

173 K-Means clustering for mixed numeric and categorical data 2014-05-14T05:58:21.927

145 When to use One Hot Encoding vs LabelEncoder vs DictVectorizer? 2015-12-19T19:30:35.527

20 Why do we need to discard one dummy variable? 2018-02-18T17:43:56.533

18 How to combine categorical and continuous input features for neural network training 2018-03-28T08:49:04.513

14 How can I appropriately handle cleaning of gender data? 2020-03-20T04:23:51.880

12 How to convert categorical data to numerical data in Pyspark 2015-06-29T22:55:28.100

12 How can I dynamically distinguish between categorical data and numerical data? 2016-01-21T20:15:04.757

12 Mass convert categorical columns in Pandas (not one-hot encoding) 2016-09-18T16:45:15.647

12 Feature importance with high-cardinality categorical features for regression (numerical dependent variable) 2017-04-05T18:23:12.657

12 Catboost Categorical Features Handling Options (CTR settings)? 2018-01-24T15:50:51.917

12 How can I do classification with categorical data which is not fixed? 2018-08-27T13:31:41.083

12 Why does frequency encoding work? 2019-11-25T15:36:36.253

11 Clustering for mixed numeric and nominal discrete data 2015-11-02T04:12:53.367

10 Confusion about Entity Embeddings of Categorical Variables - Working Example! 2018-12-16T22:06:28.993

9 Using NLP to automate the categorization of user description 2014-12-09T20:49:37.093

9 How to combine PCA and MCA on mixed data? 2016-01-19T09:03:57.597

8 One Hot encoding for large number of values 2015-10-03T18:37:16.597

8 Keras categorical_crossentropy loss (and accuracy) 2017-06-22T20:04:18.330

6 Machine Learning - Where is the difference between one-class, binary-class and multinomial-class classification? 2014-10-20T06:38:16.490

6 Classifier and Technique to use for large number of categories 2015-09-26T11:58:37.963

6 How to deal with categorical feature of very high cardinality? 2016-03-03T20:29:22.797

6 Dummy coding a column in R with multiple levels 2016-05-02T11:27:55.597

6 Why after adding categorical data the Linear Regression fails? 2016-10-20T19:32:00.293

6 Why don't tree ensembles require one-hot-encoding? 2017-04-02T03:37:47.290

6 Always drop the first column after performing One Hot Encoding? 2018-02-27T12:28:35.403

6 How to deal with missing data for only some categories 2018-09-19T22:08:20.777

6 How can Time Series Analysis be done with Categorical Variables 2019-06-20T09:06:34.683

6 Mapping of categorical features into binary indicator features 2019-06-24T11:43:35.437

6 How do I encode the categorical columns if there are more than 15 unique values? 2020-12-24T20:11:58.290

5 Relation mining of multivariant categorical timeseries without excluding the temporal nature 2014-11-21T15:23:37.360

5 Quasi-categorical variables - any ideas? 2015-01-28T13:42:15.430

5 Array of categorical variables vs one-hot encoding 2017-05-23T22:33:11.657

5 Nested features with one to many relationships 2017-05-24T19:46:03.913

5 How to continue incremental learning when a categorical variable has been assigned additional category labels? 2018-03-19T07:02:59.750

5 How to handle columns with categorical data and many unique values 2019-04-08T11:04:22.033

5 Feature Selection with one-hot-encoded categorical data 2019-06-01T18:05:36.153

4 Large categorical dataset for regression 2014-11-10T09:45:13.063

4 How to visualise multidimensional categorical data with additional time dimension 2015-11-02T10:29:59.463

4 Categorical and ordinal feature data representation in regression analysis? 2015-12-04T11:36:40.260

4 What's the best way to use binned data in a tree-based model? 2016-02-09T19:10:00.017

4 Scikit Learn Missing Data - Categorical values 2016-07-15T10:43:58.690

4 Best approach for this unsupervised clustering problem with categorical data? 2016-07-20T15:51:05.770

4 Handling categorical variables in linear regression and random forest 2017-05-27T23:48:28.543

4 Categorical Variables - Classification 2017-06-18T17:24:03.913

4 Bayesian combination of multi-dimensional experts? 2017-07-23T05:00:29.247

4 Logic behind SMOTE-NC? 2018-01-07T09:54:03.007

4 Is there an asymmetric version of nominal correlation? 2018-01-09T15:06:46.763

4 Selecting ML algorithm for music composition 2018-05-23T03:46:16.723

4 One hot encoding large dataset 2018-06-10T22:19:02.213

4 Categorical data for sklearn's Isolation Forest 2018-07-25T14:29:33.800

4 Dealing with a dataset with a mix of continuous and categorical variables 2019-02-22T07:53:48.327

4 Test dataset with categorical variable value not present in train dataset & transformer 2019-05-28T04:53:34.053

4 Why Decision Tree Classifier is not working with categorical value? 2019-12-22T18:54:00.187

4 Feature selection for data with both continuous and categorical features? 2020-02-27T10:15:11.063

3 How do I cluster data that is a mix of text & categorical data? 2015-05-18T15:10:05.623

3 Steps in exploratory methods for mild-sized data with mixed categorical and numerical values? 2015-10-24T20:19:43.473

3 Handling categorical features in Factorization Machines algorithm - Feature Hashing vs. One-Hot encoding 2015-12-15T10:12:37.867

3 One-Hot Vector representation vs Label Encoding for Categorical Variables 2016-01-13T13:46:57.730

3 decision trees on mix of categorical and real value parameters 2016-04-19T12:37:05.593

3 How to fix inconsistent (variable spelling) categorical data and "fill in" missing data 2016-05-30T19:31:13.520

3 Missing Categorical Features - no imputation 2016-08-10T14:06:12.223

3 Pandas categorical variables encoding for regression (one-hot encoding vs dummy encoding) 2017-03-20T19:26:11.217

3 Different number of features in train vs test 2017-05-14T14:47:40.710

3 Imputation of missing values and dealing with categorical values 2017-05-23T11:35:44.843

3 Is there a name for a scale which mixes ordinal and nominal? 2017-08-21T16:12:27.800

3 how to decide categorical variables for prediction 2018-05-23T13:29:48.737

3 Data scaling before PCA: how to deal with categorical values? 2018-06-10T08:44:28.937

3 Anomaly detection using clustering of highly correlated Categorical data 2018-07-30T15:18:50.430

3 How to implement feature selection for categorical variables (especially with many categories)? 2018-08-13T01:36:21.833

3 Applying mean encoding before or after splitting into train and test set 2019-05-19T14:37:20.370

3 Large no of categorical variables with large no of categories 2019-06-04T11:23:27.523

3 Purpose of converting continuous data to categorical data 2019-06-21T19:26:20.703

3 binning high cardinality categorical features 2019-08-11T20:45:43.817

3 how to handle values that only appear once in a column? 2019-08-23T18:05:13.823

3 Average of importance gain for a categorical variable 2019-11-13T14:19:16.757

3 Dealing with categorical variables 2019-11-16T09:25:19.840

3 Strategies to encode categorical variables with many categories 2019-12-09T09:24:28.833

3 Why RANDOM noise images always predicted as BIRD? 2019-12-28T10:32:15.533

3 How can I perform categorical encoding when the dataset is too large for memory? 2020-01-06T13:57:21.197

3 PCA and k-means for categorical variables? 2020-02-27T05:02:08.707

3 Is there an encoder which can automatically detect the intrinsic order of an ordinal variable and assign values accordingly? 2020-07-15T22:51:05.527

2 Why is there such a mismatch between the Model's predicted probability and theoretical probability in logistic regression? 2014-06-16T13:30:01.320

2 Choosing the right data mining method to find the effect of each parameter over the target 2014-11-14T19:03:35.603

2 Kmeans on mixed dataset with high level for categ 2015-06-30T05:28:31.537

2 Creating validation data for model comparison 2015-09-23T06:47:48.473

2 Clustering with constraints 2015-10-09T03:44:07.940

2 Tag categorizer 2016-02-11T19:21:28.653

2 Checking Correlation of Categorical variables in SPSS 2016-06-23T22:11:10.443

2 Feature Engineering 2016-08-31T19:30:25.727

2 Using Discretization from Training Set on Test Set in R 2016-09-13T09:09:11.553

2 Do categorical features always need to be encoded? 2016-09-13T13:15:01.727

2 Multidimensional Scaling with Categorical Data 2016-09-30T08:56:03.210

2 Type of Test to Determine Correlation in R 2016-12-29T03:02:14.170

2 How can I handle missing categorical data that has significance? 2017-03-22T21:41:52.413

2 Updating One-Hot Encoding to account for new categories 2017-05-18T00:14:06.673

2 Outlier detection on categorical network log data 2017-07-20T19:19:54.807

2 How to choose the optimal k in k-prototypes? 2017-08-17T16:57:29.223

2 Converting a string to dummy encoded variables 2017-08-23T05:05:06.707

2 What approach for creating a multi-classification model based on all categorical features (1 with 5,000 levels)? 2018-01-09T20:25:37.097