Tag: preprocessing

41 How to prepare/augment images for neural network? 2015-02-24T11:59:36.033

32 StandardScaler before and after splitting data 2018-09-18T02:35:36.337

28 Difference between OrdinalEncoder and LabelEncoder 2018-10-07T18:55:40.833

17 Different Test Set and Training Set Distribution 2018-02-26T20:29:22.630

16 Image resizing and padding for CNN 2018-04-25T13:46:47.773

12 Loading own train data and labels in dataloader using pytorch? 2019-02-20T21:13:45.157

11 One Hot Encoding vs Word Embedding - When to choose one or another? 2018-04-03T14:13:28.643

10 Please review my sketch of the Machine Learning process 2020-04-06T01:10:56.257

9 How to approach the numer.ai competition with anonymous scaled numerical predictors? 2016-06-29T16:11:34.107

9 Data preprocessing: Should we normalise images pixel-wise? 2018-01-21T11:56:56.290

8 Preprocessing for Text Classification in Transformer Models (BERT variants) 2019-11-08T06:28:48.750

7 How to preprocess different kinds of data (continuous, discrete, categorical) before Decision Tree learning 2015-08-07T10:43:50.747

7 How to implement global contrast normalization in python? 2016-11-14T17:35:28.093

7 Extracting individual emails from an email thread 2017-06-01T13:02:23.683

7 sklearn SimpleImputer too slow for categorical data represented as string values 2020-01-07T12:43:11.443

6 How to define a distance measure between two IP addresses? 2015-11-09T09:40:19.857

6 Dealing with training set of questionable quality 2015-11-16T10:57:53.013

6 issue with oneHotEncoding 2017-10-18T19:40:56.623

6 Why is input preprocessing in VGG16 in Keras not 1/255.0 2018-02-24T04:14:29.187

6 Best way to scale across different datasets 2019-05-03T09:57:29.203

6 How distribution of data affects model performance? 2020-05-25T07:16:58.357

5 What is a benchmark model? 2015-11-10T18:16:17.677

5 Do you apply outlier detection of numerical data in practical applications? 2016-07-04T15:19:21.973

5 Nested features with one to many relationships 2017-05-24T19:46:03.913

5 Should I standardize first or generate polynomials first? 2017-07-18T15:45:06.017

5 How to get spike values from a value sequence? 2018-01-25T11:09:14.883

5 Columntransformer multiple columns with vector inputs 2018-11-28T01:14:10.883

4 How to choose best classifier for Low positive to negative class ratio in data (training, validation and real time)? 2016-02-26T13:54:54.900

4 How would one separate digits for number recognition? 2016-06-15T15:44:45.850

4 Normalising data with multiple methods 2018-05-26T07:25:58.637

4 Feature importance over a subset of instance space instead of an entire instance space 2018-07-16T08:50:08.250

4 How to perform data scaling/standardization on dataset containing grouped values? 2020-05-21T12:10:20.760

4 Reducing the size of a dataset 2020-08-27T01:24:51.847

4 If there are no missing values in our training set, should we accommodate missing values in an unseen test set? 2020-09-09T06:45:06.503

3 Denoising Autoencoders with Variable Length Input 2015-08-21T14:26:10.297

3 Sampling for multi categorical variable 2015-10-11T20:58:19.907

3 How to visualize data of a multidimensional dataset (TIMIT) 2015-10-24T13:24:09.340

3 Is it common to preprocess image data before sending it through a deep net? 2015-12-17T01:42:57.980

3 Redundancy - is it a big problem? 2016-06-05T10:02:27.790

3 Machine Learning or Survival Analysis? 2016-07-20T21:08:35.813

3 Convert exponential to normal distribution 2017-05-12T22:40:59.083

3 What pre processing should I use on data to feed into a CNN? 2017-08-06T07:52:00.267

3 One hot encoding of target space 2018-01-12T19:04:18.553

3 Should I Impute target values? 2018-01-12T21:08:05.720

3 Video classification of birds 2018-04-14T18:15:57.667

3 How to scale prediction back after preprocessing 2018-05-22T08:04:03.037

3 Standard correlation coefficient of various datasets 2018-06-04T09:42:36.590

3 Normalizing test data 2018-07-01T11:53:42.153

3 A single column has many values per row, separated by a comma. How to create an individual column for each of these? 2018-09-28T14:41:38.027

3 Why does not log transformation make the data normalized? 2019-03-06T08:39:19.887

3 Using pandas get_dummies() on real world unseen data 2019-03-12T09:33:14.923

3 Preprocess image data to classify objects based on shape 2019-04-04T20:09:55.200

3 SVM SMOTE fit_resample() function runs forever with no result 2019-04-05T19:20:49.943

3 One hot encoding as input to recurrent neural networks 2019-04-24T09:04:45.617

3 Large no of categorical variables with large no of categories 2019-06-04T11:23:27.523

3 Difference between normalization and zero centering 2019-06-20T18:22:06.323

3 How to export PCA to use in another program 2019-07-04T15:30:20.730

3 binning high cardinality categorical features 2019-08-11T20:45:43.817

3 how to handle values that only appear once in a column? 2019-08-23T18:05:13.823

3 ASR on low dataset 2019-12-27T05:49:28.817

3 For outliers treatment, clipping, winsorizing or removing? 2020-01-03T16:09:34.620

3 How can I perform categorical encoding when the dataset is too large for memory? 2020-01-06T13:57:21.197

3 preprocessing time sequence 2020-02-27T11:07:49.120

3 Iterate over multiple dataframe rows at the same time 2020-03-27T14:50:14.013

3 Why is oversampling outperforming class weight? 2020-04-19T04:05:16.523

3 How to use sklearn pipeline to deal with data that read in line by line 2020-05-06T11:48:45.840

3 Is it acceptable not to transform() test data after train data is being fit_transform()-ed 2020-05-31T15:19:44.783

3 What is the proper order of normalization steps before and after splitting data 2020-07-04T19:43:30.850

3 Handling features with multiple values per instance in Python for Machine Learning model 2020-07-22T12:33:32.573

3 When to One-Hot encode categorical data when following Crisp-DM 2020-07-31T06:00:08.637

3 Smart sentence segmentation not splitting on abbreviations 2020-10-13T06:29:48.637

3 Effect of Stop-Word Removal on Transformers for Text Classification 2020-12-03T20:24:23.693

3 How data are prepared during training, testing and in production? 2020-12-16T15:08:15.560

3 Preprocessing: StandardScaler() Do we really need mean to be zero? 2021-02-04T14:10:16.370

2 Preprocessing in Data mining? 2015-08-26T17:37:49.187

2 How should clickstream data be prepared before user segmentation can be performed? 2015-10-04T14:57:05.073

2 Does it make sense to apply feature scaling on timestamp 2015-11-05T15:02:35.457

2 How can I preprocess multi-page image inputs in a theano/lasagne network? 2015-12-11T16:59:33.750

2 User activity representation for Prediction/ML 2016-02-07T22:27:42.207

2 What metrics must I use in my data(unstructured) preprocessing research? 2016-02-20T10:11:17.397

2 What are some method for pre-processing data in OCR? 2016-09-28T05:57:38.303

2 Real time noise removal using Savitzky-Golay Method 2017-02-13T05:12:42.300

2 How to use a dataset where attribute names are changed? 2017-06-11T02:39:27.017

2 how is countvectorizer used in real production environment? 2017-07-12T01:37:41.657

2 How to preprocess Acoustic Data 2017-08-31T07:59:02.007

2 Keras loading images in incorrect format 2017-09-13T16:08:13.013

2 How Box cox and other transformations convert data into Normal Distributions? 2017-10-25T12:23:52.413

2 Dealing with a dataset where a subset of points live in a higher dimensional space 2017-12-04T06:30:36.457

2 Preprocess list data 2018-02-06T19:58:15.913

2 How to check and correct misspelling in the data of pairs of words? 2018-04-01T08:51:55.923

2 How to load a csv file into [Pandas] dataframe if computer runs out of RAM? 2018-04-16T01:21:22.073

2 How can I measure the similarity between 2 IP addresses? Is there any code to re-use? 2018-05-01T18:27:02.020

2 normalization/denormalization for linear regression problem 2018-05-15T08:22:36.473

2 When should ordinal data be represented categorically and when as integer? 2018-08-18T16:35:07.987

2 Can preprocessing the whole population cause data leakage? 2018-10-06T12:50:13.753

2 Is there any tool for data visualization and manipulation? 2018-10-11T22:25:02.427

2 Transformation of categorical variables (binary vs numerical) 2018-11-04T17:15:20.940

2 Image normalisation methods 2019-01-09T19:38:07.967

2 Pre-processing on MRI images 2019-01-19T22:14:14.573