Tag: data-leakage

6 How to deal with possible data leakage in time series data? 2019-02-14T12:14:02.570

6 Does label encoding an entire dataset cause data leakage? 2020-07-22T18:50:12.367

4 Does using user-specific accumulative variables cause data leakage? 2020-01-31T11:05:24.307

3 Is normalizing the validation set of time series a kind of look ahead bias? 2019-03-09T21:31:54.157

3 Manual feature engineering based on the output 2019-03-19T14:08:18.553

2 classification feature selection 2017-03-06T22:14:11.877

2 information leakage when using empirical Bayesian to generate a predictor 2017-04-23T21:50:13.603

2 Can preprocessing the whole population cause data leakage? 2018-10-06T12:50:13.753

2 Is using samples from the same person in both the train set and the test set considered data leakage? 2020-07-19T22:06:32.793

2 How to split my dataset into a train set and a test set in order to prevent data leakage? 2020-12-23T23:02:55.390

1 What is the difference between data leakage and endogeneity? 2017-11-15T18:01:46.697

1 Dropping less frequently used categorical data? 2018-06-08T18:59:08.463

1 Data leakage and predictive models: should we use past predictions as a feature? 2019-01-04T00:42:42.593

1 Will historical data lead to target leakage? 2019-09-23T08:58:16.033

1 How to keep the test data from leaking into the training process of a machine learning algorithm? 2020-01-23T13:18:39.190

1 Is data leakage in time series due to both I's of the IID principle or only one? 2020-03-26T13:05:02.903

1 Data leakage in bidirectional LSTM timeseries data 2020-03-30T06:40:47.097

1 Identifying possible data leakage 2020-04-29T19:03:15.057

1 Normalizing dependent feature by one of the independent ones 2020-05-26T20:41:32.663

1 Mean encoding in time series 2020-06-17T18:16:41.417

1 Can I apply feature selection before splitting by requiring that selection occurs > 90% of the time? 2020-06-20T18:50:22.057

1 Will setting up time series data in this way cause data leakage? 2020-07-14T12:49:10.807

1 Is it right to maintain the train distribution in test set for unbalanced data? 2020-12-09T06:42:43.273

1 Data/Feature Leakage - feature too close to target? 2021-01-25T08:46:51.263

1 Does binning a time series with pd.qcut (using quantiles) create data leakage? 2021-01-31T09:55:08.040

0 Is it safe to use labels created from unsupervised model to train a supervised model using the same data? 2019-09-17T15:21:09.150

0 Frequency/Count encoding 2019-11-07T13:45:48.827

0 Why do I have leakage while using Stratified Group K Fold? 2019-11-21T11:06:00.977

0 Handle OneHot Encoder in a pipeline with unseen data 2020-04-30T10:53:01.220

0 Need help understanding data leakage 2020-06-03T04:28:57.437

0 Train test leakage doubt for Time series 2020-10-07T08:45:08.430

0 Data leakage when setting class_weight to tackle imbalanced time series data? 2020-12-02T08:38:50.133

0 K-Fold cross validation and data leakage 2020-12-23T23:30:28.503