Tag: sampling

38 train_test_split() error: Found input variables with inconsistent numbers of samples 2017-07-06T05:17:55.947

35 Intuitive explanation of Noise Contrastive Estimation (NCE) loss? 2016-08-05T03:36:04.553

15 With unbalanced class, do I have to use under sampling on my validation/testing datasets? 2015-11-18T20:14:30.133

15 Is stratified sampling necessary (random forest, Python)? 2017-01-12T00:58:27.320

15 How many features to sample using Random Forests 2017-10-10T10:50:22.720

14 Why do we need to handle data imbalance? 2017-11-06T06:15:29.570

12 Cross-validation: K-fold vs Repeated random sub-sampling 2014-06-20T17:57:46.363

12 When should we consider a dataset as imbalanced? 2016-05-16T11:36:14.850

9 Cross validation for highly imbalanced data with undersampling 2019-02-04T16:32:21.823

6 How to define a custom resampling methodology 2014-07-10T11:55:49.637

6 Decision trees, categorization and oversampling 2014-12-03T14:23:38.830

6 Why gradient boosting uses sampling without replacement? 2020-02-07T06:59:16.777

6 Is sampling a valid way to reduce complexity? 2020-11-08T17:37:02.850

5 Is there a particular order in which to do feature selection and sampling? 2016-08-05T09:10:52.093

5 Keras negative sampling with custom layer 2018-02-22T23:08:01.743

5 Why is sampling useful in machine learning? 2018-07-31T19:22:24.890

5 How are samples selected from training data in Xgboost 2020-01-08T09:32:51.250

4 Parallel active optimization 2016-02-27T15:59:49.043

4 Cross validation plus oversampling? 2017-01-11T03:10:45.100

4 Which is better: Out of Bag (OOB) or Cross-Validation (CV) error estimates? 2017-08-04T10:50:38.383

4 SMOTE and multi class oversampling 2017-11-11T23:20:19.680

4 Exploration vs exploitation tradeoff to find a price that maximizes revenue 2017-12-11T19:45:00.323

4 Oversampling before Cross-Validation, is it a problem? 2019-01-21T12:02:00.250

4 Sub-sampling so that sample statistics match population statistics 2019-02-12T15:09:52.003

4 Why did sampling boost the performance of my model? 2019-09-25T17:00:21.353

3 Avoid iterations while calculating average model accuracy 2014-08-06T09:03:20.857

3 Statistical comparison of 2 small data sets for 2X increase in the population mean 2014-11-26T20:01:23.187

3 Question on reservoir sampling 2015-09-11T18:02:44.743

3 Sampling for multi categorical variable 2015-10-11T20:58:19.907

3 Sklearn StratifiedKFold code explanation 2016-08-01T14:29:32.783

3 Imbalanced dataset: how to deal with test data? 2017-03-26T06:32:14.227

3 What is the allowable limit of oversampling? 2018-02-08T10:49:20.787

3 Overfitted model produces similar AUC on test set, so which model do I go with? 2018-06-27T22:20:08.607

3 How to correctly perform data sampling for train/test split in multi-label dataset? 2018-10-07T16:25:02.430

3 SVM SMOTE fit_resample() function runs forever with no result 2019-04-05T19:20:49.943

3 Is it OK to use the testing sample to compare algorithms? 2019-04-21T16:15:45.997

3 Does sampling size matter in a multi-class classification model? 2019-11-27T11:07:59.833

3 Stratified selection based on the y response creates a bias in information (Berkson's bias)? 2020-01-13T15:39:53.597

3 SuperLearner Cross validation with iid time series 2020-02-07T13:54:32.880

3 Over-sampling: is my model over-fitting? 2020-11-30T04:43:00.897

2 How to make an effective sampling from a database of text documents? 2015-03-11T04:18:21.610

2 Package for SMOTEBoost in R 2015-06-01T11:51:45.187

2 Why Markov Chain Monte Carlo allows sampling from a large class of distributions and scales well with the dimensionality of the sample space? 2015-11-05T06:10:40.180

2 Strategy for dealing with giant sample size 2017-03-31T19:47:44.603

2 How to randomly sample crops from plain image with points only if crop contains n points inside? 2017-05-19T15:03:45.007

2 K-Fold Cross validation confusion? 2017-05-23T10:33:11.993

2 Resampling a normally distributed dataset for regression problems? 2017-08-11T14:53:29.367

2 Stratified Sampling Variable Choice 2017-11-08T01:11:35.013

2 Gumbel Softmax vs Vanilla Softmax for GAN training 2018-08-06T18:25:47.753

2 In Machine Learning, what is the point of using stratified sampling in selecting test set data? 2019-02-06T03:26:47.160

2 Highly imbalanced dataset for more than 200 classes 2019-09-28T06:22:33.520

2 Training data requirements for NLP models 2019-10-10T23:16:44.820

2 Bayesian network in Python: both construction and sampling 2019-11-30T10:14:36.933

2 How to perform bootstrap validation? 2020-01-02T10:23:57.760

2 Gaussian Process for Classification: How to do predictions using MCMC methods 2020-01-08T02:33:28.987

2 Using Majority Class to Predict Minority Class 2020-01-09T16:27:26.310

2 Is there an algorithm for sampling shortest paths? 2020-01-13T06:47:57.380

2 Difference between Gibbs sampling and variational Bayes inference 2020-01-15T05:51:47.860

2 Can sampling like SMOTE/UP/DOWN applied on Validation set? 2020-01-24T22:07:16.113

2 SMOTE for regression 2020-03-03T17:51:15.400

2 Undersampling for credit card fraud detection before or after Train/Test Split 2021-02-09T04:20:01.443

1 Communicating clearly about "samples" 2015-01-05T18:34:55.920

1 Poor performance shown on Rare event modeling 2015-06-08T10:52:39.433

1 What is a "good" sample size 2016-02-29T16:48:33.507

1 Imbalance in observable data 2016-11-08T03:15:40.050

1 Calculate accuracy of crowdsourced responses in realtime 2017-04-08T13:34:35.793

1 Working with audio data with different sample rates in Tensorflow 2018-04-08T18:06:24.233

1 Orange: Group samples by a "splitting" feature for cross-validation? 2018-04-13T10:27:50.960

1 Search Query Sample Size Determination for validation set 2018-05-21T07:01:47.843

1 I have limited samples for one class, unlimited samples for the other class. Need to balance? 2018-09-20T02:38:38.030

1 Downsample GPS track 2018-09-20T12:23:19.847

1 Sample size equation for multi-class distribution 2018-09-24T14:23:20.030

1 Difference between bagging and boosting 2018-10-12T09:38:22.147

1 Generating a set of different scenarios based on some initial observations 2018-10-12T13:54:32.587

1 A few questions to understand a random forest blog 2018-12-03T04:26:17.467

1 SmoteBoost: Should SMOTE be run individually for each iteration/tree in the boosting? 2018-12-26T12:53:25.643

1 Relation between using stratify and class weights for imbalanced classes 2019-02-04T05:48:02.917

1 How to do k-folds in python whilst splitting into 3 sets? 2019-02-18T16:02:49.950

1 Using SMOTE for Synthetic Data generation to improve performance on unbalanced data 2019-03-13T11:53:23.120

1 Variable Importance changes with oversampling 2019-05-16T06:50:58.583

1 Which machine learning methods can be used to address MonteCarlo sampling problems? 2019-07-09T14:46:50.050

1 Sampling trying to keep as much multivariate variance as possible 2019-09-01T07:35:22.197

1 Adjust predicted probability after smote 2019-11-21T16:59:51.077

1 How to estimate the accuracy on a large dataset? 2020-01-19T04:35:36.720

1 How to compute modulo of a hash? 2020-01-30T07:48:18.920

1 How to resample one dataset to conform to the distribution of another dataset? 2020-02-06T13:24:53.370

1 Dealing with large data: selecting a sample 2020-04-23T00:58:56.367

1 How to draw a sample from data set with respect to a given categorical or numerical variable based on given freely chosen distribution? (Python) 2020-05-22T08:56:15.307

1 Chi Square Test Goodness of Fit 2020-06-20T22:16:35.403

1 Representation sample size- n 2020-07-21T22:43:52.760

1 Uniform convergence guarantee on sample complexity 2020-08-05T08:39:57.057

1 Sampling in Text Classification: can the results be considered 'reliable'? 2020-08-09T16:18:05.460

1 Adaptive Sampling Strategies for SVM? 2020-08-16T11:47:33.620

1 Generating artificial data to extend learning set 2020-08-21T15:53:53.337

1 Sequential sampling from Gaussian conditional not working 2020-08-26T07:35:13.210

1 Training a Variational Autoencoder (VAE) for Random Number Generation 2020-08-30T20:06:33.100

1 How should I sample from a mixture distribution? 2020-09-17T16:37:41.217

1 What's the order in applying SMOTE transformation in a pipeline? 2020-09-19T16:27:39.007

1 How to generate a random sample and distribute values based on a probability distribution? 2020-10-09T13:39:15.250