190 Publicly Available Datasets 2014-05-18T18:45:38.957

51 Should I go for a 'balanced' dataset or a 'representative' dataset? 2014-07-22T12:29:10.050

35 Is it always better to use the whole dataset to train the final model? 2018-06-12T09:54:16.347

34 Quick guide into training highly imbalanced data sets 2014-09-12T15:20:51.767

28 Why is it wrong to train and test a model on the same dataset? 2020-12-13T14:11:58.530

26 Publicly available social network datasets/APIs 2014-06-17T05:29:11.830

25 Data Science Project Ideas 2014-07-25T18:36:31.340

22 Is there any data tidying tool for python/pandas similar to R tidyr tool? 2016-03-02T08:54:10.503

20 How to generate synthetic dataset using machine learning model learnt with original dataset? 2015-04-01T15:23:17.997

20 Uploading images folder from my system into Google Colab 2018-03-23T18:52:28.867

19 Dataset for Named Entity Recognition on Informal Text 2014-06-30T21:02:05.053

15 With unbalanced class, do I have to use under sampling on my validation/testing datasets? 2015-11-18T20:14:30.133

15 Why are variables of train and test data defined using the capital letter (in Python)? 2017-03-15T07:36:40.437

15 How much data are sufficient to train my machine learning model? 2017-06-26T21:26:04.680

14 Analyzing A/B test results which are not normally distributed, using independent t-test 2014-08-04T22:27:10.837

14 Why do we need to handle data imbalance? 2017-11-06T06:15:29.570

14 One hot encoding alternatives for large categorical values 2017-11-14T17:20:58.253

14 Is there a person class in ImageNet? Are there any classes related to humans? 2018-02-11T08:21:22.517

13 Datasets understanding best practices 2014-06-24T07:29:57.787

12 Where can I download historical market capitalization and daily turnover data for stocks? 2014-06-25T18:06:14.293

12 Downloading a large dataset on the web directly into AWS S3 2015-04-22T18:00:27.083

12 Airline Fares - What analysis should be used to detect competitive price-setting behavior and price correlations? 2015-05-17T20:12:48.760

12 When should we consider a dataset as imbalanced? 2016-05-16T11:36:14.850

12 Loading own train data and labels in dataloader using pytorch? 2019-02-20T21:13:45.157

10 Network analysis classic datasets 2014-06-26T13:32:18.050

10 NASDAQ Trade Data 2014-07-19T20:46:52.740

10 Benchmark datasets for collaborative filtering 2016-03-23T13:46:49.460

10 Python: Handling imbalanced classes in Python machine learning 2016-04-25T07:26:53.743

10 Can HDF5 be reliably written to and read from simultaneously by separate python processes? 2017-08-17T11:59:57.020

10 How to store strings in CSV with new line characters? 2018-07-22T14:31:18.693

9 Suggest text classifier training datasets 2014-06-18T16:21:12.203

9 Interactive Graphing while logging data 2014-12-17T21:17:13.340

9 How to split train/test in recommender systems 2015-08-17T20:34:15.330

9 Covariate shift detection 2015-10-02T09:49:55.943

9 How to model user's buying behavior on Amazon? 2015-11-05T17:06:27.647

9 How do you calculate how dense or sparse a dataset is? 2016-03-07T19:39:18.703

9 How can I get the ImageNet ILSVRC 2012 data used for the classification challenge? 2016-09-05T15:52:30.737

9 What are some of the best practices for sharing data and models with colleagues? 2017-03-17T18:45:16.867

9 Public dataset for news articles with their associated categories 2017-09-26T08:56:30.433

9 How to check whether the distributions of the training set and testing set are similar 2019-04-18T11:22:01.990

9 Is (nearly) all data separable? 2020-01-01T17:03:52.303

8 Job title similarity 2014-07-21T09:00:04.917

8 Matrix properties and machine learning/data mining 2014-10-30T18:22:18.907

8 Evaluating Recommendation engines 2014-11-26T04:40:17.840

8 Best way to store large data set using R from Twitter? 2015-06-18T18:23:07.763

8 One Hot encoding for large number of values 2015-10-03T18:37:16.597

8 Do I have to standardize my new polynomial features? 2015-11-25T11:11:25.923

8 What is normalization for? 2018-05-06T16:44:04.660

8 Where can I find freely available multi-label datasets online? 2018-07-01T22:50:11.067

8 Splitting train/test sets by an identifier? 2019-05-03T22:42:39.580

7 Where can I find free spatio-temporal dataset for download? 2014-08-19T03:41:24.207

7 What kinds of data other than geographical are topologically spherical? 2015-06-23T08:36:13.723

7 Difference between training and test data distributions 2015-09-25T00:47:46.873

7 For which real-world data sets does DBSCAN surpass K-means? 2016-02-02T08:36:55.023

7 Where can I get labels for small ImageNet? 2016-08-15T18:13:56.187

7 How do we obfuscate or de-identify data to make it anonymous and share it publicly? 2017-08-05T14:40:09.643

7 Meaning of stratify parameter 2018-11-01T18:32:39.760

7 On a multi lingual sentiment corpus 2018-11-18T17:41:42.027

6 Working with inaccurate (incorrect) dataset 2015-06-24T16:36:32.730

6 Benchmark for MovieLens 2015-08-19T18:15:00.463

6 Classifier and Technique to use for large number of categories 2015-09-26T11:58:37.963

6 How to handle Memory issues in training Word Embeddings on Large Datasets? 2016-06-07T18:37:51.627

6 Should I prevent augmented data from leaking into the test/cross-validation sets? 2018-01-19T04:05:24.083

6 Why will the accuracy of a highly unbalanced dataset reduce after oversampling? 2018-02-23T08:51:12.860

6 Always drop the first column after performing One Hot Encoding? 2018-02-27T12:28:35.403

6 Why is a correlation matrix symmetric? 2018-05-07T17:31:01.273

6 Git for Deep Learning - what are the best tools for versioning/tracking machine learning experiments? 2018-08-02T06:51:32.400

6 When to use missing data imputation in the data analysis problem? 2019-08-11T22:39:52.013

6 Is it possible to use a generative model to "share" private data? 2020-03-04T15:39:01.283

5 Techniques for trend extraction from unbalanced panel data 2014-06-19T19:33:40.620

5 Amalgamating multiple datasets with different variables coding 2014-10-02T06:18:11.397

5 API for Company Data Enrichment Suggestions 2014-12-01T20:10:29.967

5 Anonymizing Datasets 2015-07-30T07:16:07.637

5 Random Forest Regression. How to represent really long list of categories for processing 2015-12-14T16:58:41.163

5 Training and cross validation error curves 2016-09-26T06:12:26.650

5 Nested features with one to many relationships 2017-05-24T19:46:03.913

5 Linear regression on probabilistic data 2017-06-26T12:32:09.107

5 What is the meaning of spherical dataset? 2017-08-07T08:51:20.467

5 Missing Values in Data 2017-08-31T10:08:51.103

5 In what data science applications has the stack exchange dump been used? 2017-09-12T08:59:14.717

5 How can I get 50% of examples in the training set and 50% in the test set for each class when splitting data? 2017-11-24T10:15:36.180

5 What does it mean for the training data to be generated by a probability distribution over datasets? 2017-12-26T06:29:27.790

5 Memory error on using data generator in keras 2018-05-31T04:04:29.447

5 High RMSE and MAE and low MAPE 2018-08-20T05:26:19.593

5 What is the largest public wearable accelerometer dataset? 2018-12-05T06:59:33.943

5 Data Visualization with multiple dimension, and linear separability 2019-03-23T20:41:10.027

5 Data enrichment of geographical records 2019-05-04T15:24:53.587

5 XGBoost, binary classification: uneven number of observations per user 2019-07-22T18:01:31.530

5 Discrimination vs Calibration - Machine Learning Models 2020-03-13T07:49:59.740

5 Machine learning methods on 1 feature dataset 2020-05-20T10:22:21.653

5 What are bias and variance in machine learning? 2020-08-12T08:10:53.703

5 Loading collections of datasets - Python code examples 2020-10-19T19:32:06.590

5 Can all data be represented tabularly? 2020-11-06T16:36:13.900

4 Handling huge dataset imbalance (2 class values) and appropriate ML algorithm 2014-08-07T10:45:38.557

4 How to classify and cluster this time series data 2014-09-28T12:51:43.823

4 Single Layer Perceptron with three classes 2014-10-11T23:26:53.197

4 MovieLens data set 2014-10-22T14:53:42.127

4 Large categorical dataset for regression 2014-11-10T09:45:13.063

4 Reduction of multiple answers to single variable 2014-11-18T09:07:00.867