Tag: dataset

125 Publicly Available Datasets 2014-05-18T18:45:38.957

30 Should I go for a 'balanced' dataset or a 'representative' dataset? 2014-07-22T12:29:10.050

22 Data Science Project Ideas 2014-07-25T18:36:31.340

20 Quick guide into training highly imbalanced data sets 2014-09-12T15:20:51.767

18 Publicly available social network datasets/APIs 2014-06-17T05:29:11.830

15 Dataset for Named Entity Recognition on Informal Text 2014-06-30T21:02:05.053

12 How to generate synthetic dataset using machine learning model learnt with original dataset? 2015-04-01T15:23:17.997

11 Analyzing A/B test results which are not normally distributed, using independent t-test 2014-08-04T22:27:10.837

10 Datasets understanding best practices 2014-06-24T07:29:57.787

10 Airline Fares - What analysis should be used to detect competitive price-setting behavior and price correlations? 2015-05-17T20:12:48.760

10 Is there any data tidying tool for Python/pandas similar to R's tidyr? 2016-03-02T08:54:10.503

8 Where can I download historical market capitalization and daily turnover data for stocks? 2014-06-25T18:06:14.293

8 Network analysis classic datasets 2014-06-26T13:32:18.050

8 How to model user's buying behavior on Amazon? 2015-11-05T17:06:27.647

8 Why are variables of train and test data defined using the capital letter (in Python)? 2017-03-15T07:36:40.437

7 Suggest text classifier training datasets 2014-06-18T16:21:12.203

7 NASDAQ Trade Data 2014-07-19T20:46:52.740

7 Best way to store large data set using R from Twitter? 2015-06-18T18:23:07.763

7 How to split train/test in recommender systems 2015-08-17T20:34:15.330

7 Difference between training and test data distributions 2015-09-25T00:47:46.873

7 Covariate shift detection 2015-10-02T09:49:55.943

7 With unbalanced classes, do I have to use undersampling on my validation/testing datasets? 2015-11-18T20:14:30.133

7 Benchmark datasets for collaborative filtering 2016-03-23T13:46:49.460

7 Python: Handling imbalanced classes in Python machine learning 2016-04-25T07:26:53.743

7 Why do we need to handle data imbalance? 2017-11-06T06:15:29.570

6 Techniques for trend extraction from unbalanced panel data 2014-06-19T19:33:40.620

6 Job title similarity 2014-07-21T09:00:04.917

6 Where can I find free spatio-temporal dataset for download? 2014-08-19T03:41:24.207

6 Matrix properties and machine learning/data mining 2014-10-30T18:22:18.907

6 Interactive Graphing while logging data 2014-12-17T21:17:13.340

6 Downloading a large dataset on the web directly into AWS S3 2015-04-22T18:00:27.083

6 Working with inaccurate (incorrect) dataset 2015-06-24T16:36:32.730

6 Classifier and Technique to use for large number of categories 2015-09-26T11:58:37.963

6 For which real-world data sets does DBSCAN surpass K-means? 2016-02-02T08:36:55.023

6 How do you calculate how dense or sparse a dataset is? 2016-03-07T19:39:18.703

6 How much data are sufficient to train my machine learning model? 2017-06-26T21:26:04.680

6 Can HDF5 be reliably written to and read from simultaneously by separate python processes? 2017-08-17T11:59:57.020

5 Amalgamating multiple datasets with different variables coding 2014-10-02T06:18:11.397

5 Evaluating Recommendation engines 2014-11-26T04:40:17.840

5 What kinds of data other than geographical are topologically spherical? 2015-06-23T08:36:13.723

5 Anonymizing Datasets 2015-07-30T07:16:07.637

5 Random Forest Regression. How to represent really long list of categories for processing 2015-12-14T16:58:41.163

5 Reducing sample size 2016-12-27T19:28:17.920

5 How to generate bulk graphics using R 2017-02-25T13:40:25.847

5 Nested features with one to many relationships 2017-05-24T19:46:03.913

5 Linear regression on probabilistic data 2017-06-26T12:32:09.107

5 How do we obfuscate or de-identify data to make it anonymous and share it publicly? 2017-08-05T14:40:09.643

5 Missing Values in Data 2017-08-31T10:08:51.103

5 In what data science applications has the stack exchange dump been used? 2017-09-12T08:59:14.717

4 Handling huge dataset imbalance (2 class values) and appropriate ML algorithm 2014-08-07T10:45:38.557

4 Reduction of multiple answers to single variable 2014-11-18T09:07:00.867

4 API for Company Data Enrichment Suggestions 2014-12-01T20:10:29.967

4 Where did this NY Times op-ed get his Google Search data? 2015-01-26T15:45:51.317

4 Data available from industry operations 2015-01-30T23:39:04.687

4 What types of features are used in a large-scale click-through rate prediction problem? 2015-04-30T12:26:47.637

4 Benchmark for MovieLens 2015-08-19T18:15:00.463

4 One Hot encoding for large number of values 2015-10-03T18:37:16.597

4 Where can I get labels for small ImageNet? 2016-08-15T18:13:56.187

4 How to interpret the loading values of a PCA? 2016-09-30T14:58:16.003

4 How to transform an imbalanced attribute to make it more suitable for linear regression? 2016-10-14T23:38:07.803

4 Plotting different values in pandas histogram with different colors 2016-11-10T12:09:42.713

4 Number of features vs. number of samples : if small sample size is sufficient, why take large number of samples? 2017-05-05T18:44:39.317

4 How to train an image dataset in TensorFlow? 2017-08-09T11:01:49.043

4 One hot encoding alternatives for large categorical values? 2017-11-14T17:20:58.253

4 What does BNG stand for? 2018-01-17T15:53:38.397

4 Should I prevent augmented data from leaking into the test/cross-validation sets? 2018-01-19T04:05:24.083

3 R aggregate() with dates 2014-08-01T18:13:06.063

3 Query similarity: how much data is used in practice? 2014-08-19T18:59:03.013

3 Data sets for evaluating text retrieval quality 2014-09-05T14:47:52.127

3 Trouble representing a problem 2014-09-23T20:58:01.027

3 Single Layer Perceptron with three classes 2014-10-11T23:26:53.197

3 MovieLens data set 2014-10-22T14:53:42.127

3 Pre-processing (center, scale, impute) among training sets (different forms) and the test set - what is a good approach? 2015-01-29T13:54:24.940

3 Looking for smallest set of rows that form a natural key in a data set 2015-05-18T05:59:41.207

3 What are the basic approaches for balancing a dataset for machine learning? 2015-06-24T11:23:03.077

3 How many observations in a neural networks dataset? 2015-06-24T15:46:52.680

3 How can I weigh observations differently that were provided for a time horizon? 2015-11-30T14:32:56.113

3 Handling a feature with multiple categorical values for the same instance value 2015-12-15T05:54:18.307

3 Tools to perform SQL analytics on 350TB of csv data 2016-01-07T02:33:51.253

3 Determining completeness of dataset 2016-02-25T16:13:17.880

3 How to classify support call texts? 2016-04-20T21:05:09.387

3 When should we consider a dataset as imbalanced? 2016-05-16T11:36:14.850

3 Technical name for this data wrangling process? Multiple columns into multi-factor single column 2016-06-07T01:42:41.457

3 How to handle Memory issues in training Word Embeddings on Large Datasets? 2016-06-07T18:37:51.627

3 Problem loading data into R 2016-06-23T20:38:12.440

3 Are SAS Data Storage Options designed for Big Data? 2016-07-05T21:25:30.557

3 Training and cross validation error curves 2016-09-26T06:12:26.650

3 How is the evaluation setup for YouTube faces of FaceNet? 2016-11-18T13:30:56.937

3 The best way to calculate variations between 2 datasets? 2016-12-07T16:50:08.260

3 Generating Image data sets for training CNNs 2017-01-16T15:48:04.103

3 Training data set for food image recognition 2017-01-30T16:42:51.820

3 What are some of the best practices for sharing data and models with colleagues? 2017-03-17T18:45:16.867

3 I want to add demographic data to a data set. Any suggestions on where to find zip code level data? 2017-03-28T21:04:07.547

3 Predicting hardware failures with limited data 2017-04-07T10:49:26.163

3 I have n dimensional data and I want to check integrity, can I downgrade to 2 dimensional feature space via PCA and do so? 2017-04-11T21:14:18.453

3 How to use a dataset where attribute names are changed? 2017-06-11T02:39:27.017

3 Detecting spammers with artificially generated target class 2017-08-20T19:35:34.767

3 Should we use discrete or continuous input for decision trees 2017-09-01T15:02:30.823

3 Public dataset for news articles with their associated categories 2017-09-26T08:56:30.433