111 Are large data sets inappropriate for hypothesis testing? 2010-09-09T18:21:30.200

95 Locating freely available data samples 2010-07-19T19:15:59.303

85 Essential data checking tests 2011-06-07T08:19:22.500

53 Data APIs/feeds available as packages in R 2011-07-05T14:31:00.900

45 How to simulate data that satisfy specific constraints such as having specific mean and standard deviation? 2012-06-12T11:03:59.350

41 How do I get people to take better care of data? 2010-10-21T16:26:22.880

38 Tiny (real) datasets for giving examples in class? 2011-01-03T22:23:41.990

38 How to draw valid conclusions from "big data"? 2012-02-09T08:30:49.303

30 Free data set for very high dimensional classification 2010-07-29T12:02:28.347

28 Datasets constructed for a purpose similar to that of Anscombe's quartet 2013-12-20T03:18:16.390

24 Visualizing the intersections of many sets 2011-01-13T20:08:50.767

24 What aspects of the "Iris" data set make it so successful as an example/teaching/test data set 2013-11-06T19:03:35.617

23 What do statisticians do that can't be automated? 2012-02-10T07:12:27.330

22 As a reviewer, can I justify requesting data and code be made available even if the journal does not? 2011-08-17T16:52:31.727

18 Social network datasets 2010-11-11T17:50:04.680

18 Good data example needed with covariate affected by treatments 2014-07-28T15:59:30.477

18 Data augmentation techniques for general datasets? 2015-05-23T11:52:55.977

16 Quality assurance and quality control (QA/QC) guidelines for a database 2011-02-21T20:24:52.310

16 What are some good datasets to learn basic machine learning algorithms and why? 2015-08-31T07:48:39.387

15 What is the difference between pooled cross sectional data and panel data? 2012-12-05T23:44:52.723

14 Free public interest data hosting? 2011-04-27T16:39:04.570

14 Calculating the 95th percentile: Comparing normal distribution, R Quantile, and Excel approaches 2011-07-23T01:04:56.450

14 Where to find a large text corpus? 2011-11-24T21:22:19.287

13 Distant supervision: supervised, semi-supervised, or both? 2012-12-29T15:14:47.307

13 How to normalize data between -1 and 1? 2015-10-26T01:02:37.767

12 Best ways to aggregate and analyze data 2010-07-26T19:28:53.083

12 Separating two populations from the sample 2010-07-28T13:53:18.503

12 Fast ways in R to get the first row of a data frame grouped by an identifier 2011-03-04T17:17:28.997

12 Examples of costly consequences from improper use of statistical tools 2011-10-19T14:46:00.223

12 Where to find raw data about clinical trials? 2011-11-23T16:00:42.893

12 What are good datasets to illustrate particular aspects of statistical analysis? 2012-01-22T04:45:18.327

12 R vs Python for Data Analysis 2013-01-03T19:48:49.767

12 Best way to simply store data for statistical analysis in R 2013-08-07T12:38:32.707

12 Why do some people test regression-like model assumptions on their raw data while others test them on the residuals? 2013-11-25T09:22:17.463

11 How much information can you mine out of a name? 2011-01-02T17:04:44.293

11 When do we combine dimensionality reduction with clustering? 2011-07-10T01:30:54.663

11 Practical PCA tutorial with data 2012-03-05T11:42:51.607

10 Children's statistical education in different countries? 2011-05-16T19:35:27.120

10 Good books covering data preprocessing and outlier detection techniques 2012-04-11T19:01:58.473

10 What impact does increasing the training data have on the overall system accuracy? 2012-06-27T21:14:37.333

10 What is the most efficient way of training data using least memory? 2012-07-09T16:35:19.170

10 Testing Classification on Oversampled Imbalance Data 2013-05-28T02:17:16.947

10 Best Practices for Creating 'Tidy Data' 2014-01-28T15:14:40.190

10 The idea of making the data have a zero-mean 2014-06-24T10:56:37.240

10 Are data handling errors already 'priced in' to statistical analysis? 2014-12-29T18:48:10.327

10 How to do data augmentation and train-validate split? 2015-10-05T10:43:17.377

10 Is it better to do exploratory data analysis on the training dataset only? 2016-01-07T10:47:06.750

10 What is exactly meant by a "data set"? 2016-11-05T16:35:03.127

9 How to convert a frequency table into a vector of values? 2011-09-15T04:33:58.150

9 Looking for 2D artificial data to demonstrate properties of clustering algorithms 2012-02-16T21:14:21.930

9 How to quantify statistical insignificance? 2012-05-22T01:50:07.847

9 Learning from relational data 2012-10-31T14:06:14.177

9 What algorithm should I use to cluster a huge binary dataset into few categories? 2014-03-11T00:10:17.650

9 Using Regression to project outside of the data range ok? never ok? sometimes ok? 2015-10-15T15:39:00.497

8 Best practices for measuring and avoiding overfitting? 2011-09-15T11:29:34.663

8 Datasets for data visualization examples, teaching and research 2011-09-27T17:41:26.123

8 How to generate nice summary table? 2012-08-07T21:08:25.797

8 Should feature selection be performed only on training data (or all data)? 2013-07-19T12:50:42.327

8 Maximal & closed frequent -- Answer Included 2013-11-23T18:34:18.850

8 Likelihood function of truncated data 2013-11-27T17:55:54.407

8 Good PCA examples for teaching 2013-12-08T19:46:47.523

8 Why is variability measured relative to a point? 2014-04-29T18:27:19.140

8 Should types of data (nominal/ordinal/interval/ratio) really be considered types of variables? 2014-07-09T22:03:16.153

8 Analysis of hamster wheel rotational data 2014-09-04T09:01:08.877

8 Problems with Outlier Detection 2015-02-05T20:58:52.367

8 Is splitting the data into test and training sets purely a "stats" thing? 2017-07-01T23:04:31.037

7 Computer game datasets 2011-03-13T09:58:02.480

7 Weird residuals in linear regression 2011-05-04T11:03:30.273

7 What does this blur around the line mean in this graph? 2011-07-26T05:10:40.910

7 Mining search logs to improve autocomplete suggestions? 2012-03-12T20:05:11.797

7 Data entry tool for sparse table 2012-05-07T14:02:16.613

7 Watermarking data for datamining 2012-08-10T08:46:05.323

7 What do NORB and CIFAR stand for? 2014-07-23T10:32:23.013

7 How do I cite the iris dataset in a paper? 2014-10-12T22:08:27.937

7 Is nominal, ordinal, & binary for quantitative data, qualitative data, or both? 2015-07-04T12:39:56.760

7 What is the mathematically rigorous definition of chunky data? 2016-02-11T22:57:53.390

7 Generating a high-dimensional dataset where nearest neighbor becomes meaningless 2016-12-27T08:55:20.827

6 Need good material on multifractal analysis 2011-01-07T10:27:57.527

6 Reading in SVM files in R (libsvm) 2011-01-31T17:46:43.547

6 Which one should be applied first: data sampling or dimensionality reduction? 2011-02-15T18:20:56.520

6 Example of discontinuous effect of x on y dataset (for paper) 2011-03-02T19:32:52.250

6 Data collection and storage for time series analysis 2011-03-29T10:26:51.437

6 Where do I find large face datasets? 2012-02-17T17:12:45.553

6 Benchmark dataset for decision tree algorithm 2012-03-07T16:18:50.437

6 Algorithms for clustering documents by similar words and phrases 2012-03-16T22:43:20.997

6 Where can I find datasets useful for testing my own Machine Learning implementations? 2012-08-01T15:20:23.660

6 Construct artificial slightly overlapping data for PCA plot 2012-08-24T18:24:11.783

6 Benchmark data for Random Forest evaluation 2013-05-23T21:57:50.703

6 Data Sets suitable for k-means 2013-12-15T16:41:00.997

6 Why do I not get a p-value from this ANOVA in R? 2014-04-17T01:42:10.823

6 How to organise the variable names in R without messing up? 2014-06-12T05:03:31.660

6 Missing data which simply cannot exist 2014-09-08T07:54:42.750

6 What are some interesting examples of wrong or crazy inferences being drawn from Big Data? 2014-09-10T06:02:28.660

6 Optimal Binning with respect to a given response variable 2015-04-29T00:03:13.770

6 What is bucketization? 2015-05-20T19:59:47.477

6 Email and IP String preprocessing for classification task 2015-08-06T12:35:03.693

6 GAM model summary: What is meant by "significance of smooth terms"? 2015-09-25T16:31:41.133

6 How to determine how many variables and what kind of variables a table of data has? 2015-12-04T23:26:51.883

6 Training data is imbalanced - but should my validation set also be? 2017-01-30T02:20:52.917