301 How to understand the drawbacks of K-means 2015-01-16T04:38:13.310

187 What is the difference between data mining, statistics, machine learning and AI? 2010-11-30T11:26:15.473

108 Obtaining knowledge from a random forest 2012-01-16T11:09:29.237

96 Cohen's kappa in plain English 2014-01-13T19:14:38.847

71 Having a job in data-mining without a PhD 2012-05-01T23:39:27.387

66 Skills hard to find in machine learners? 2014-06-24T07:11:36.400

64 Gradient Boosting Tree vs Random Forest 2015-09-20T20:44:06.297

57 Euclidean distance is usually not good for sparse data? 2012-06-01T13:55:13.253

52 Software needed to scrape data from graph 2011-08-18T04:14:22.583

51 Is sampling relevant in the time of 'big data'? 2012-09-09T19:58:14.343

49 Why only three partitions? (training, validation, test) 2011-04-08T14:45:04.197

49 Do we have a problem of "pity upvotes"? 2011-06-01T01:57:42.547

38 How to draw valid conclusions from "big data"? 2012-02-09T08:30:49.303

36 Clustering with K-Means and EM: how are they related? 2013-11-18T11:47:06.623

34 What are the differences between hidden Markov models and neural networks? 2011-12-31T21:03:35.660

34 Think like a Bayesian, check like a frequentist: What does that mean? 2016-08-16T13:33:23.400

32 Data mining: How should I go about finding the functional form? 2011-05-05T16:26:00.037

31 Are there statistical lessons from the "Bible Code" episode? 2011-01-17T09:18:09.130

30 How to interpret the output of the summary method for an lm object in R? 2013-05-17T00:02:04.417

28 What math subjects would you suggest to prepare for data mining and machine learning? 2013-08-30T17:30:54.223

27 Statistics and data mining software tools for dealing with large datasets 2010-10-14T10:28:15.937

26 Lift measure in data mining 2011-10-17T14:53:50.043

23 What is the daily job routine of the machine learning scientist? 2014-07-24T14:14:08.353

22 Why are p-values misleading after performing a stepwise selection? 2015-11-03T09:04:56.567

21 First step for big data ($N = 10^{10}$, $p = 2000$) 2012-04-16T19:43:03.990

21 New revolutionary way of data mining? 2012-07-02T13:57:36.987

21 Performance metrics to evaluate unsupervised learning 2013-12-09T03:00:42.083

20 "Interestingness" function for StackExchange questions 2011-05-03T21:53:26.910

20 Difference between standard and spherical k-means algorithms 2013-07-07T12:57:39.273

20 What is the difference between a loss function and decision function? 2014-06-27T09:00:41.720

20 Relative variable importance for Boosting 2015-07-19T13:29:17.233

19 Cross Validation (error generalization) after model selection 2011-01-03T15:08:29.897

19 Programmer looking to break into machine learning field 2012-04-07T20:52:58.247

19 Perform K-means (or its close kin) clustering with only a distance matrix, not points-by-features data 2012-07-24T17:02:08.440

19 Where and why does deep learning shine? 2014-02-17T13:40:10.717

18 Data Mining and Statistical Analysis 2010-08-11T05:31:50.527

18 LSA vs. PCA (document clustering) 2013-07-26T21:56:54.883

17 How to predict when the next event occurs, based on times of previous events? 2011-09-30T20:26:34.607

16 When is interactive data visualization useful to use? 2011-02-10T14:49:20.220

16 What is the practical difference between association rules and decision trees in data mining? 2012-12-03T10:07:33.557

16 Under which conditions do gradient boosting machines outperform random forests? 2013-07-23T15:08:59.490

16 If k-means clustering is a form of Gaussian mixture modeling, can it be used when the data are not normal? 2013-09-07T02:24:55.857

15 A survey of data-mining software tools 2010-08-22T03:24:28.493

15 Are decision trees almost always binary trees? 2011-06-21T21:29:43.933

15 Training approaches for highly-imbalanced data set 2012-11-06T21:28:43.453

15 Biased Data in Machine Learning 2017-09-05T13:05:25.813

13 Flowcharts to help selecting the proper analysis technique and test 2010-08-25T20:09:26.997

13 How to begin reading about data mining? 2011-07-11T13:16:03.470

13 What is data blending? 2012-07-19T10:22:50.957

13 Distant supervision: supervised, semi-supervised, or both? 2012-12-29T15:14:47.307

13 How far will self study get me? 2013-01-12T16:33:31.220

13 The difference between logistic regression and support vector machines? 2015-03-08T22:55:21.433

13 Boosting: why is the learning rate called a regularization parameter? 2015-08-25T10:39:16.160

12 Best ways to aggregate and analyze data 2010-07-26T19:28:53.083

12 Exploratory analysis of spatio-temporal forecast errors 2011-02-03T08:58:53.177

12 Bagging with oversampling for rare event predictive models 2011-08-31T18:13:26.723

12 Why do we use k-means instead of other algorithms? 2013-05-13T12:49:21.223

12 Are there any non-distance based clustering algorithms? 2014-12-31T06:49:41.850

12 Negative binomial distribution vs binomial distribution 2015-10-08T10:53:57.727

11 How much information can you mine out of a name? 2011-01-02T17:04:44.293

11 Practical PCA tutorial with data 2012-03-05T11:42:51.607

11 Mathematics base for data mining and artificial intelligence algorithms 2012-08-17T07:27:49.003

11 Term frequency/inverse document frequency (TF/IDF): weighting 2013-12-02T16:49:52.647

11 Variational Inference in plain English 2017-02-12T15:00:38.760

10 Dubious use of signal processing principles to identify a trend 2010-07-22T11:31:16.777

10 Documented/reproducible examples of successful real-world applications of econometric methods? 2011-02-08T02:24:03.660

10 Machine learning self-learning book? 2011-12-20T14:37:41.220

10 First quick glance at a dataset 2012-01-27T13:51:19.603

10 How can I find correlations between crashes and system environments? 2011-10-20T16:36:50.047

10 How can I group strings by common themes? 2012-03-13T22:53:10.143

10 Regarding using bigram (N-gram) model to build feature vector for text document 2012-04-02T14:02:09.500

10 Good books covering data preprocessing and outlier detection techniques 2012-04-11T19:01:58.473

10 Where did the term "learn a model" come from? 2012-09-10T11:25:49.373

10 Data mining techniques in Obama's campaign 2013-01-11T16:16:24.143

10 Model performance in quantile modelling 2013-03-17T00:46:40.517

10 Is there overfitting in this modelling approach? 2013-04-22T15:07:38.837

10 What are good metrics to assess the quality of a PCA fit, in order to select the number of components? 2014-05-27T07:40:41.980

10 The idea of making the data have a zero-mean 2014-06-24T10:56:37.240

9 Video lectures about data mining? 2011-07-14T06:02:16.010

9 Apriori algorithm in plain English? 2011-10-04T07:01:46.337

9 How to quickly select important variables from a very large dataset? 2011-10-04T15:33:18.273

9 Finding suitable rules for new data using arules 2012-01-19T16:22:24.070

9 Clustering as a means of splitting up data for logistic regression 2012-06-30T17:16:04.693

9 Determining largest contributor in a group 2012-10-04T20:54:16.077

9 Does preclustering help to build a better predictive model? 2012-10-12T12:09:25.227

9 Mathematics behind classification and regression trees 2012-11-25T18:34:01.440

9 Are Random Forest and Boosting parametric or non-parametric? 2015-04-21T18:52:06.720

9 How to make sure that a machine learning algorithm's implementation is correct? 2015-05-13T11:19:23.957

8 Getting started with biclustering 2011-02-20T12:13:24.220

8 Best practices for measuring and avoiding overfitting? 2011-09-15T11:29:34.663

8 Algorithms and methods for attribute/feature selection? 2010-07-10T18:22:09.473

8 What can I do beyond Pearson correlation? 2011-10-27T11:22:38.510

8 Data mining papers/examples 2012-02-03T20:32:11.433

8 Use of the Gamma parameter with support vector machines 2012-09-20T15:31:29.630

8 Regarding precision and recall for the highly unbalanced validation data set 2012-10-30T19:44:53.947

8 How does PCA improve the accuracy of a predictive model? 2013-04-03T08:02:55.457

8 Detecting clusters in a binary sequence 2013-04-04T17:29:29.030

8 Maximal & closed frequent -- Answer Included 2013-11-23T18:34:18.850

8 Meaning of latent features? 2014-07-15T17:12:53.037