301 How to understand the drawbacks of K-means 2015-01-16T04:38:13.310

184 Why is Euclidean distance not a good metric in high dimensions? 2014-05-18T17:50:50.803

107 Detecting a given face in a database of facial images 2011-02-14T22:41:09.187

71 How to tell if data is "clustered" enough for clustering algorithms to produce meaningful results? 2011-06-08T00:04:43.590

62 Choosing a clustering method 2010-10-18T15:58:40.990

57 Euclidean distance is usually not good for sparse data? 2012-06-01T13:55:13.253

53 Where to cut a dendrogram? 2010-10-17T21:57:55.460

42 How to decide on the correct number of clusters? 2012-02-09T14:45:19.817

38 Is it possible to do time-series clustering based on curve shape? 2010-10-05T07:45:19.550

36 Clustering with a distance matrix 2010-09-16T11:47:15.633

36 How to do community detection in a weighted social network/graph? 2010-09-21T15:50:45.697

36 Clustering with K-Means and EM: how are they related? 2013-11-18T11:47:06.623

35 Why does k-means clustering algorithm use only Euclidean distance metric? 2014-01-07T11:53:16.713

33 Time series 'clustering' in R 2010-10-01T14:58:01.400

32 Are mean normalization and feature scaling needed for k-means clustering? 2012-01-17T09:55:10.680

32 What is the relation between k-means clustering and PCA? 2015-11-23T22:42:12.097

31 How can an artificial neural network (ANN) be used for unsupervised clustering? 2015-03-03T16:21:01.627

29 Clustering a dataset with both discrete and continuous variables 2012-05-10T10:44:37.190

27 How to do dimensionality reduction in R 2010-09-24T11:44:24.637

26 What stop-criteria for agglomerative hierarchical clustering are used in practice? 2010-09-12T19:49:25.443

26 How to interpret mean of Silhouette plot? 2011-05-09T06:05:22.237

26 (Why) Has Kohonen-style SOM fallen out of favor? 2015-10-19T02:36:17.097

25 Dynamic Time Warping Clustering 2015-01-05T15:34:09.797

24 Detecting patterns of cheating on a multi-question exam 2011-03-04T23:19:54.100

24 What is the difference between Multiclass and Multilabel Problem? 2011-06-13T05:35:36.353

24 Comparing hierarchical clustering dendrograms obtained by different distances & methods 2013-07-07T07:57:20.793

24 Latent Class Analysis vs. Cluster Analysis - differences in inferences? 2014-10-31T17:54:37.003

22 Clustering a long list of strings (words) into similarity groups 2014-11-07T10:32:56.753

21 Hierarchical clustering with mixed type data - what distance/similarity to use? 2011-09-07T16:18:28.213

21 Performance metrics to evaluate unsupervised learning 2013-12-09T03:00:42.083

21 Is it important to scale data before clustering? 2014-03-12T21:27:17.350

21 How would PCA help with a k-means clustering analysis? 2015-06-18T18:25:27.230

20 How is finding the centroid different from finding the mean? 2013-03-09T01:08:08.787

20 Difference between standard and spherical k-means algorithms 2013-07-07T12:57:39.273

19 Clustering variables based on correlations between them 2010-09-22T17:01:37.580

19 How to define number of clusters in K-means clustering? 2011-03-31T18:29:46.377

19 Clustering of mixed type data with R 2012-03-12T20:55:23.270

19 Comparing clusterings: Rand Index vs Variation of Information 2012-03-20T19:59:15.650

19 Perform K-means (or its close kin) clustering with only a distance matrix, not points-by-features data 2012-07-24T17:02:08.440

19 Supervised clustering or classification? 2012-09-19T14:40:21.963

18 Evaluation measure of clustering (without having truth labels) 2012-01-27T12:43:03.490

18 LSA vs. PCA (document clustering) 2013-07-26T21:56:54.883

18 Why does gap statistic for k-means suggest one cluster, even though there are obviously two of them? 2015-03-06T21:06:03.113

17 Nonparametric Bayesian analysis in R 2010-12-05T11:14:12.273

17 Clustering procedure where each cluster has an equal number of points? 2011-03-24T23:07:21.220

17 What is an acceptable value of the Calinski & Harabasz (CH) criterion? 2013-03-20T17:03:38.740

17 Clustering a binary matrix 2014-02-12T09:48:13.827

16 Assumptions of cluster analysis 2011-03-11T10:34:36.383

16 The input parameters for using latent Dirichlet allocation 2012-03-23T02:33:39.840

16 If k-means clustering is a form of Gaussian mixture modeling, can it be used when the data are not normal? 2013-09-07T02:24:55.857

16 Using correlation as distance metric (for hierarchical clustering) 2015-08-07T20:25:10.220

16 Choosing the right linkage method for hierarchical clustering 2016-02-13T22:09:09.333

16 K-means clustering on the output of t-SNE 2017-02-23T01:39:42.150

15 Is it ok to use Manhattan distance with Ward's inter-cluster linkage in hierarchical clustering? 2011-04-08T07:47:43.787

15 Reason to normalize in Euclidean distance measures in hierarchical clustering 2012-06-12T15:49:17.473

15 Is there an R function that will compute the cosine dissimilarity matrix? 2012-07-03T12:30:07.950

15 How to select a clustering method? How to validate a cluster solution (to warrant the method choice)? 2016-02-13T23:19:42.710

15 Does "curse of dimensionality" really exist in real data? 2016-06-17T13:24:30.390

14 Visualization software for clustering 2010-08-09T22:33:40.163

14 Clustering quality measure 2011-01-14T14:06:06.030

14 Clustering: Should I use the Jensen-Shannon Divergence or its square? 2011-02-25T18:01:07.703

14 How to plot data output of clustering? 2011-04-22T02:22:15.400

14 Determine different clusters of 1d data from database 2012-10-15T14:58:49.920

14 Rand index calculation 2014-03-06T14:04:30.123

14 Jenks Natural Breaks in Python: How to find the optimum number of breaks? 2015-03-29T09:59:41.323

14 Text Mining: how to cluster texts (e.g. news articles) with artificial intelligence? 2015-06-07T15:14:54.767

13 Understanding comparisons of clustering results 2011-02-14T00:21:17.413

13 $L_1$ or $L_{0.5}$ metrics for clustering? 2011-06-01T16:42:07.290

13 k-means implementation with custom distance matrix in input 2011-06-30T01:52:27.973

13 What are the "hot algorithms" for machine learning? 2011-10-18T21:24:39.543

13 Appropriate clustering techniques for temporal data? 2012-08-26T20:17:30.467

13 Is there a function in R that takes the centers of clusters that were found and assigns clusters to a new data set? 2013-12-02T17:10:19.490

13 Clustering algorithms that operate on sparse data matrices 2014-01-06T16:02:06.290

13 Why are mixed data a problem for Euclidean-based clustering algorithms? 2014-10-29T13:02:18.520

13 How to use both binary and continuous variables together in clustering? 2015-01-02T14:55:24.327

13 Dirichlet Processes for clustering: how to deal with labels? 2015-01-27T17:39:46.257

13 How to understand the drawbacks of Hierarchical Clustering? 2015-11-27T12:30:18.187

13 Should dimensionality reduction for visualization be considered a "closed" problem, solved by t-SNE? 2017-03-28T17:45:38.840

12 Clustering probability distributions - methods & metrics? 2011-07-18T07:14:42.463

12 Using statistical significance test to validate cluster analysis results 2011-11-21T14:02:36.537

12 Initializing K-means centers by the means of random subsamples of the dataset? 2012-06-19T06:14:06.640

12 How to measure shape of cluster? 2012-10-09T17:09:01.000

12 Cluster Big Data in R and Is Sampling Relevant? 2013-04-04T20:18:38.777

12 Why do we use k-means instead of other algorithms? 2013-05-13T12:49:21.223

12 How can I group numerical data into naturally forming "brackets"? (e.g. income) 2013-08-15T19:50:59.520

12 Is there a decision-tree-like algorithm for unsupervised clustering? 2014-06-11T12:21:49.053

12 k-means vs k-median? 2014-07-27T10:37:22.033

12 Are there any non-distance based clustering algorithms? 2014-12-31T06:49:41.850

12 Clustering a correlation matrix 2015-02-19T11:14:28.227

12 With categorical data, can there be clusters without the variables being related? 2016-06-13T02:05:25.757

11 Clustering (k-means, or otherwise) with a minimum cluster size constraint 2010-12-10T20:53:42.207

11 Clustering spatial data in R 2011-04-19T13:16:03.780

11 When do we combine dimensionality reduction with clustering? 2011-07-10T01:30:54.663

11 SOM clustering for nominal/circular variables 2011-07-19T07:34:22.517

11 Clustering 1D data 2011-08-03T02:36:19.330

11 Can I use PCA to do variable selection for cluster analysis? 2011-10-13T20:29:18.337

11 Why doesn't k-means give the global minimum? 2013-01-29T07:16:12.207

11 Can you compare different clustering methods on a dataset with no ground truth by cross-validation? 2014-02-19T00:59:02.597

11 State-of-the-art in deduplication 2014-03-22T19:52:17.633