Do clustering algorithms need feature scaling in the pre-processing stage?



Is feature scaling useful for clustering algorithms? And what types of features (numeric, categorical, etc.) work best for clustering?


Posted 2017-09-03T14:55:47.560

Reputation: 221



Clustering algorithms are certainly affected by feature scaling.


Let's say that you have two features:

  1. weight (in lbs)
  2. height (in feet)

... and we are using these to predict whether a person needs an 'S' or 'L' shirt size.

We are using weight + height for that, and in our training set let's say we have two people already in clusters:

  1. Adam (175 lbs, 5.9 ft) in 'L'
  2. Lucy (115 lbs, 5.2 ft) in 'S'

Now we have a new person, Alan (140 lbs, 6.1 ft), and the clustering algorithm will put him in the nearest cluster. If we don't scale the features, height has almost no effect on the distance (its values are tiny compared to weight), so Alan will be assigned to the 'S' cluster even though his height suggests 'L'.

So we need to scale the features. Scikit-learn provides several scalers for this; one you can use is sklearn.preprocessing.MinMaxScaler.
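A quick sketch of the example above (same made-up numbers), checking which training point is nearest to Alan before and after min-max scaling:

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# training points: [weight (lbs), height (ft)]
X = np.array([[175.0, 5.9],   # index 0: Adam -> 'L'
              [115.0, 5.2]])  # index 1: Lucy -> 'S'
alan = np.array([140.0, 6.1])

def nearest(train, point):
    """Index of the training point closest to `point` (Euclidean distance)."""
    return int(np.argmin(np.linalg.norm(train - point, axis=1)))

# Unscaled: the weight axis dominates, so Alan lands next to Lucy ('S')
print(nearest(X, alan))  # 1

# Scaled: height now carries comparable weight, so Alan lands next to Adam ('L')
scaler = MinMaxScaler().fit(X)
alan_scaled = scaler.transform(alan.reshape(1, -1))[0]
print(nearest(scaler.transform(X), alan_scaled))  # 0
```

Note that the scaler is fit on the training points only, and the same transformation is applied to the new point.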


Posted 2017-09-03T14:55:47.560

Reputation: 66

The nearest cluster is defined by the distance. Whether scaling matters depends on the distance measure used. For example, correlation is not affected by scaling. – Pieter – 2017-09-12T10:48:47.710


Yes. Clustering algorithms such as k-means do need feature scaling before the data is fed to the algorithm. Since clustering techniques use Euclidean distance to form the cohorts, it is wise, for example, to scale variables measured as heights in meters and weights in kg before calculating the distance.
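A minimal sketch of this point, using hypothetical data: heights (meters) form two clear groups, while weights (kg) are uninformative noise with a much larger variance. Without scaling, k-means splits along the meaningless weight axis; standardizing first lets the informative height feature drive the clusters:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# two height groups around 1.5 m and 2.0 m
heights = np.concatenate([rng.normal(1.5, 0.03, 50), rng.normal(2.0, 0.03, 50)])
# weight carries no cluster structure, but its variance dominates raw distances
weights = rng.normal(70.0, 10.0, 100)
X = np.column_stack([heights, weights])

# unscaled: Euclidean distance is dominated by the weight axis
raw_labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

# standardized first: both features contribute comparably,
# so the clusters recover the two height groups
pipe = make_pipeline(StandardScaler(), KMeans(n_clusters=2, n_init=10, random_state=0))
labels = pipe.fit_predict(X)
```

With the scaled pipeline, the recovered labels match the two height groups (up to label permutation); the unscaled run essentially splits on weight.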

Sumit Asthana

Posted 2017-09-03T14:55:47.560

Reputation: 49

The choice of distance measure is often up to the user; other common options are block distance, Gaussian (kernel) distance (e.g. GMM), and correlation. – Pieter – 2017-09-12T10:51:01.737


In fact, most clustering algorithms are even highly sensitive to scaling. Rescaling the data can completely ruin the results.

Bad scaling also appears to be a key reason why people fail to find meaningful clusters. It is just very easy to do badly.

By no means rely on automatic scaling. It must fit your task and data. Preprocessing is an art, and will require most of the work.

Non-continuous variables are a big issue. While you can "hack" the data into binary encodings and then pretend they are suitable, the discreteness poses a major problem for the algorithms. For example, many points end up at exactly the same distance from each other. And the mean of such a variable doesn't make much semantic sense anymore. The squared deviation (as used by k-means) is even worse. Results may often be better if you simply ignore such variables when clustering.
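Both problems can be seen with a tiny sketch using a hypothetical one-hot encoded categorical (three colors): every pair of distinct categories is exactly the same distance apart, and the centroid is not a valid category at all:

```python
import numpy as np

# one-hot encoding of a 3-level categorical: red, green, blue
colors = np.array([[1, 0, 0],
                   [0, 1, 0],
                   [0, 0, 1]], dtype=float)

# every pair of distinct categories is exactly sqrt(2) apart:
# the encoding carries no notion of "closer" or "farther" categories
d01 = np.linalg.norm(colors[0] - colors[1])
d12 = np.linalg.norm(colors[1] - colors[2])
print(d01, d12)  # both sqrt(2) ~ 1.414

# the k-means "mean" of these points is [1/3, 1/3, 1/3]:
# not a valid one-hot vector, so the centroid has no categorical meaning
print(colors.mean(axis=0))
```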

Same goes for bad attributes, such as identifiers, sequence numbers, etc.

Has QUIT--Anony-Mousse

Posted 2017-09-03T14:55:47.560

Reputation: 7 331


Scaling affects clustering results in a way that depends on the metric used (Euclidean distance, squared Euclidean distance, Manhattan distance, ...).

In general, when you are mixing features that have different physical measurement units, you can apply a linear transformation (i.e. an offset plus a scale factor) to map them into a common space.
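Standardization is exactly such a transformation: per feature, an offset (subtract the mean) plus a scale factor (divide by the standard deviation). A minimal sketch, reusing the weight/height numbers from the earlier answer:

```python
import numpy as np

# [weight (lbs), height (ft)] -- two features with different units
X = np.array([[175.0, 5.9],
              [115.0, 5.2],
              [140.0, 6.1]])

# offset + scale factor per feature: z = (x - mu) / sigma
mu, sigma = X.mean(axis=0), X.std(axis=0)
Z = (X - mu) / sigma

# each transformed feature now has zero mean and unit variance,
# so both live in a common, unit-free space
print(Z.mean(axis=0))  # ~[0, 0]
print(Z.std(axis=0))   # [1, 1]
```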

You can also try learning this transformation, using a more complicated model than a linear one (if you think your problem requires it).

Nicola Bernini

Posted 2017-09-03T14:55:47.560

Reputation: 221


Feature scaling will certainly affect clustering results. Exactly what scaling to use is an open question, however, since clustering is really an exploratory procedure rather than something with a ground truth you can check against. Ultimately you want to use your knowledge of the data to decide how to scale features relative to one another. In practice, clustering is going to work best with numeric features, so if you have a mix of feature types you may want to look at embedding methods such as GLRM as a preprocessing step.

Leland McInnes

Posted 2017-09-03T14:55:47.560

Reputation: 311


There are different clustering algorithms. Without knowing every one well, I would assume it may vary.

One very popular clustering algorithm is k-means, and this one usually needs scaling, since:

"K-means clustering is "isotropic" in all directions of space and therefore tends to produce more or less round (rather than elongated) clusters. In this situation leaving variances unequal is equivalent to putting more weight on variables with smaller variance, so clusters will tend to be separated along variables with greater variance."

For more see:


Posted 2017-09-03T14:55:47.560

Reputation: 101