What is the role of the 'fuzzifier' w in Fuzzy Clustering?


According to my lecture, Fuzzy c-Means tries to minimize the following objective function:

$$J(X,B,U)=\sum_{i=1}^c\sum_{j=1}^n u_{ij}^w \, d^2(\vec{\beta_i},\vec{x_j})$$

where $X$ are the data points, $B$ are the cluster-'prototypes', and $U$ is the matrix containing the fuzzy membership degrees. $d$ is a distance measure.

A constraint is that the membership degrees for a single data point w.r.t. all clusters sum to $1$: $\sum_{i=1}^c u_{ij}=1$ for every $j$.

Now in the first equation, what is the role of the $w$? I read that one could use any convex function instead of $(\cdot)^w$, but why use anything at all? Why don't we just use the membership degrees directly? My lecture says using the fuzzifier is necessary but doesn't explain why.


Posted 2019-07-23T15:18:19.450

Reputation: 53

Using the fuzzy c-means algorithm for image recognition can't be recommended anymore. Compared to processing the training data with a neural network, the accuracy is lower, and in most cases the processing speed is too. The weight coefficients for the features are difficult to determine, and this results in failed projects. – Manuel Rodriguez – 2019-07-23T17:32:25.180

@ManuelRodriguez Thank you for the info. I didn't intend to apply the algorithm though, I just want to understand it. – user9007131 – 2019-07-23T17:34:45.060



It's not required, strictly speaking: you can set $m=1$ (this answer writes $m$ for what your lecture calls $w$), and the objective is defined for any $m \geq 1$. But note what happens at $m=1$: the objective is linear in each $u_{ij}$, so under the sum-to-one constraint its minimum is attained at a vertex, i.e. at hard $0/1$ memberships, and fuzzy c-means degenerates into crisp c-means.

Now the better question is: why have it at all? The answer is that it adds a smoothing effect. Let's look at the behavior in each of the limits, $m \rightarrow 1^+$ and $m \rightarrow \infty$.

Towards $\infty$, it makes every $u_{ij}$ equal to $\frac{1}{c}$, so each point has equal membership in every cluster regardless of where the prototypes are. From the optimization perspective, we are asking for clusters that are closest to all points simultaneously; by definition that is already achieved, and since $u_{ij}^m \rightarrow 0$ the loss goes to $0$ at its global minimum.
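You can check this limit numerically. Below is a small sketch, assuming the standard FCM membership update $u_{ij} = \left[\sum_k \left(d_{ij}/d_{kj}\right)^{2/(m-1)}\right]^{-1}$ with Euclidean distance; the prototypes and the point are made-up values, not from the question:

```python
# Sketch: memberships of one point as the fuzzifier m grows.
# Assumes the standard FCM membership update
#   u_ij = 1 / sum_k (d_ij / d_kj)^(2/(m-1))
# with Euclidean distance; prototypes/point below are made up.
import numpy as np

def memberships(point, prototypes, m):
    """Fuzzy membership of `point` w.r.t. each prototype."""
    d = np.linalg.norm(prototypes - point, axis=1)
    # ratios[i, k] = (d_i / d_k)^(2/(m-1)); sum over k gives 1/u_i
    ratios = (d[:, None] / d[None, :]) ** (2.0 / (m - 1.0))
    return 1.0 / ratios.sum(axis=1)

protos = np.array([[0.0, 0.0], [4.0, 0.0], [0.0, 4.0]])
x = np.array([1.0, 0.5])

for m in (1.1, 2.0, 10.0, 100.0):
    print(m, np.round(memberships(x, protos, m), 3))
```

For $m=100$ the printed memberships are all close to $\frac{1}{3}$, while for $m=1.1$ almost all the mass sits on the nearest prototype.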

Now in the other limit: the optimal memberships have the closed form $u_{ij} = \left[\sum_{k=1}^c \left(\frac{d(\vec{\beta_i},\vec{x_j})}{d(\vec{\beta_k},\vec{x_j})}\right)^{2/(m-1)}\right]^{-1}$, so as $m \rightarrow 1^+$ the exponent blows up and each point's membership concentrates entirely on its nearest prototype, i.e. a crisp assignment. (For $m=2$, the memberships are simply inversely proportional to the squared, normalized distances.) This makes intuitive sense: membership is high if a point is close to a prototype and low if it is not (relatively).

So why do we have the $m$? It's for control. It allows us to choose and experiment with how heavily each distance should weigh in the membership. A larger $m$ may be useful when the data isn't clean and you don't want the memberships to depend too strongly on the Euclidean distances, so you deliberately add a smoothing effect.
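To see that control end-to-end, here is a minimal toy FCM implementation (my own sketch of the standard alternating prototype/membership updates; the data and parameters are made up). Running it on the same data with different $m$ shows the same clusters but very different softness for a point sitting between them:

```python
# Minimal fuzzy c-means sketch (toy implementation of the standard
# alternating updates; data and parameters are made up).
import numpy as np

def fcm(X, c, m, iters=50, seed=0):
    rng = np.random.default_rng(seed)
    U = rng.random((c, len(X)))
    U /= U.sum(axis=0)                                  # columns sum to 1
    for _ in range(iters):
        Um = U ** m
        B = (Um @ X) / Um.sum(axis=1, keepdims=True)    # prototype update
        d = np.linalg.norm(X[None, :, :] - B[:, None, :], axis=2)
        d = np.maximum(d, 1e-12)                        # avoid division by zero
        # membership update: u_ij = 1 / sum_k (d_ij / d_kj)^(2/(m-1))
        U = 1.0 / ((d[:, None, :] / d[None, :, :]) ** (2.0 / (m - 1.0))).sum(axis=1)
    return B, U

# Two well-separated blobs plus one point that is nearer to the first blob.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 0.5, (20, 2)),
               rng.normal(10.0, 0.5, (20, 2)),
               [[3.0, 3.0]]])

for m in (1.2, 2.0, 5.0):
    B, U = fcm(X, c=2, m=m)
    print(f"m={m}: membership of the in-between point = {np.round(U[:, -1], 3)}")
```

With $m=1.2$ the in-between point is assigned almost entirely to the nearer cluster; with $m=5$ its membership is much closer to a 50/50 split, even though the prototypes barely move.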


Posted 2019-07-23T15:18:19.450

Reputation: 1 845

Assume I have two clusters, then for a point $x_1$ the $u_{ij}$ must sum to $1$. So I could have for example $u_{11}=0.8$ and $u_{12}=0.2$. If I now let $m\rightarrow\infty$, then $0.8^\infty=0$ and $0.2^\infty=0$, not $\frac{1}{2}$. How did you come up with the $\frac{1}{C}$? – user9007131 – 2019-07-23T19:46:45.723

The u’s are a function of m as well – mshlis – 2019-07-23T20:11:45.493

I don't understand. It's still $0.8^m \rightarrow 0$ and not $0.8^m\rightarrow\frac{1}{2}$ for $m\rightarrow\infty$ – user9007131 – 2019-07-24T17:29:09.153

the value $u_{ij}^m$ still tends to 0, but the membership itself is $\frac{1}{2}$... If $m \rightarrow \infty$ the problem is meaningless because the membership is no longer a function of any of the centroids but is guaranteed to be equally associated with each – mshlis – 2019-07-24T17:30:54.417

i adjusted my answer slightly to reflect that – mshlis – 2019-07-24T17:32:49.050