According to my lecture, Fuzzy c-Means tries to minimize the following objective function:

$$J(X,B,U)=\sum_{i=1}^c\sum_{j=1}^n u_{ij}^w \, d^2(\vec{\beta_i},\vec{x_j})$$

where $X$ are the data points, $B$ are the cluster-'prototypes', and $U$ is the matrix containing the fuzzy membership degrees. $d$ is a distance measure.

A constraint is that the membership degrees of a single data point w.r.t. all clusters sum to $1$: $\sum_{i=1}^c u_{ij}=1$ for every $j$.
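To make the notation concrete, here is a minimal sketch of how I understand the objective would be evaluated. It assumes squared Euclidean distance for $d^2$ (my lecture leaves $d$ open), and the function names and toy data are made up purely for illustration.

```python
def squared_euclidean(a, b):
    # Assumed distance measure: d^2(a, b) = sum_k (a_k - b_k)^2
    return sum((ak - bk) ** 2 for ak, bk in zip(a, b))


def fcm_objective(X, B, U, w=2.0):
    # J(X, B, U) = sum over clusters i and data points j of u_ij^w * d^2(beta_i, x_j)
    return sum(
        U[i][j] ** w * squared_euclidean(B[i], X[j])
        for i in range(len(B))
        for j in range(len(X))
    )


# Toy data (hypothetical): n = 3 points, c = 2 prototypes
X = [(0.0, 0.0), (1.0, 0.0), (4.0, 0.0)]
B = [(0.0, 0.0), (4.0, 0.0)]
# U[i][j] is the membership of point j in cluster i
U = [[0.9, 0.7, 0.1],
     [0.1, 0.3, 0.9]]

# Constraint: for each data point, memberships over all clusters sum to 1
column_sums = [sum(U[i][j] for i in range(len(B))) for j in range(len(X))]

J = fcm_objective(X, B, U, w=2.0)
print(J)  # about 1.62
```

With $w=1$ the same call simply weights each squared distance by the raw membership degree, which is the variant I am asking about.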

Now in the first equation, what is the role of the exponent $w$ (the fuzzifier)? I read that one could use any convex function instead of $(\cdot)^w$. But why use anything at all? Why don't we just use the membership degrees themselves? My lecture says using the fuzzifier is necessary but doesn't explain why.

Using the Fuzzy c-Means algorithm for image recognition can't be recommended anymore. Compared with processing the training data with a neural network, the accuracy is lower, and in most cases so is the processing speed. The weight coefficients for the features are difficult to determine, and this results in failed projects. – Manuel Rodriguez – 2019-07-23T17:32:25.180

@ManuelRodriguez Thank you for the info. I didn't intend to apply the algorithm though, I just want to understand it. – user9007131 – 2019-07-23T17:34:45.060