What is a good explanation of Non Negative Matrix Factorization?



I am trying to find a resource to understand non-negative matrix factorization. Apart from Wikipedia, I couldn't find anything useful.


Posted 2016-02-18T04:25:38.627

Reputation: 31

Do you understand matrix completion through factorization without the non-negative part? – Emre – 2016-02-18T05:02:03.187

There are quite a few resources, some more complicated than others. Please tell us what you have found and what is missing for your understanding. In other words, can you factorize "useful" in a non-negative way? – Laurent Duval – 2016-02-18T06:39:52.403

this is a good read, https://arxiv.org/pdf/1401.5226.pdf

– dermen – 2018-11-04T18:15:45.383



Non-Negative Matrix Factorization (NMF) is described well in the paper by Lee and Seung, 1999.

Simply Put

NMF takes as an input a term-document matrix and generates a set of topics that represent weighted sets of co-occurring terms. The discovered topics form a basis that provides an efficient representation of the original documents.

About NMF

NMF is used for feature extraction and is generally useful when there are many attributes, particularly when the attributes are ambiguous or are weak predictors on their own. By combining attributes, NMF can surface meaningful patterns, topics, or themes.

In practice, one typically encounters NMF where text is involved. Consider an example where the same word (love) could have different meanings depending on the document:

  1. I love lettuce wraps.
  2. I love the way I feel when I'm on vacation in Mexico.
  3. I love my dog, Euclid.
  4. I love being a Data Scientist.

In all 4 cases, the word 'love' is used, but it has a different meaning to the reader. By combining attributes, NMF introduces context which creates additional predictive power.

$\text{"love"} + \text{"lettuce wraps"} \Rightarrow \text{"pleasure by food"}$

$\text{"love"} + \text{"vacation in Mexico"} \Rightarrow \text{"pleasure by relaxation"}$

$\text{"love"} + \text{"dog"} \Rightarrow \text{"pleasure by companionship"}$

$\text{"love"} + \text{"Data Scientist"} \Rightarrow \text{"pleasure by occupation"}$
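To make the input concrete, here is a minimal sketch that builds a term-document count matrix from the four sentences above. The tokenization (lowercase, keep runs of letters and apostrophes) and the names `tokenize`, `vocab`, and `V` are my own assumptions for illustration; a real pipeline would more likely use a library vectorizer and some stop-word handling.

```python
import re

# The four example sentences from the answer.
docs = [
    "I love lettuce wraps.",
    "I love the way I feel when I'm on vacation in Mexico.",
    "I love my dog, Euclid.",
    "I love being a Data Scientist.",
]

def tokenize(text):
    # Assumption: lowercase the text and keep runs of letters/apostrophes as terms.
    return re.findall(r"[a-z']+", text.lower())

tokens = [tokenize(d) for d in docs]
vocab = sorted({t for doc in tokens for t in doc})

# V[i][j] = number of times term i occurs in document j (every entry is >= 0).
V = [[doc.count(term) for doc in tokens] for term in vocab]

love_row = V[vocab.index("love")]
print(love_row)  # [1, 1, 1, 1]
```

The row for "love" is identical across all four documents, which is exactly why the word alone carries no distinguishing signal; it is the co-occurring terms in the other rows that let a factorization separate the documents.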

How Does It Happen

NMF breaks down the multivariate data by creating a user-defined number of features. Each of these features is a linear combination of the original attributes. It is also key to remember that the coefficients of these linear combinations are non-negative.

Another way to think about it is that NMF approximates your original data matrix (let's call it V) by the product of two lower-rank, non-negative matrices (let's call them W and H). NMF uses an iterative approach to modify the initial values of W and H so that the product WH approaches V. When the approximation error converges or the user-defined number of iterations is reached, NMF terminates.
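As a rough sketch of that iteration: the answer does not say which update scheme is meant, so I am assuming the multiplicative update rule usually attributed to Lee and Seung for the squared (Frobenius) error, i.e. H ← H · (WᵀV) / (WᵀWH) and W ← W · (VHᵀ) / (WHHᵀ), applied element-wise. The function names, the random initialization, and the small `eps` guard against division by zero are all my own choices, and the code uses plain lists only to stay dependency-free.

```python
import random

def matmul(A, B):
    # Plain-Python matrix product of two list-of-lists matrices.
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def frobenius_error(V, W, H):
    # Square root of the summed squared differences between V and W*H.
    WH = matmul(W, H)
    return sum((v - wh) ** 2 for rv, rwh in zip(V, WH) for v, wh in zip(rv, rwh)) ** 0.5

def nmf(V, k, max_iter=500, tol=1e-6, seed=0, eps=1e-9):
    rng = random.Random(seed)
    n, m = len(V), len(V[0])
    # Random strictly positive starting values (an assumption; other inits exist).
    W = [[rng.random() + eps for _ in range(k)] for _ in range(n)]
    H = [[rng.random() + eps for _ in range(m)] for _ in range(k)]
    prev = err = frobenius_error(V, W, H)
    for _ in range(max_iter):
        # H <- H * (W^T V) / (W^T W H), element-wise.
        Wt = transpose(W)
        num, den = matmul(Wt, V), matmul(matmul(Wt, W), H)
        H = [[H[a][j] * num[a][j] / (den[a][j] + eps) for j in range(m)] for a in range(k)]
        # W <- W * (V H^T) / (W H H^T), element-wise.
        Ht = transpose(H)
        num, den = matmul(V, Ht), matmul(W, matmul(H, Ht))
        W = [[W[i][a] * num[i][a] / (den[i][a] + eps) for a in range(k)] for i in range(n)]
        err = frobenius_error(V, W, H)
        if abs(prev - err) < tol:  # stop once the error has stopped improving
            break
        prev = err
    return W, H, err

# Toy non-negative 4x3 matrix of rank 2 (made up for illustration).
V = [[1, 0, 1], [0, 1, 1], [2, 0, 2], [0, 3, 3]]
W, H, err = nmf(V, k=2)
print(err)
```

Because every update only multiplies by non-negative ratios, W and H stay non-negative as long as they start that way, which is the main appeal of this particular rule. The result depends on the random start, and the factorization is generally not unique.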

NMF data preparation

  • Numeric attributes are normalized.
  • Missing numerical values are replaced with the mean.
  • Missing categorical values are replaced with the mode.
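The two imputation steps in the list above can be sketched like this. I am assuming missing entries are represented as `None`, and the helper names `impute_mean` and `impute_mode` are hypothetical; this is not tied to any particular tool's preprocessing.

```python
from collections import Counter

def impute_mean(values):
    # Replace missing numeric entries (None) with the mean of the observed ones.
    present = [v for v in values if v is not None]
    mean = sum(present) / len(present)
    return [mean if v is None else v for v in values]

def impute_mode(values):
    # Replace missing categorical entries (None) with the most frequent observed value.
    present = [v for v in values if v is not None]
    mode = Counter(present).most_common(1)[0][0]
    return [mode if v is None else v for v in values]

filled_numeric = impute_mean([1.0, None, 3.0])
filled_categorical = impute_mode(["a", None, "b", "a"])
print(filled_numeric)      # [1.0, 2.0, 3.0]
print(filled_categorical)  # ['a', 'a', 'b', 'a']
```

Categorical columns would still need a non-negative numeric encoding (for example one-hot indicators) before they can go into V.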

It is important to note that outliers can impact NMF significantly. In practice, most data scientists apply a clipping transformation before binning or normalizing. In addition, NMF will in many cases benefit from normalization.
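A minimal sketch of clipping followed by normalization, assuming min-max scaling to [0, 1] as the normalization (the answer does not say which kind is meant). The clipping bounds of 0 and 10 are made up for the example; in practice they would typically come from percentiles of the data.

```python
def clip(values, lower, upper):
    # Pull values outside [lower, upper] back to the nearest bound.
    return [min(max(v, lower), upper) for v in values]

def min_max_normalize(values):
    # Rescale to [0, 1]; a constant column maps to all zeros.
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

raw = [2.0, 4.0, 6.0, 1000.0]      # 1000.0 plays the role of an outlier
clipped = clip(raw, 0.0, 10.0)
normalized = min_max_normalize(clipped)
print(clipped)     # [2.0, 4.0, 6.0, 10.0]
print(normalized)  # [0.0, 0.25, 0.5, 1.0]
```

Without the clipping step, the outlier would squash the other three values into a tiny range near zero. Min-max scaling also has the side effect of keeping everything non-negative, which NMF requires.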

As with many other iterative algorithms, to get a closer factorization one can decrease the convergence tolerance (or allow more iterations), at the cost of more compute time.

Society of Data Scientists

Posted 2016-02-18T04:25:38.627

Reputation: 560

It is probably good to be clear that NMF can also be applied to other problems besides text analysis. (For instance, in the paper by Lee and Seung they also use NMF to learn parts of facial images.) – Tim Goodman – 2016-12-01T14:53:49.573