Nonnegative matrix factorization (NMF) is a popular tool in many applications, such as image and text recognition. If you’ve ever wanted to learn a bit about NMF, you can do so right here, in this blog post, which summarizes my (slightly) longer presentation here. The R package **NMF** will be used for illustration.

Given a u x v matrix A with nonnegative elements, we wish to find nonnegative, rank-k matrices W (u x k) and H (k x v) such that

A ≈ WH

In other words, NMF is a form of dimension reduction.

Note that this means that column j of A is approximately a linear combination of the columns of W, with the coefficients being column j of H. W thus forms what I call a *pseudobasis* for the columns of A.
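A tiny R check of this point, using my own toy matrices rather than data from the post: if A = WH exactly, then column j of A is just W times column j of H.

```r
# Toy example: A = WH exactly, so each column of A is a
# nonnegative linear combination of the columns of W.
set.seed(1)
W <- matrix(runif(4 * 2), nrow = 4)   # u = 4, k = 2
H <- matrix(runif(2 * 3), nrow = 2)   # k = 2, v = 3
A <- W %*% H                          # u x v, here 4 x 3
# column 2 of A is W %*% H[, 2]
all.equal(A[, 2], as.vector(W %*% H[, 2]))   # TRUE
```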

The larger the rank k, the better our approximation. But we typically hope that a good approximation can be achieved with k << v.

The matrices W and H are calculated iteratively, one major approach being alternating least squares, i.e., repeated linear regression. Here is how it works:

We make initial guesses for W and H, say with random numbers. Now consider an odd-numbered iteration. Suppose just for a moment that we know the exact value of W, with H unknown. Then for each j we could “predict” column j of A from the columns of W. The coefficient vector returned by **lm()** will become column j of H. We do this for j = 1,2,…,v.

In even-numbered iterations, suppose we know H but not W. We could take transposes,

A’ ≈ H’ W’

and predict row i of A from the rows of H. Then we alternate until we reach convergence.
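The alternating scheme above can be sketched in a few lines of R. This is my own illustrative code, not the algorithm the **NMF** package actually uses; note that **lm()** can return negative coefficients, so the sketch crudely truncates them to 0 (real NMF algorithms enforce nonnegativity more carefully).

```r
# Illustrative alternating-regression NMF sketch (a hypothetical helper,
# not from the NMF package). Negative coefficients are truncated to 0.
altnmf <- function(a, k, niters = 50) {
  u <- nrow(a); v <- ncol(a)
  w <- matrix(runif(u * k), nrow = u)      # random initial guess for W
  h <- matrix(runif(k * v), nrow = k)      # random initial guess for H
  for (iter in 1:niters) {
    # "odd" step: W fixed; regress each column of A on the columns of W
    h <- apply(a, 2, function(aj) coef(lm(aj ~ w - 1)))
    h[is.na(h) | h < 0] <- 0               # keep H nonnegative
    # "even" step: H fixed; regress each row of A on the rows of H
    w <- t(apply(a, 1, function(ai) coef(lm(ai ~ t(h) - 1))))
    w[is.na(w) | w < 0] <- 0               # keep W nonnegative
  }
  list(w = w, h = h)
}
```

With the random initialization, this converges only to a local optimum, which is one reason packages typically offer multiple restarts.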

CRAN’s **NMF** package for NMF computation is quite versatile, with many, many options. In its simplest form, though, it is quite easy to use. For a matrix **a** and desired rank **k**, we simply run

```
> nout <- nmf(a,k)
```

The factors are then in nout@fit@W and nout@fit@H.

Though NMF is often used for image classification, with input data consisting of many images, here we will have only one image, and we’ll use NMF to compress it, not do classification. We first obtain A, then call **nmf()**, then plot the product of the factors:

```
> library(pixmap)
> mtr <- read.pnm('MtRush.pgm')
> a <- mtr@grey
> aout <- nmf(a,50)
> w <- aout@fit@W
> h <- aout@fit@H
> approxa <- w %*% h
# brightness values must be in [0,1]
> approxa <- pmin(approxa,1)
> mtrnew <- mtr
> mtrnew@grey <- approxa
> plot(mtrnew)
```

The result is:

This is understandably blurry. The original matrix has dimension 194 x 259, and thus presumably has full rank, 194. (This is confirmed for instance by running the function **rankMatrix()** in the **Matrix** package.) We’ve approximated the matrix by one of rank only 50, storing 50 x (194 + 259) = 22,650 numbers for W and H rather than 194 x 259 = 50,246 for A, roughly a 55% storage savings. This is not important for one small picture, but possibly worthwhile if we have many large ones. The approximation is not bad in that light, and may be good enough for image recognition or other applications.

Indeed, in many if not most applications of NMF, we need to worry about overfitting, which in this context amounts to choosing too high a value for the rank k.

Could you post the picture in pnm so that one can run the code as it is? Thank you!

See heather.cs.ucdavis.edu/MtRush.pgm

Thanks great, now I can reproduce your post.

Hi, it does not seem that the rank of the resulting matrix is only 50 as you claim/as one would expect. rankMatrix(mtr@grey) returns 197 as expected, but rankMatrix(mtrnew@grey) returns 105. Am I missing something?

Good point. Here is the resolution: If you find the rank of approxa when it is first created, it is indeed 50. But then I applied the pmin() operation to truncate image brightness to [0,1], and that perturbation greatly increased the rank. If one were to add a small amount of random noise to all of approxa, the rank would go back up to 197. But one still would have achieved dimension reduction, and matrix compression, in a practical sense.
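This effect is easy to reproduce on synthetic data (my own example, not the Mount Rushmore matrix): truncation with **pmin()** perturbs an exactly rank-k product and drives the numerical rank far above k.

```r
library(Matrix)                    # for rankMatrix()
set.seed(3)
w <- matrix(runif(100 * 5), nrow = 100)
h <- matrix(runif(5 * 80), nrow = 5)
approxa <- w %*% h                 # rank 5 by construction
trunca  <- pmin(approxa, 1)        # truncation, as in the post
as.integer(rankMatrix(approxa))    # 5
as.integer(rankMatrix(trunca))     # far larger than 5
```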

A comment about your explanation of the method: unless I missed something, if you were to simply solve linear systems (using lm() iteratively), lm() would likely report positive AND negative coefficients, and therefore you would not get strictly positive solutions H and W. That would still be a factorization, but that would not be a “non-negative” factorization.

They truncate.