Let me start with PCA. Suppose that you have $n$ data points, each consisting of $d$ numbers (or dimensions). If you center this data (subtract the mean data point $\mu$ from each data vector $x_i$) you can stack the data, one point per row, to make a matrix

$$
X = \left(
\begin{array}{ccccc}
&& x_1^T - \mu^T && \\
\hline
&& x_2^T - \mu^T && \\
\hline
&& \vdots && \\
\hline
&& x_n^T - \mu^T &&
\end{array}
\right)\,.
$$

The covariance matrix

$$
S = \frac{1}{n-1} \sum_{i=1}^n (x_i-\mu)(x_i-\mu)^T = \frac{1}{n-1} X^T X
$$

measures the degree to which the different coordinates of your data vary together. So it's maybe not surprising that PCA, which is designed to capture the variation of your data, can be given in terms of the covariance matrix. In particular, the eigenvalue decomposition of $S$ turns out to be

$$
S = V \Lambda V^T = \sum_{i = 1}^r \lambda_i v_i v_i^T \,,
$$

where $v_i$ is the $i$-th *Principal Component*, or PC, and $\lambda_i$ is the $i$-th eigenvalue of $S$, which is also equal to the variance of the data along the $i$-th PC. This decomposition comes from a general theorem in linear algebra, and some work *does* have to be done to motivate the relation to PCA.
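To make the claim about eigenvalues and variances concrete, here is a minimal NumPy sketch. The data set is random and purely illustrative (the seed, sizes and mixing matrix are assumptions, not part of the answer); it builds $X$ and $S$ as defined above and checks that the variance of the data along each PC matches the corresponding eigenvalue.

```python
import numpy as np

rng = np.random.default_rng(0)           # fixed seed; illustrative data only
n, d = 200, 3
mixing = np.array([[2.0, 0.5, 0.0],
                   [0.0, 1.0, 0.3],
                   [0.0, 0.0, 0.2]])     # arbitrary matrix to correlate the coordinates
data = rng.normal(size=(n, d)) @ mixing

mu = data.mean(axis=0)
X = data - mu                            # centered data matrix, one point per row
S = X.T @ X / (n - 1)                    # covariance matrix

# eigh is meant for symmetric matrices and returns eigenvalues in ascending order
eigvals, eigvecs = np.linalg.eigh(S)
order = np.argsort(eigvals)[::-1]        # reorder so the largest eigenvalue comes first
lam = eigvals[order]
V = eigvecs[:, order]                    # columns are the principal components v_i

# sample variance of the data projected onto each PC
proj_var = (X @ V).var(axis=0, ddof=1)

print(np.allclose(S, np.cov(data, rowvar=False)))   # S agrees with NumPy's covariance
print(np.allclose(S, V @ np.diag(lam) @ V.T))       # S = V Lambda V^T
print(np.allclose(proj_var, lam))                   # variance along PC i equals lambda_i
```

All three checks should print `True`; the last one is the statement that $\lambda_i$ is the variance of the data along the $i$-th PC.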

SVD is a general way to understand a matrix in terms of its column space and row space. (It's a way to rewrite any matrix in terms of other matrices with an intuitive relation to the row and column space.) For example, for the matrix $A = \left( \begin{array}{cc}1&2\\0&1\end{array} \right)$ we can find orthonormal directions $v_i$ in the domain and $u_i$ in the range so that $A v_i = \sigma_i u_i$ for some scalars $\sigma_i \geq 0$.

You can find these by considering how $A$ as a linear transformation morphs a unit sphere $\mathbb S$ in its domain to an ellipse: the principal semi-axes of the ellipse align with the $u_i$ and the $v_i$ are their preimages.
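For the $2 \times 2$ example, the directions can be computed numerically. The following is a small sketch using `numpy.linalg.svd`; the sampling of the unit circle is only an illustration of the ellipse picture, not a method of computing the SVD.

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [0.0, 1.0]])

# A = U @ diag(s) @ Vt; columns of U are the u_i, rows of Vt are the v_i
U, s, Vt = np.linalg.svd(A)

for i in range(2):
    v_i = Vt[i]
    u_i = U[:, i]
    # each v_i is mapped onto the direction u_i, stretched by s[i]
    assert np.allclose(A @ v_i, s[i] * u_i)

# For this A the singular values are sqrt(2) + 1 and sqrt(2) - 1
print(s)

# Image of the unit circle: every image point has length between s[1] and s[0],
# the lengths of the semi-axes of the ellipse
theta = np.linspace(0.0, 2.0 * np.pi, 1000)
circle = np.vstack([np.cos(theta), np.sin(theta)])   # shape (2, 1000)
lengths = np.linalg.norm(A @ circle, axis=0)
print(lengths.min() >= s[1] - 1e-12, lengths.max() <= s[0] + 1e-12)
```

The signs of the individual $u_i$ and $v_i$ returned by the routine are arbitrary: flipping both $u_i$ and $v_i$ gives an equally valid decomposition.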

In any case, for the data matrix $X$ above (really, just set $A = X$), SVD lets us write

$$
X = \sum_{i=1}^r \sigma_i u_i v_i^T\,,
$$

where $\{ u_i \}$ and $\{ v_i \}$ are orthonormal sets of vectors. A comparison with the eigenvalue decomposition of $S$ reveals that the "right singular vectors" $v_i$ are equal to the PCs, the "left singular vectors" are

$$
u_i = \frac{1}{\sqrt{(n-1)\lambda_i}} Xv_i\,,
$$

and the "singular values" $\sigma_i$ are related to the eigenvalues of $S$ via

$$
\sigma_i^2 = (n-1) \lambda_i\,.
$$
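The comparison can be spelled out in a few lines. Substituting the expansion $X = \sum_i \sigma_i u_i v_i^T$ into the definition of $S$ and using the orthonormality $u_i^T u_j = \delta_{ij}$ gives

$$
S = \frac{1}{n-1} X^T X = \frac{1}{n-1} \left( \sum_{i=1}^r \sigma_i v_i u_i^T \right) \left( \sum_{j=1}^r \sigma_j u_j v_j^T \right) = \sum_{i=1}^r \frac{\sigma_i^2}{n-1} v_i v_i^T \,.
$$

Matching this term by term with $S = \sum_i \lambda_i v_i v_i^T$ identifies the $v_i$ with the PCs and gives $\lambda_i = \sigma_i^2/(n-1)$. Similarly, applying $X$ to $v_i$ and using $v_j^T v_i = \delta_{ij}$ gives $X v_i = \sigma_i u_i$, which rearranges to the expression for $u_i$ above.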

It's a general fact that the left singular vectors $u_i$ span the column space of $X$. In this specific case, the $u_i$ give us a scaled projection of the data $X$ onto the direction of the $i$-th principal component. The right singular vectors $v_i$ in general span the row space of $X$, which gives us a set of orthonormal vectors that spans the data much like PCs.
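These relations are easy to check numerically. Below is a hedged sketch on random, illustrative data (the seed, sizes and scaling are assumptions); it compares the eigendecomposition of $S$ with the thin SVD of $X$. Singular vectors are only defined up to sign, so the comparison of the $v_i$ aligns signs first.

```python
import numpy as np

rng = np.random.default_rng(1)                 # fixed seed; illustrative data only
n, d = 100, 4
data = rng.normal(size=(n, d)) * np.array([3.0, 2.0, 1.0, 0.5])
X = data - data.mean(axis=0)                   # centered data matrix

# Route 1: eigendecomposition of the covariance matrix
S = X.T @ X / (n - 1)
lam, V_eig = np.linalg.eigh(S)                 # ascending eigenvalues
lam, V_eig = lam[::-1], V_eig[:, ::-1]         # reorder to descending

# Route 2: thin SVD of the data matrix (singular values come out descending)
U, sigma, Vt = np.linalg.svd(X, full_matrices=False)

# sigma_i^2 = (n - 1) * lambda_i
print(np.allclose(sigma**2, (n - 1) * lam))

# the v_i from the SVD equal the PCs up to a sign per vector
signs = np.sign(np.sum(Vt.T * V_eig, axis=0))  # +1 or -1 for each column
print(np.allclose(Vt.T * signs, V_eig))

# u_i = X v_i / sqrt((n - 1) * lambda_i): projections onto the PCs, rescaled
U_from_pcs = (X @ Vt.T) / np.sqrt((n - 1) * lam)
print(np.allclose(U_from_pcs, U))
```

In practice the SVD route is often preferred because it avoids forming $X^T X$ explicitly.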

I go into some more details and benefits of the relationship between PCA and SVD in this longer article.


I wrote this FAQ-style question together with my own answer, because it is frequently being asked in various forms, but there is no canonical thread and so closing duplicates is difficult. Please provide meta comments in this accompanying meta thread.

– amoeba – 2015-01-22T11:25:16.043

http://stats.stackexchange.com/questions/177102/what-is-the-intuition-behind-svd/179042#179042 – kjetil b halvorsen – 2016-02-03T10:45:12.737


In addition to amoeba's excellent and detailed answer and its further links, I might recommend checking this, where PCA is considered side by side with some other SVD-based techniques. The discussion there presents algebra almost identical to amoeba's, with the minor difference that, in describing PCA, it works with the SVD of $\mathbf X/\sqrt{n}$ [or $\mathbf X/\sqrt{n-1}$] instead of $\bf X$, which is simply convenient because it relates directly to PCA done via the eigendecomposition of the covariance matrix.

– ttnphns – 2016-02-03T12:18:03.563

PCA is a special case of SVD. PCA needs the data normalized, ideally same unit. The matrix is nxn in PCA. – Orvar Korvar – 2017-10-17T09:12:04.377