After the excellent post by JD Long in this thread, I looked for a simple example, and the R code necessary to produce the PCA and then go back to the original data. It gave me some first-hand geometric intuition, and I want to share what I got. The dataset and code can be directly copied and pasted into R from GitHub.

I used a data set that I found online on semiconductors here, and I trimmed it to just two dimensions - "atomic number" and "melting point" - to facilitate plotting.

As a caveat, the idea is purely illustrative of the computational process: PCA is used to reduce more than two variables to a few derived principal components, or to identify collinearity among multiple features. So it wouldn't find much application in the case of two variables, nor would there be a need to calculate eigenvectors of correlation matrices, as pointed out by @amoeba.

Further, I truncated the observations from 44 to 15 to ease the task of tracking individual points. The ultimate result was a skeleton data frame (`dat1`):

```
compounds atomic.no melting.point
AlN 10 498.0
AlP 14 625.0
AlAs 23 1011.5
... ... ...
```

The "compounds" column indicate the chemical constitution of the semiconductor, and plays the role of row name.

This can be reproduced as follows (ready to copy and paste into the R console):

```
dat <- read.csv(url("http://rinterested.github.io/datasets/semiconductors"))
colnames(dat)[2] <- "atomic.no"
dat1 <- subset(dat[1:15,1:3])
row.names(dat1) <- dat1$compounds
dat1 <- dat1[,-1]
```

The data were then scaled:

```
X <- apply(dat1, 2, function(x) (x - mean(x)) / sd(x))
# This centers data points around the mean and standardizes by dividing by SD.
# It is the equivalent to `X <- scale(dat1, center = T, scale = T)`
```
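As a quick sanity check (sketched here on a small made-up data frame, since the property holds for any numeric input), the scaled columns have mean 0 and standard deviation 1, and the `apply()` call indeed matches `scale()`:

```r
# Hypothetical stand-in for dat1; any numeric data frame works.
dat_demo <- data.frame(atomic.no     = c(10, 14, 23, 31, 32),
                       melting.point = c(498, 625, 1011.5, 1238, 1215))

X_demo <- apply(dat_demo, 2, function(x) (x - mean(x)) / sd(x))

round(colMeans(X_demo), 10)   # both columns: 0
apply(X_demo, 2, sd)          # both columns: 1

# Same values as the built-in scale():
all.equal(X_demo, scale(dat_demo, center = TRUE, scale = TRUE),
          check.attributes = FALSE)   # TRUE
```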

The linear algebra steps followed:

```
C <- cov(X) # Covariance matrix of the scaled data (equal to the correlation matrix)
```

$\begin{bmatrix}
&\text{at_no}&\text{melt_p}\\
\text{at_no}&1&0.296\\
\text{melt_p}&0.296&1
\end{bmatrix}$

The correlation function `cor(dat1)` gives the same output on the non-scaled data as the function `cov(X)` does on the scaled data.
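This equivalence can be checked directly (a sketch on made-up numbers; `dat_demo` stands in for `dat1`):

```r
dat_demo <- data.frame(atomic.no     = c(10, 14, 23, 31, 32),
                       melting.point = c(498, 625, 1011.5, 1238, 1215))

# Correlation of the raw data == covariance of the standardized data:
all.equal(cor(dat_demo), cov(scale(dat_demo)), check.attributes = FALSE)  # TRUE
```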

```
lambda <- eigen(C)$values # Eigenvalues
lambda_matrix <- diag(eigen(C)$values) # Diagonal matrix of eigenvalues
```

$\begin{bmatrix}
&\color{purple}{\lambda_{\text{PC1}}}&\color{orange}{\lambda_{\text{PC2}}}\\
&1.296422& 0\\
&0&0.7035783
\end{bmatrix}$

```
e_vectors <- eigen(C)$vectors # Eigenvectors
```

$\frac{1}{\sqrt{2}}\begin{bmatrix}
&\color{purple}{\text{PC1}}&\color{orange}{\text{PC2}}\\
&1&\,\,\,\,\,1\\
&1&-1
\end{bmatrix}$

Since the first eigenvector initially returns as $\sim \small [-0.7,-0.7]$, we flip its sign to $\small [0.7, 0.7]$ to make it consistent with the built-in functions, through:

```
e_vectors[,1] <- -e_vectors[,1]; colnames(e_vectors) <- c("PC1","PC2")
```

The resultant eigenvalues were $\small 1.2964217$ and $\small 0.7035783$. Under less minimalistic conditions, this result would have helped decide which eigenvectors to include (largest eigenvalues). For instance, the relative contribution of the first eigenvalue is $\small 64.8\%$ (`eigen(C)$values[1]/sum(eigen(C)$values) * 100`), meaning that it accounts for $\sim\small 65\%$ of the variability in the data. The variability in the direction of the second eigenvector is $35.2\%$. This is typically shown on a scree plot depicting the value of the eigenvalues.
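A minimal scree-plot sketch (with the two eigenvalues hard-coded from the values above):

```r
lambda <- c(1.2964217, 0.7035783)

# Percent of total variance accounted for by each component:
var_explained <- lambda / sum(lambda) * 100
round(var_explained, 1)   # 64.8 35.2

# The scree plot itself: eigenvalues in decreasing order.
barplot(lambda, names.arg = c("PC1", "PC2"), ylab = "Eigenvalue")
```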

We'll include both eigenvectors given the small size of this toy data set example, understanding that excluding one of the eigenvectors would result in dimensionality reduction - the idea behind PCA.

The **score matrix** was determined as the matrix multiplication of the **scaled data** (`X`) by the **matrix of eigenvectors (or "rotations")**:

```
score_matrix <- X %*% e_vectors
# Identical to the often found operation: t(t(e_vectors) %*% t(X))
```

The concept entails a linear *combination of each entry* (row / subject / observation / semiconductor in this case) of the centered (and in this case scaled) data weighted by the *rows of each eigenvector*, so that in each of the final columns of the score matrix we'll find a contribution from each variable (column) of the data (the entire `X`), but only the corresponding eigenvector will have taken part in the computation: i.e. the first eigenvector $[0.7, 0.7]^{T}$ will contribute to $\text{PC}\,1$ (Principal Component 1) and $[0.7, -0.7]^{T}$ to $\text{PC}\,2$.

Therefore each eigenvector will influence each variable differently, and this will be reflected in the "loadings" of the PCA. In our case, the negative sign in the second component of the second eigenvector $[0.7, -0.7]$ will flip the sign of the melting-point values in the linear combinations that produce PC2, whereas the effect of the first eigenvector will be consistently positive.
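Spelling this out for a single observation makes it concrete (a sketch; `x` is a hypothetical row of the scaled data, not one of the actual semiconductors):

```r
e_vectors <- cbind(PC1 = c(1, 1), PC2 = c(1, -1)) / sqrt(2)

x <- c(atomic.no = -1.2, melting.point = -0.9)   # made-up scaled values
w <- 1 / sqrt(2)

# PC1 weights both variables positively; PC2 flips the melting-point sign:
PC1_score <- w * x["atomic.no"] + w * x["melting.point"]
PC2_score <- w * x["atomic.no"] - w * x["melting.point"]

# Identical to the matrix product used for the full score matrix:
drop(x %*% e_vectors)
```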

The eigenvectors are scaled to unit length (their squared entries sum to $1$):

```
> apply(e_vectors, 2, function(x) sum(x^2))
PC1 PC2
1 1
```

whereas the **loadings** are the eigenvectors scaled by the eigenvalues (despite the confusing terminology in the built-in R functions displayed below); note that many references instead define loadings as eigenvectors scaled by the *square roots* of the eigenvalues. Following the first convention, the loadings can be calculated as:

```
> e_vectors %*% lambda_matrix
[,1] [,2]
[1,] 0.9167086 0.497505
[2,] 0.9167086 -0.497505
> prcomp(X)$rotation %*% diag(princomp(covmat = C)$sd^2)
[,1] [,2]
atomic.no 0.9167086 0.497505
melting.point 0.9167086 -0.497505
```

It is interesting to note that the rotated data cloud (the score plot) will have variance along each component (PC) equal to the eigenvalues:

```
> apply(score_matrix, 2, function(x) var(x))
      PC1       PC2
1.2964217 0.7035783
> lambda
[1] 1.2964217 0.7035783
```

Utilizing the built-in functions the results can be replicated:

```
# For the SCORE MATRIX:
prcomp(X)$x
# or...
princomp(X)$scores # The signs of the PC 1 column will be reversed.
# and for EIGENVECTOR MATRIX:
prcomp(X)$rotation
# or...
princomp(X)$loadings
# and for EIGENVALUES:
prcomp(X)$sdev^2
# or...
princomp(covmat = C)$sd^2
```
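The manual and built-in routes can be checked against each other (a sketch on synthetic stand-in data; the sign of each component is arbitrary, hence the `abs()`):

```r
set.seed(1)
X_demo <- scale(matrix(rnorm(30), ncol = 2))   # stand-in for the scaled data

manual_scores  <- X_demo %*% eigen(cov(X_demo))$vectors
builtin_scores <- prcomp(X_demo)$x

# Scores agree up to the sign of each column:
all.equal(abs(manual_scores), abs(builtin_scores), check.attributes = FALSE)  # TRUE

# Eigenvalues agree with the squared sdev from prcomp:
all.equal(eigen(cov(X_demo))$values, as.numeric(prcomp(X_demo)$sdev^2))  # TRUE
```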

Alternatively, the singular value decomposition ($\text{U}\Sigma \text{V}^\text{T}$) method can be applied to manually calculate PCA; in fact, this is the method used in `prcomp()`. The steps can be spelled out as:

```
svd_scaled_dat <- svd(scale(dat1))
eigen_vectors <- svd_scaled_dat$v
eigen_values <- (svd_scaled_dat$d / sqrt(nrow(dat1) - 1))^2
scores <- scale(dat1) %*% eigen_vectors
```
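That the SVD route agrees with the eigendecomposition route can be verified (a sketch on synthetic data; the signs of the vectors may differ between the two methods, hence `abs()`):

```r
set.seed(1)
X_demo <- scale(matrix(rnorm(30), ncol = 2))

sv  <- svd(X_demo)
eig <- eigen(cov(X_demo))

# Singular values relate to eigenvalues via d^2 / (n - 1):
all.equal((sv$d / sqrt(nrow(X_demo) - 1))^2, eig$values)  # TRUE

# Right singular vectors == eigenvectors, up to sign:
all.equal(abs(sv$v), abs(eig$vectors))                    # TRUE
```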

The result is shown below: first, the distances from the individual points to the first eigenvector, and on a second plot, the orthogonal distances to the second eigenvector:

If instead we plotted the values of the score matrix (PC1 and PC2) - no longer "melting.point" and "atomic.no", but really a change of basis of the point coordinates with the eigenvectors as basis, these distances would be preserved, but would naturally become perpendicular to the xy axis:

The trick was now to **recover the original data**. The points had been transformed through a simple matrix multiplication by the eigenvectors. Now the data was rotated back by multiplying by the **inverse of the matrix of eigenvectors** with a resultant marked change in the location of the data points. For instance, notice the change in pink dot "GaN" in the left upper quadrant (black circle in the left plot, below), returning to its initial position in the left lower quadrant (black circle in the right plot, below).

Now we finally had the original data restored in this "de-rotated" matrix:
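The de-rotation can be sketched end to end. Since the eigenvector matrix is orthonormal, its inverse is simply its transpose, and undoing the scaling step recovers the raw values (synthetic stand-in for `dat1`):

```r
set.seed(1)
dat_demo <- matrix(rnorm(30), ncol = 2)   # stand-in for the raw data
X_demo   <- scale(dat_demo)
e_vec    <- eigen(cov(X_demo))$vectors
scores   <- X_demo %*% e_vec

# De-rotate: for an orthonormal matrix, solve(e_vec) == t(e_vec).
X_back <- scores %*% t(e_vec)

# Undo centering and scaling to recover the original data:
dat_back <- sweep(sweep(X_back, 2, attr(X_demo, "scaled:scale"), "*"),
                  2, attr(X_demo, "scaled:center"), "+")

all.equal(dat_back, dat_demo, check.attributes = FALSE)   # TRUE
```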

Beyond the change of coordinates entailed by the rotation of the data in PCA, the results must be interpreted, and this process tends to involve a `biplot`, on which the data points are plotted with respect to the new eigenvector coordinates, and the original variables are superimposed as vectors. It is interesting to note the equivalence in the position of the points between the plots in the second row of rotation graphs above ("Scores with xy Axis = Eigenvectors") (to the left in the plots that follow) and the `biplot` (to the right):

The superimposition of the original variables as red arrows offers a path to the interpretation of `PC1` as a vector in the direction of (or positively correlated with) both `atomic.no` and `melting.point`; and of `PC2` as a component along increasing values of `atomic.no` but negatively correlated with `melting.point`, consistent with the values of the eigenvectors:

```
prcomp(X)$rotation
PC1 PC2
atomic.no 0.7071068 0.7071068
melting.point 0.7071068 -0.7071068
```
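A biplot can be produced directly from the `prcomp` object (a sketch on made-up data, since only the mechanics matter here):

```r
set.seed(1)
dat_demo <- data.frame(atomic.no     = rnorm(15, mean = 30,  sd = 10),
                       melting.point = rnorm(15, mean = 900, sd = 300))

PCA <- prcomp(dat_demo, center = TRUE, scale. = TRUE)

# Points in PC coordinates, original variables overlaid as arrows:
biplot(PCA, cex = 0.7)

PCA$rotation   # the arrows point along these eigenvector directions
```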

This interactive tutorial by Victor Powell gives immediate feedback as to the changes in the eigenvectors as the data cloud is modified.

Here is the link to "Analysing Ecological Data" by Alain F. Zuur, Elena N. Ieno and Graham M. Smith, where the example with the overhead projector and the hand is given: http://books.google.de/books?id=mmPvf-l7xFEC&lpg=PA15&ots=b_5iizOr3p&dq=Zuur%20et%20al%20in%20Analyzing%20ecological%20data&hl=en&pg=PA194#v=onepage&q&f=false – vonjd – 2010-10-26T06:25:04.147

A two-page article explaining PCA for biologists: Ringnér, "What is principal component analysis?", Nature Biotechnology 26, 303-304 (2008). – Borlaug – 2011-04-13T01:55:06.840

I had imagined a lengthy demo with a bunch of graphs and explanations when I stumbled across this. – None – 2010-09-16T02:18:11.883

Similar to the explanation by Zuur et al. in Analysing Ecological Data, where they talk about projecting your hand on an overhead projector. You keep rotating your hand so that the projection on the wall looks pretty similar to what you think a hand should look like. – Roman Luštrik – 2010-09-16T09:00:49.823

This question led me to a good paper, and even though I think that is a great quote, it is not from Einstein. This is a common misattribution, and the more likely original quote is probably this one from Ernest Rutherford, who said, "If you can't explain your physics to a barmaid it is probably not very good physics." All the same, thanks for starting this thread. – gavaletz – 2013-04-15T15:03:37.020

Alice Calaprice, The Ultimate Quotable Einstein, Princeton U.P. 2011, flags the quotation here as one of many "Probably not by Einstein". See p. 482. – Nick Cox – 2013-06-20T10:01:41.333

A link to a geometrical account of PCA vs regression vs canonical correlation. – ttnphns – 2013-07-29T20:43:09.140

Here is another intuitive explanation for PCA: "A layman's introduction to principal component analysis (in 100 seconds)". – James LI – 2013-11-29T15:58:14.870

Explanation of why PCs maximize variance and why they are orthogonal: http://stats.stackexchange.com/a/110546/3277. And what "variance" is in PCA: http://stats.stackexchange.com/a/22571/3277. – ttnphns – 2014-10-05T17:44:51.047

Check this link out: http://georgemdallas.wordpress.com/2013/10/30/principal-component-analysis-4-dummies-eigenvectors-eigenvalues-and-dimension-reduction/ Great explanation of PCA! – pj123 – 2014-11-08T20:36:53.283

I can't explain anything to my grandmother, because she's dead. Does this mean I don't understand anything?! It might be more fun explaining things to a barmaid anyway, though... – A. Donda – 2015-07-17T16:01:05.790

This small video gives an abstract idea about PCA: https://www.youtube.com/watch?v=BfTMmoDFXyE – Aniket gurav – 2015-08-26T08:23:13.537

Good question. I agree with the quote as well. I believe there are many people in statistics and mathematics who are highly intelligent and can get very deep into their work, but don't deeply understand what they are working on. Or they do, but are incapable of explaining it to others. I go out of my way to provide answers here in plain English, and ask questions demanding plain-English answers. – Neil McGuigan – 2010-09-15T21:43:29.863

This was asked on the Mathematics site in July, but not as well, and it didn't get many answers (not surprising, given the different focus there). http://math.stackexchange.com/questions/1146/intuitive-way-to-understand-principal-component-analysis – whuber – 2010-09-16T05:03:44.287

Interesting quote, considering that Einstein's mother urged him several times to explain general relativity to her in a way she could understand (and he tried, without success, a number of times). – motobói – 2016-06-26T14:39:07.083

I think PCA is hype. You can't find the meaning in data unless you already know the dimensions of the data before you start. You're probably getting wrapped up in the massive hype from the industrial internet sector regarding AI, and it just doesn't hold water. – Marcos – 2017-07-31T21:56:13.617

This is a random, hypothetical question. The OP does not reveal that anything had been done to solve the problem (question). Hasn't this been milked before? Moderators are not doing their job! Total herd mentality! – wrtsvkrfm – 2018-02-15T06:06:56.003