## What are the differences between Factor Analysis and Principal Component Analysis?


It seems that a number of the statistical packages that I use wrap these two concepts together. However, I'm wondering if there are different assumptions or data 'formalities' that must be true to use one over the other. A real example would be incredibly useful.



The principal components analysis and factor analysis chapters in the following book, which is available in most college libraries, address your question exactly: http://www.apa.org/pubs/books/4316510.aspx

– user31256 – 2013-10-09T07:11:33.983


And another good question like "should I use PCA or FA": http://stats.stackexchange.com/q/123063/3277.

– ttnphns – 2014-11-07T14:21:40.223

Explanation of why factor scores are inexact while component scores are true: http://stats.stackexchange.com/q/127483/3277.

– ttnphns – 2015-01-04T11:13:39.743

Personally, I like the analogy of PCA = formative and FA = reflective, see https://stats.stackexchange.com/q/279062/27276. But probably not all share that view.

– hplieninger – 2017-07-28T10:58:21.747


Principal component analysis involves extracting linear composites of observed variables.

Factor analysis is based on a formal model predicting observed variables from theoretical latent factors.

In psychology these two techniques are often applied in the construction of multi-scale tests to determine which items load on which scales. They typically yield similar substantive conclusions (for a discussion see Comrey (1988) Factor-Analytic Methods of Scale Development in Personality and Clinical Psychology). This helps to explain why some statistics packages seem to bundle them together. I have also seen situations where "principal component analysis" is incorrectly labelled "factor analysis".

In terms of a simple rule of thumb, I'd suggest that you:

1. Run factor analysis if you assume or wish to test a theoretical model of latent factors causing observed variables.

2. Run principal component analysis if you simply want to reduce your correlated observed variables to a smaller set of important independent composite variables.
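This rule of thumb can be sketched in code. The following is a minimal illustration (Python with scikit-learn, which implements both methods; the simulated data and all names here are invented for the example, not taken from the thread):

```python
import numpy as np
from sklearn.decomposition import PCA, FactorAnalysis

rng = np.random.default_rng(0)

# Simulate 6 observed variables driven by 2 latent factors plus noise
# (a setting where the factor-analysis model is literally true).
n, p, k = 500, 6, 2
loadings = rng.normal(size=(p, k))
factors = rng.normal(size=(n, k))
X = factors @ loadings.T + 0.5 * rng.normal(size=(n, p))

# Case 1: you posit latent factors causing the observed variables -> FA.
fa = FactorAnalysis(n_components=k).fit(X)

# Case 2: you just want a few composites capturing most variance -> PCA.
pca = PCA(n_components=k).fit(X)

print(fa.components_.shape)                  # estimated factor loadings, (2, 6)
print(pca.explained_variance_ratio_.sum())   # variance retained by 2 components
```

With a genuine two-factor structure, the two components retain most of the total variance, while the FA loadings estimate the latent structure directly.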

The rule of thumb there is highly useful. Thanks for that. – Brandon Bertelsen – 2010-08-13T03:38:38.353

Regarding rule of thumb (1): Wouldn't I test a theoretical model of latent factors with a confirmatory factor analysis rather than an exploratory FA? – Roman – 2014-05-14T10:34:14.637

@roman Yes. A CFA gives you much more control over the model than EFA. E.g., you can constrain loadings to zero; equate loadings; have correlated residuals; add higher-order factors; etc. – Jeromy Anglim – 2014-05-14T23:55:18.857

@Jeromy Anglim Is it really correct to say PCA makes a "smaller set of important independent composite variables"? Or should you really say "smaller set of important uncorrelated composite variables"? If the underlying data being used in PCA is not (multivariate) normally distributed, the reduced dimensional data will only be uncorrelated? – FXQuantTrader – 2016-10-26T19:01:08.813
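The distinction raised in this last comment can be checked numerically: PCA guarantees uncorrelated component scores, but for non-Gaussian data the scores can remain statistically dependent. A small sketch (Python/numpy; the data-generating setup is invented purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)

# Non-Gaussian data: two crossed diagonals of different spread
# (an "X"-shaped point cloud; clearly not multivariate normal).
n = 100_000
s = rng.standard_normal(n)
sign = rng.choice([-1.0, 1.0], size=n)
X = np.column_stack([s, 0.5 * sign * s]) + 0.1 * rng.standard_normal((n, 2))
X -= X.mean(axis=0)

# PCA via eigendecomposition of the covariance matrix.
evals, evecs = np.linalg.eigh(np.cov(X.T))
scores = X @ evecs

# Component scores are uncorrelated by construction...
r = np.corrcoef(scores.T)[0, 1]

# ...but not independent: the squared scores are strongly correlated.
r_sq = np.corrcoef(scores[:, 0] ** 2, scores[:, 1] ** 2)[0, 1]

print(r, r_sq)
```

Here `r` is essentially zero while `r_sq` is large, so "uncorrelated" is the accurate word; independence would additionally require (for example) multivariate normality.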


From my response here:

Is PCA followed by a rotation (such as varimax) still PCA?

Principal Component Analysis (PCA) and Common Factor Analysis (CFA) are distinct methods. Often, they produce similar results and PCA is used as the default extraction method in the SPSS Factor Analysis routines. This undoubtedly results in a lot of confusion about the distinction between the two.

The bottom line is that these are two different models, conceptually. In PCA, the components are actual orthogonal linear combinations that maximize the total variance. In FA, the factors are linear combinations that maximize the shared portion of the variance--underlying "latent constructs". That's why FA is often called "common factor analysis". FA uses a variety of optimization routines, and the result, unlike PCA, depends on the optimization routine used and the starting points for those routines. Simply put, there is no single unique solution.

In R, the factanal() function provides CFA with a maximum likelihood extraction. So, you shouldn't expect it to reproduce an SPSS result which is based on a PCA extraction. It's simply not the same model or logic. I'm not sure if you would get the same result if you used SPSS's Maximum Likelihood extraction either, as they may not use the same algorithm.
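The point that PCA extraction and maximum-likelihood factor extraction are different models can be seen in any implementation, not just SPSS or R. A sketch in Python with scikit-learn (chosen only to illustrate the disagreement; the simulated one-factor data are invented for the example):

```python
import numpy as np
from sklearn.decomposition import PCA, FactorAnalysis

rng = np.random.default_rng(2)

# One latent factor, three noisy indicators with unequal noise levels.
n = 2000
f = rng.standard_normal(n)
noise = rng.standard_normal((n, 3)) * np.array([0.3, 0.6, 0.9])
X = np.outer(f, [1.0, 0.8, 0.6]) + noise

# "Loadings" from PCA extraction: eigenvector scaled by sqrt(variance).
pca = PCA(n_components=1).fit(X)
pca_load = pca.components_[0] * np.sqrt(pca.explained_variance_[0])

# Loadings from maximum-likelihood factor analysis (shared variance only).
fa_load = FactorAnalysis(n_components=1).fit(X).components_[0]

print(np.round(pca_load, 3))
print(np.round(fa_load, 3))
```

The two loading vectors differ systematically: PCA loadings absorb each variable's unique variance, so in aggregate they are larger than the ML factor loadings.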

For better or worse, you can, however, reproduce in R the mixed-up "factor analysis" that SPSS provides as its default. Here's the process in R. With this code, I'm able to reproduce the SPSS Principal Component "Factor Analysis" result using this dataset (with the exception of the sign, which is indeterminate). That result could also then be rotated using any of R's available rotation methods.

```r
data(attitude)
# Compute eigenvalues and eigenvectors of the correlation matrix.
pfa.eigen <- eigen(cor(attitude))
# Print and note that the eigenvalues are those produced by SPSS.
# Also note that SPSS will extract 2 components, as 2 eigenvalues are > 1.
pfa.eigen$values
# Set a value for the number of factors (for clarity).
kFactors <- 2
# Extract and transform two components.
pfa.eigen$vectors[, seq_len(kFactors)] %*%
  diag(sqrt(pfa.eigen$values[seq_len(kFactors)]), kFactors, kFactors)
```

factanal() provides EFA, not CFA. Also, from my experience, SPSS's Maximum Likelihood extraction should give the same result as factanal() given that there is no oblique rotation. – kitman0804 – 2014-12-16T05:55:49.283

What does the following mean: 'In FA, the factors are linear combinations that maximize the shared portion of the variance--underlying "latent constructs"'? – conjectures – 2015-05-31T14:36:13.773

Note that you will get the same results with principal(attitude, 2, rotate="none") from the psych package, and that Kaiser's rule (ev > 1) is not the most recommended way to test for dimensionality (it overestimates the number of factors). – chl – 2010-10-07T06:36:59.833

Yes, I know psych's principal wraps this up. My purpose was to show what SPSS "factor analysis" was doing when using the principal components extraction method. I agree that the eigenvalue rule is a poor way to select the number of factors. But that is exactly what SPSS does by default, and this was what I was demonstrating. – Brett – 2010-10-07T14:21:18.797

Note also that CFA may stand for confirmatory FA (as opposed to exploratory FA) instead of common FA. – Richard Hardy – 2017-07-05T19:51:57.757

You are right about your first point, although in FA you generally work with both (uniqueness and communality). The choice between PCA and FA is a long-standing debate among psychometricians. I don't quite follow your points, though. Rotation of principal axes can be applied whatever method is used to construct latent factors. In fact, most of the time it is the VARIMAX rotation (an orthogonal rotation, considering uncorrelated factors) that is used, for practical reasons (easiest interpretation, easiest scoring rules, easiest interpretation of factor scores, etc.), although oblique rotation (e.g. PROMAX) might better reflect reality (latent constructs are often correlated with each other), at least in the tradition of FA, where you assume that a latent construct is really at the heart of the observed inter-correlations between your variables. The point is that PCA followed by VARIMAX rotation somewhat distorts the interpretation of the linear combinations of the original variables in the "data analysis" tradition (see the work of Michel Tenenhaus). From a psychometrical perspective, FA models are to be preferred, since they explicitly account for measurement errors, while PCA doesn't care about that. Briefly stated, using PCA you express each component (factor) as a linear combination of the variables, whereas in FA it is the variables that are expressed as linear combinations of the factors (including communality and uniqueness components, as you said). I recommend you to read first the following discussions about this topic:

+1 for the point about errors being explicitly modeled. – D L Dahly – 2014-01-28T13:44:46.820

"PCA followed by VARIMAX rotation somewhat distorts the interpretation of the linear combinations of the original variables in the 'data analysis' tradition." Chl, could you explicate it? That's interesting. – ttnphns – 2014-11-09T11:01:49.000

Ah, I was wondering why you linked to this question, in this question... :) – Brandon Bertelsen – 2011-09-05T20:51:33.057

Just to say that my answer may actually look a little bit off-topic, since this question has been merged with another one, http://stats.stackexchange.com/questions/3369/difference-between-fa-and-pca (I initially answered the latter). – chl – 2010-10-26T09:16:23.327

There are numerous suggested definitions on the web. Here is one from an on-line glossary on statistical learning:

**Principal Component Analysis**

Constructing new features which are the principal components of a data set.
The principal components are random variables of maximal variance constructed from linear combinations of the input features. Equivalently, they are the projections onto the principal component axes, which are lines that minimize the average squared distance to each point in the data set. To ensure uniqueness, all of the principal component axes must be orthogonal. PCA is a maximum-likelihood technique for linear regression in the presence of Gaussian noise on both inputs and outputs. In some cases, PCA corresponds to a Fourier transform, such as the DCT used in JPEG image compression. See "Eigenfaces for recognition" (Turk & Pentland, J Cognitive Neuroscience 3(1), 1991), Bishop, "Probabilistic Principal Component Analysis", and "Automatic choice of dimensionality for PCA".

**Factor analysis**

A generalization of PCA which is based explicitly on maximum likelihood. Like PCA, each data point is assumed to arise from sampling a point in a subspace and then perturbing it with full-dimensional Gaussian noise. The difference is that factor analysis allows the noise to have an arbitrary diagonal covariance matrix, while PCA assumes the noise is spherical. In addition to estimating the subspace, factor analysis estimates the noise covariance matrix. See "The EM Algorithm for Mixtures of Factor Analyzers".

The Factor Analysis description gets the main point (diagonal covariance), but historically it was not developed as a generalisation of PCA. – conjectures – 2017-07-04T14:03:03.893

So basically, in PCA one svd's the covariance matrix and in FA the correlation matrix? It's always hard for me to find the actual math after methods have built up a lot of terminology from the field where they are applied. (Off-topic: it once took me a whole afternoon understanding what path modeling is until I found one (1) paper from the 70's that stated the matrix equation behind it.) – Mark van der Loo – 2017-10-16T14:23:21.597

The top answer in this thread suggests that PCA is more of a dimensionality reduction technique, whereas FA is more of a latent variable technique. This is sensu stricto correct. But many answers here and many treatments elsewhere present PCA and FA as two completely different methods, with dissimilar if not opposite goals, methods and outcomes. I disagree; I believe that when PCA is taken to be a latent variable technique, it is quite close to FA, and they should better be seen as very similar methods.

I provided my own account of the similarities and differences between PCA and FA in the following thread: Is there any good reason to use PCA instead of EFA? Also, can PCA be a substitute for factor analysis? There I argue that for simple mathematical reasons the outcome of PCA and FA can be expected to be quite similar, given only that the number of variables is not very small (perhaps over a dozen). See my [long!] answer in the linked thread for mathematical details and Monte Carlo simulations. For a much more concise version of my argument see here: Under which conditions do PCA and FA yield similar results?

Here I would like to show it on an example. I will analyze the wine dataset from the UCI Machine Learning Repository. It is a fairly well-known dataset with $n=178$ wines from three different grapes described by $p=13$ variables. Here is how the correlation matrix looks:

I ran both PCA and FA and show 2D projections of the data as biplots for both of them on the figure below (PCA on the left, FA on the right). Horizontal and vertical axes show 1st and 2nd component/factor scores. Each of the $n=178$ dots corresponds to one wine, and dots are colored according to the group (see legend):

The loadings of the 1st and 2nd component/factor onto each of the $p=13$ original variables are shown as black lines. They are equal to the correlations between each of the original variables and the two components/factors.
Of course correlations cannot exceed $1$, so all loading lines are contained inside the "correlation circle" showing the maximal possible correlation. All loadings and the circle are arbitrarily scaled by a factor of $3$, otherwise they would be too small to be seen (so the radius of the circle is $3$ and not $1$).

Note that there is hardly any difference between PCA and FA! There are small deviations here and there, but the general picture is almost identical, and all the loadings are very similar and point in the same directions. This is exactly what was expected from the theory and is no surprise; still, it is instructive to observe.

PS. For a much prettier PCA biplot of the same dataset, see this answer by @vqv.

PPS. Whereas PCA calculations are standard, FA calculations might require a comment. Factor loadings were computed by an "iterated principal factors" algorithm until convergence (9 iterations), with communalities initialized with partial correlations. Once the loadings converged, the scores were calculated using Bartlett's method. This yields standardized scores; I scaled them up by the respective factor variances (given by the loadings' lengths).

Which software did you use to create the PCA and factor analysis plots? – rnso – 2015-03-30T17:36:34.320

I used Matlab. I was thinking of pasting the code into my answer (as is normally my habit), but did not want to clutter this busy thread even more. But come to think of it, I should post it on some external website and leave a link here. I will do that. – amoeba – 2015-03-30T22:21:14.103

It is true that PCA and FA sometimes, and not at all seldom, give similar results (loadings), and so PCA can be seen as a specific case of FA when factor analysis is defined broadly. Still, FA (sensu stricto) and PCA are theoretically quite different. – ttnphns – 2015-12-03T04:25:29.597

(cont.) Factors are transcendent latent traits; principal components are immanent derivations. Despite your two loading plots appearing practically similar, theoretically they are fundamentally different. The components plane on the left was produced as a subspace of the variables, which project themselves onto it. The factor plane was produced as a space different from the space of the variables, and so they project themselves onto an "alien" space in the right plot. – ttnphns – 2015-12-03T04:27:31.543

(cont.) A subtle and insidious nuance which potentially misleads a viewer of your two plots arises from the fact that both are drawn by you as a biplot, not simply a plot of loadings. When we speak of PCA (left pic), the biplot is justified, because both variable loadings and object scores are embedded in the same analytic space - the space of principal axes, which are in turn a subspace of the space spanned by the variables. – ttnphns – 2015-12-03T17:47:21.243

(cont.) But the right pic (FA) is actually not a true biplot; it is rather an overlay of two distinct scatterplots of different spaces: the loading plot (where the axes are true factors) and the object scores plot (where the axes are the estimated factors as scores). The true factor space overruns the "parental" variable space, but the factor scores space is its subspace. You superimposed two heterogeneous pairs of axes, but they bear the same labels ("factor1" and "factor2" in both pairs), which circumstance is strongly misleading and persuades us to think that it is a bona fide biplot, like the left one. – ttnphns – 2015-12-03T17:47:40.107

(cont.) Of course, one is fully within one's rights to draw "biplots" in FA like the right one of yours. It is not a mistake. Rather, one should just keep in mind that on such a plot we blend two different types of axes: factors as true (which load) and factors as estimated scores. – ttnphns – 2015-12-03T17:53:40.837

@ttnphns Thanks for these comments. This answer of mine requires some additional work, but I keep postponing it... Whenever I get to edit it, I will take your points into consideration. – amoeba – 2017-03-08T14:28:24.153

Differences between factor analysis and principal component analysis are:

- In factor analysis there is a structured model and some assumptions. In this respect it is a statistical technique, which does not apply to principal component analysis, which is a purely mathematical transformation.

- The aim of principal component analysis is to explain the variance, while factor analysis explains the covariance between the variables.

One of the biggest reasons for the confusion between the two has to do with the fact that one of the factor extraction methods in factor analysis is called the "method of principal components". However, it's one thing to use PCA and another thing to use the method of principal components in FA. The names may be similar, but there are significant differences. The former is an independent analytical method, while the latter is merely a tool for factor extraction.

A basic, yet kind of painstaking, explanation of PCA vs factor analysis with the help of scatterplots, in logical steps. (I thank @amoeba who, in his comment to the question, encouraged me to post an answer in place of making links to elsewhere. So here is a leisurely, late response.)

### PCA as variable summarization (feature extraction)

Hopefully you already have an understanding of PCA. To revive it now: suppose we have correlating variables $V_1$ and $V_2$. We center them (subtract the mean) and do a scatterplot. Then we perform PCA on these centered data. PCA is a form of axes rotation which offers axes P1 and P2 instead of V1 and V2. The key property of PCA is that P1 - called the 1st principal component - gets oriented so that the variance of data points along it is maximized.
The new axes are new variables whose values are computable as long as we know the coefficients of rotation $a$ (PCA provides them) [Eq.1]:

$P1 = a1_1V_1 + a1_2V_2$

$P2 = a2_1V_1 + a2_2V_2$

Those coefficients are cosines of rotation (= direction cosines, principal directions) and comprise what are called eigenvectors, while the eigenvalues of the covariance matrix are the principal component variances. In PCA, we typically discard weak last components: we thus summarize data by a few first extracted components, with little information loss.

```
Covariances
        V1        V2
V1   1.07652   .73915
V2    .73915   .95534

----PCA----
     Eigenvalues      %
P1       1.75756   86.500
P2        .27430   13.500

Eigenvectors
         P1        P2
V1   .73543   -.67761
V2   .67761    .73543
```

With our plotted data, the P1 component values (scores) are P1 = .73543*V1 + .67761*V2, and component P2 we discard. P1's variance is 1.75756, the 1st eigenvalue of the covariance matrix, and so P1 explains 86.5% of the total variance, which equals (1.07652+.95534) = (1.75756+.27430).

### PCA as variable prediction ("latent" feature)

So, we discarded P2 and expect that P1 alone can reasonably represent the data. That is equivalent to saying that $P1$ can reasonably well "reconstruct" or predict $V_1$ and $V_2$ [Eq.2]:

$V_1 = a1_{1}P1 + E_1$

$V_2 = a1_{2}P1 + E_2$

where the coefficients $a$ are what we already know and $E$ are the errors (unpredictedness). This is actually a "regressional model" where the observed variables are predicted (back) by the latent variable (if we allow calling a component "latent") P1 extracted from those same variables. Look at the plot Fig.2; it is nothing other than Fig.1, only detailed: the P1 axis is shown tiled with its values (P1 scores) in green (these values are the projections of data points onto P1). Some arbitrary data points were labeled A, B, ..., and their departures (errors) from P1 are bold black connectors.
For point A, details are shown: the coordinates of the P1 score (green A) onto the V1 and V2 axes are the P1-reconstructed values of V1 and V2 according to Eq.2, $\hat{V_1} = a1_{1}P1$ and $\hat{V_2} = a1_{2}P1$. The reconstruction errors $E_1 = V_1-\hat{V_1}$ and $E_2 = V_2-\hat{V_2}$ are also displayed, in beige. The squared length of the "error" connector is the sum of the two squared errors, by Pythagoras.

Now, what is characteristic of PCA is that if we compute E1 and E2 for every point in the data and plot these coordinates - i.e. make the scatterplot of the errors alone - the "error data" cloud will coincide with the discarded component P2. And it does: the cloud is plotted on the same picture as the beige cloud, and you see it actually forms axis P2 (of Fig.1), tiled with P2 component scores.

No wonder, you may say. It is obvious: in PCA, the discarded junior component(s) is precisely what decomposes into the prediction errors E, in the model which explains (restores) the original variables V by the latent feature(s) P1. The errors E together just constitute the left-out component(s). Here is where factor analysis starts to differ from PCA.

### The idea of common FA (latent feature)

Formally, the model predicting the manifest variables by the extracted latent feature(s) is the same in FA as in PCA [Eq.3]:

$V_1 = a_{1}F + E_1$

$V_2 = a_{2}F + E_2$

where F is the latent common factor extracted from the data, replacing what was P1 in Eq.2. The difference in the model is that in FA, unlike PCA, the error variables (E1 and E2) are required to be uncorrelated with each other.

Digression. Here I want to suddenly interrupt the story and make a note about what the coefficients $a$ are. In PCA, we said, these were entries of the eigenvectors found within PCA (via eigen- or singular-value decomposition), while the latent P1 had its native variance. If we choose to standardize P1 to unit variance, we'll have to compensate by appropriately scaling up the coefficients $a$, in order to support the equation.
The scaled-up $a$s are called loadings; they are of interest numerically because they are the covariances (or correlations) between the latent and the observable variables and therefore can help interpret the latent feature. In both models - Eq.2 and Eq.3 - you are free to decide, without harming the equation, which way the terms are scaled. If F (or P1) is considered unit-scaled, $a$ is a loading; while if F (P1) has to have its native scale (variance), then $a$ should be de-scaled accordingly - in PCA that will equal the eigenvector entries, but in FA they will be different and usually not called "eigenvectors". In most texts on factor analysis, F is assumed to have unit variance, so the $a$ are loadings. In the PCA literature, P1 is typically discussed as having its real variance, and so the $a$ are eigenvectors.

OK, back to the thread. E1 and E2 are uncorrelated in factor analysis; thus, they should form a cloud of errors that is either round or elliptic, but not diagonally oriented. In PCA, their cloud formed a straight line coinciding with the diagonally going P2. Both ideas are demonstrated in the picture:

Note that the errors form a round (not diagonally elongated) cloud in FA. The factor (latent) in FA is oriented somewhat differently, i.e. it is not exactly the first principal component, which is the "latent" of PCA. On the picture, the factor line is a bit strangely conical - it will become clear why in the end.

What is the meaning of this difference between PCA and FA? The variables are correlated, which is seen in the diagonally elliptical shape of the data cloud. P1 skimmed the maximal variance, so the ellipse is co-directed with P1. Consequently P1 explained by itself the correlation; but it did not explain the existing amount of correlation adequately; it looked to explain variation in data points, not correlatedness. Actually, it over-accounted for the correlation, the result of which was the appearance of the diagonal, correlated cloud of errors which compensates for the over-account.
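This over-accounting can be checked numerically on any pair of correlated variables. A quick sketch (Python; the two simulated variables and their loadings are invented for this check, not the ones in the figures above):

```python
import numpy as np
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(3)

# Two correlated variables generated by one common factor plus unique noise.
n = 50_000
f = rng.standard_normal(n)
V1 = 0.85 * f + 0.5 * rng.standard_normal(n)
V2 = 0.80 * f + 0.5 * rng.standard_normal(n)
X = np.column_stack([V1, V2])

cov = np.cov(X.T)[0, 1]

# PCA loadings: eigenvector entries scaled by sqrt(eigenvalue).
evals, evecs = np.linalg.eigh(np.cov(X.T))
p1 = evecs[:, -1] * np.sqrt(evals[-1])

# FA loadings for a single common factor (maximum-likelihood fit).
a = FactorAnalysis(n_components=1).fit(X).components_[0]

print(round(cov, 3))                # the observed covariance
print(round(abs(a[0] * a[1]), 3))   # FA loading product: restores it
print(round(abs(p1[0] * p1[1]), 3)) # PCA loading product: overshoots it
```

The product of the FA loadings lands near the covariance, while the product of the PCA loadings exceeds it, which is exactly the over-account described above.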
P1 alone cannot comprehensively explain the strength of correlation/covariation. Factor F can do it alone; and the condition under which it becomes able to do so is exactly the one where the errors can be forced to be uncorrelated. Since the error cloud is round, no correlatedness - positive or negative - has remained after the factor was extracted; hence it is the factor which skimmed it all.

As a dimensionality reduction, PCA explains variance but explains correlations imprecisely. FA explains correlations but cannot account (by the common factors) for as much data variation as PCA can. Factor(s) in FA account for that portion of variability which is the net correlational portion, called communality; and therefore factors can be interpreted as real yet unobservable forces/features/traits which hide "in" or "behind" the input variables to bring them to correlate. Because they explain correlation well mathematically. Principal components (the first few) do not explain it mathematically as well, and so can be called a "latent trait" (or such) only with some stretch, and tentatively.

Multiplication of loadings is what explains (restores) correlation, or correlatedness in the form of covariance - if the analysis was based on the covariance matrix (as in our example) rather than the correlation matrix. The factor analysis that I did with the data yielded a_1=.87352, a_2=.84528, so the product a_1*a_2 = .73837 is almost equal to the covariance .73915. On the other hand, the PCA loadings were a1_1=.97497, a1_2=.89832, so a1_1*a1_2 = .87584 overestimates .73915 considerably.

Having explained the main theoretical distinction between PCA and FA, let's get back to our data to exemplify the idea.

### FA: approximate solution (factor scores)

Below is the scatterplot showing the results of the analysis that we'll provisionally call "sub-optimal factor analysis", Fig.3.

A technical detail (you may skip): the PAF method was used for factor extraction. Factor scores were computed by the Regression method.
The variance of the factor scores on the plot was scaled to the true factor variance (the sum of squared loadings).

See the departures from Fig.2 of PCA. The beige cloud of the errors isn't round; it is diagonally elliptical - yet it is evidently much fatter than the thin diagonal line that occurred in PCA. Note also that the error connectors (shown for some points) are not parallel anymore (in PCA, they were by definition parallel to P2). Moreover, if you look, for example, at points "F" and "E", which lie mirror-symmetrically over the factor's F axis, you'll find, unexpectedly, their corresponding factor scores to be quite different values. In other words, factor scores are not just linearly transformed principal component scores: factor F is found in its own way, different from the P1 way. And their axes do not fully coincide if shown together on the same plot, Fig.4:

Apart from being oriented a bit differently, F (as tiled with scores) is shorter, i.e. it accounts for less variance than P1 accounts for. As noted earlier, the factor accounts only for the variability which is responsible for the correlatedness of V1 and V2, i.e. the portion of total variance that is sufficient to bring the variables from the primeval covariance 0 to the factual covariance .73915.

### FA: optimal solution (true factor)

An optimal factor solution is when the errors form a round or non-diagonal elliptic cloud: E1 and E2 are fully uncorrelated. Factor analysis actually returns such an optimal solution. I did not show it on a simple scatterplot like the ones above. Why not? - for it would have been the most interesting thing, after all. The reason is that it would be impossible to show it on a scatterplot adequately enough, even adopting a 3D plot. It is quite an interesting point theoretically. In order to make E1 and E2 completely uncorrelated, it appears that all three variables, F, E1, E2, have to lie not in the space (plane) defined by V1, V2; and the three must be uncorrelated with each other.
I believe it is possible to draw such a scatterplot in 5D (and maybe, with some gimmick, in 4D), but we live in a 3D world, alas. Factor F must be uncorrelated with both E1 and E2 (while those two are uncorrelated too) because F is supposed to be the only (clean) and complete source of correlatedness in the observed data. Factor analysis splits the total variance of the p input variables into two uncorrelated (non-overlapping) parts: the communality part (m-dimensional, where the m common factors rule) and the uniqueness part (p-dimensional, where the errors, also called unique factors, are mutually uncorrelated).

So pardon me for not showing the true factor of our data on a scatterplot here. It could be visualized quite adequately via vectors in "subject space", as done here, without showing data points.

Above, in the section "The idea of common FA (latent feature)", I displayed the factor (axis F) as a wedge in order to warn that the true factor axis does not lie on the plane V1 V2. That means that - in contrast to principal component P1 - factor F as an axis is not a rotation of axis V1 or V2 in their space, and F as a variable is not a linear combination of variables V1 and V2. Therefore F is modeled (extracted from variables V1 V2) as if it were an outer, independent variable, not a derivation of them. Equations like Eq.1, from which PCA begins, are inapplicable to compute the true (optimal) factor in factor analysis, whereas the formally isomorphic equations Eq.2 and Eq.3 are valid for both analyses. That is, in PCA variables generate components and components back-predict variables; in FA factor(s) generate/predict variables, and not back - the common factor model conceptually assumes so, even though technically factors are extracted from the observed variables.

Not only is the true factor not a function of the manifest variables, the true factor's values are not even uniquely defined. In other words, they are simply unknown.
That is all due to the fact that we're in the excessive 5D analytic space and not in our home 2D space of the data. Only good approximations (a number of methods exist) to the true factor values, called factor scores, are there for us. Factor scores do lie in the plane V1 V2, as principal component scores do; they are computed as linear functions of V1, V2, too; and it was they that I plotted in the section "FA: approximate solution (factor scores)". Principal component scores are true component values; factor scores are only a reasonable approximation to the undetermined true factor values.

### FA: roundup of the procedure

To gather into one small clot what the two previous sections said, and add final strokes. Actually, FA can (if you do it right, and see also the data assumptions) find the true factor solution (by "true" I mean here optimal for the data sample). However, various methods of extraction exist (they differ in some secondary constraints they put). The true factor solution is up to the loadings $a$ only. Thus, the loadings are of the optimal, true factors. Factor scores - if you need them - are computable out of those loadings in various ways and return approximations to the factor values.

Thus, the "factor solution" displayed by me in the section "FA: approximate solution (factor scores)" was actually based on optimal loadings, i.e. on true factors. But the scores were not optimal, by destiny. The scores are computed to be a linear function of the observed variables, as component scores are, so they could both be compared on a scatterplot, and I did it in didactic pursuit, to show it as a gradual pass from the PCA idea towards the FA idea.

One must be wary when plotting factor loadings together with factor scores on the same biplot in the "space of factors"; be conscious that the loadings pertain to true factors while the scores pertain to surrogate factors (see my comments to this answer in this thread).

Rotation of factors (loadings) helps interpret the latent features.
Rotation of loadings can also be done in PCA if you use PCA as if it were factor analysis (that is, see PCA as variable prediction). PCA tends to converge in results with FA as the number of variables grows (see the extremely rich thread on practical and conceptual similarities and differences between the two methods). See my list of differences between PCA and FA at the end of this answer. Step by step computations of PCA vs FA on the iris dataset are found here. There is a considerable number of good links to other participants' answers on the topic outside this thread; I'm sorry I only used a few of them in the current answer.

+1. It's great that you wrote it up, this thread was definitely lacking an answer from you. I upvoted before reading (which I rarely do), and certainly enjoyed subsequent reading. I might comment more later, but one small nitpick for now: you wrote several times that in FA the error cloud should be "round". But in fact, it could well be elliptical (because uniquenesses for V1 and V2 can have different variances), it just has to have zero correlations. I guess you did not want to confuse readers with this detail.

– amoeba – 2017-07-03T22:27:56.583

11

For me (and I hope this is useful) factor analysis is much more useful than PCA. Recently, I had the pleasure of analysing a scale through factor analysis. This scale (although it's widely used in industry) was developed by using PCA, and to my knowledge had never been factor analysed. When I performed the factor analysis (principal axis) I discovered that the communalities for three of the items were less than 30%, which means that over 70% of the items' variance was not being analysed. PCA just transforms the data into a new combination and doesn't care about communalities. My conclusion was that the scale was not a very good one from a psychometric point of view, and I've confirmed this with a different sample.
Essentially, if you want to predict using the factors, use PCA, while if you want to understand the latent factors, use factor analysis.

9

One can think of a PCA as being like an FA in which the communalities are assumed to equal 1 for all variables. In practice, this means that items that would have relatively low factor loadings in FA due to low communality will have higher loadings in PCA. This is not a desirable feature if the primary purpose of the analysis is to cut item length and clean a battery of items of those with low or equivocal loadings, or to identify concepts that are not well represented in the item pool.

9

Expanding on @StatisticsDocConsulting's answer: the difference in loadings between EFA and PCA is non-trivial with a small number of variables. Here's a simulation function to demonstrate this in R:

```r
simtestit = function(Sample.Size = 1000, n.Variables = 3, n.Factors = 1, Iterations = 100) {
  require(psych)
  X = list()
  x = matrix(NA, nrow = Sample.Size, ncol = n.Variables)
  for(i in 1:Iterations) {
    for(j in 1:n.Variables) {x[, j] = rnorm(Sample.Size)}  # j, not i: keep the iteration counter clean
    X$PCA = append(X$PCA, mean(abs(principal(x, n.Factors)$loadings[, 1])))
    X$EFA = append(X$EFA, mean(abs(factanal(x, n.Factors)$loadings[, 1])))
  }
  X
}
```

By default, this function performs 100 Iterations, in each of which it produces random, normally distributed samples (Sample.Size = 1000) of three variables, and extracts one factor using PCA and ML-EFA. It outputs a list of two Iterations-long vectors composed of the mean magnitudes of the simulated variables' loadings on the unrotated first component from PCA and the general factor from EFA, respectively. It allows you to play around with sample size and the number of variables and factors to suit your situation, within the limits of the principal() and factanal() functions and your computer. Using this code, I've simulated samples of 3–100 variables with 500 iterations each to produce data:

```r
Y = data.frame(n.Variables = 3:100, Mean.PCA.Loading = rep(NA, 98), Mean.EFA.Loading = rep(NA, 98))
for(i in 3:100) {
  X = simtestit(n.Variables = i, Iterations = 500)
  Y[i - 2, 2] = mean(X$PCA)
  Y[i - 2, 3] = mean(X$EFA)
}
```

...for a plot of the sensitivity of mean loadings (across variables and iterations) to the number of variables. This demonstrates how differently one has to interpret the strength of loadings in PCA vs. EFA. Both depend somewhat on the number of variables, but loadings are biased upward much more strongly in PCA. The difference between the mean loadings of these methods decreases as the number of variables increases, but even with 100 variables, PCA loadings average $.067$ higher than EFA loadings in random normal data. However, note that mean loadings will usually be higher in real applications, because one generally uses these methods on more correlated variables. I'm not sure how this might affect the difference of mean loadings.

9

A quote from a really nice textbook (Brown, 2006, pp. 22, emphasis added):
PCA = principal components analysis
EFA = exploratory factor analysis
CFA = confirmatory factor analysis

Although related to EFA, principal components analysis (PCA) is frequently miscategorized as an estimation method of common factor analysis. Unlike the estimators discussed in the preceding paragraph (ML, PF), PCA relies on a different set of quantitative methods that are not based on the common factor model. PCA does not differentiate common and unique variance. Rather, PCA aims to account for the variance in the observed measures rather than explain the correlations among them. Thus, PCA is more appropriately used as a data reduction technique to reduce a larger set of measures to a smaller, more manageable number of composite variables to use in subsequent analyses. However, some methodologists have argued that PCA is a reasonable or perhaps superior alternative to EFA, in view of the fact that PCA possesses several desirable statistical properties (e.g., computationally simpler, not susceptible to improper solutions, often produces results similar to those of EFA, ability of PCA to calculate a participant’s score on a principal component whereas the indeterminate nature of EFA complicates such computations). Although debate on this issue continues, Fabrigar et al. (1999) provide several reasons in opposition to the argument for the place of PCA in factor analysis. These authors underscore the situations where EFA and PCA produce dissimilar results; for instance, when communalities are low or when there are only a few indicators of a given factor (cf. Widaman, 1993).
Regardless, if the overriding rationale and empirical objectives of an analysis are in accord with the common factor model, then it is conceptually and mathematically inconsistent to conduct PCA; that is, EFA is more appropriate if the stated objective is to reproduce the intercorrelations of a set of indicators with a smaller number of latent dimensions, recognizing the existence of measurement error in the observed measures. Floyd and Widaman (1995) make the related point that estimates based on EFA are more likely to generalize to CFA than are those obtained from PCA in that, unlike PCA, EFA and CFA are based on the common factor model. This is a noteworthy consideration in light of the fact that EFA is often used as a precursor to CFA in scale development and construct validation. A detailed demonstration of the computational differences between PCA and EFA can be found in multivariate and factor analytic textbooks (e.g., Tabachnick & Fidell, 2001).

Brown, T. A. (2006). Confirmatory factor analysis for applied research. New York: Guilford Press.

3

In a paper by Tipping and Bishop the tight relationship between probabilistic PCA (PPCA) and factor analysis is discussed. PPCA is closer to FA than classic PCA is. The common model is

$$\mathbf{y} = \mu + \mathbf{Wx} + \epsilon$$

where $\mathbf{W} \in \mathbb{R}^{p,d}$, $\mathbf{x} \sim \mathcal{N}(\mathbf{0},\mathbf{I})$ and $\epsilon \sim \mathcal{N}(\mathbf{0},\mathbf{\Psi})$.

• Factor analysis assumes $\mathbf{\Psi}$ is diagonal.
• PPCA assumes $\mathbf{\Psi} = \sigma^2\mathbf{I}$

Michael E. Tipping, Christopher M. Bishop (1999). Probabilistic Principal Component Analysis, Journal of the Royal Statistical Society, Volume 61, Issue 3, Pages 611–622
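Both versions of the model above imply the observed covariance $\mathbf{WW}' + \mathbf{\Psi}$; the only difference is the structure of $\mathbf{\Psi}$. A small NumPy sketch (the loading matrix and noise levels below are invented for illustration) makes that difference concrete:

```python
import numpy as np

rng = np.random.default_rng(2)
p, d, n = 3, 1, 200_000
W = np.array([[0.9], [0.7], [0.5]])       # hypothetical p x d loading matrix

# Shared generative model: y = W x + eps   (taking mu = 0 for simplicity)
x = rng.standard_normal((n, d))

# FA noise: Psi diagonal but heteroscedastic (a different variance per variable)
psi = np.array([0.2, 0.5, 1.0])
y_fa = x @ W.T + rng.standard_normal((n, p)) * np.sqrt(psi)

# PPCA noise: Psi = sigma^2 I (one shared variance)
sigma2 = 0.5
y_ppca = x @ W.T + rng.standard_normal((n, p)) * np.sqrt(sigma2)

# Both imply Cov(y) = W W' + Psi; check empirically
print(np.round(np.cov(y_fa.T), 2))        # close to W W' + diag(psi)
print(np.round(np.cov(y_ppca.T), 2))      # close to W W' + 0.5 I
```

Under PPCA the noise is spherical, so the implied covariance differs from $\mathbf{WW}'$ only by a constant on the diagonal; FA's diagonal $\mathbf{\Psi}$ lets each variable carry its own uniqueness, which is exactly amoeba's "elliptical error cloud" point earlier in the thread.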

2

+1. Yes. I believe that understanding PPCA is necessary to understand the relationship between PCA and FA. But you could improve your answer by discussing the PCA/PPCA relationship. – amoeba – 2017-07-28T07:44:45.133

2

None of these responses is perfect. Both FA and PCA have variants, and we must state clearly which variants are being compared. I would compare maximum likelihood factor analysis with Hotelling's PCA. The former assumes the latent variables follow a normal distribution, but PCA has no such assumption. This leads to differences in, for example, the solution itself, the nesting of the components, the uniqueness of the solution, and the optimization algorithms.

1

I wonder if you could expand a little on this - you have said there are differences in the last sentence, but not given much information about what those differences might be, or in what way those differences might be important? – Silverfish – 2016-09-29T01:02:07.953

1

To select the two most distant methods and claim that they are indeed different - as you do - is not perfect logic either. One should probably find and report how these two are similar. Alternatively, one might choose the most similar methods (such as plain PCA vs PAF) and report in what way they are different.

– ttnphns – 2016-09-29T02:47:07.187

1

Hotelling's PCA does assume latent Gaussians. – conjectures – 2017-06-24T09:32:59.970
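One of the differences named in that answer - the nesting of the components - can be checked directly: the first principal component is unchanged when a second component is added, whereas maximum likelihood factor solutions are not nested in general. A sketch using scikit-learn (assuming it is installed; the data, loadings, and noise level are synthetic):

```python
import numpy as np
from sklearn.decomposition import PCA, FactorAnalysis

rng = np.random.default_rng(3)
n = 2000
# six synthetic variables driven by two latent factors
f = rng.standard_normal((n, 2))
load = np.array([[.8, .1], [.7, .2], [.6, .3], [.1, .8], [.2, .7], [.3, .6]])
X = f @ load.T + 0.5 * rng.standard_normal((n, 6))

# PCA solutions are nested: the 1-component solution is the first row of the 2-component one
pca1 = PCA(n_components=1, svd_solver="full").fit(X)
pca2 = PCA(n_components=2, svd_solver="full").fit(X)
print(np.allclose(pca1.components_[0], pca2.components_[0]))   # True

# ML factor loadings on the first factor generally change with the number of factors
fa1 = FactorAnalysis(n_components=1).fit(X)
fa2 = FactorAnalysis(n_components=2).fit(X)
print(np.abs(np.abs(fa1.components_[0]) - np.abs(fa2.components_[0])).max())
```

The absolute values in the last line sidestep the sign indeterminacy of factor loadings; the printed gap is typically well away from zero on data like this, while the PCA check is exact.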