In simple terms, how would you explain (perhaps with simple examples) the difference between fixed effect, random effect and mixed effect models?


Statistician Andrew Gelman says that the terms 'fixed effect' and 'random effect' have variable meanings depending on who uses them. Perhaps you can pick out which one of the 5 definitions applies to your case. In general it may be better to either look for equations which describe the probability model the authors are using (when reading) or write out the full probability model you want to use (when writing).

Here we outline five definitions that we have seen:

1. Fixed effects are constant across individuals, and random effects vary. For example, in a growth study, a model with random intercepts $a_i$ and fixed slope $b$ corresponds to parallel lines for different individuals $i$, or the model $y_{it} = a_i + b t$. Kreft and De Leeuw (1998) thus distinguish between fixed and random coefficients.

2. Effects are fixed if they are interesting in themselves or random if there is interest in the underlying population. Searle, Casella, and McCulloch (1992, Section 1.4) explore this distinction in depth.

3. “When a sample exhausts the population, the corresponding variable is fixed; when the sample is a small (i.e., negligible) part of the population the corresponding variable is random.” (Green and Tukey, 1960)

4. “If an effect is assumed to be a realized value of a random variable, it is called a random effect.” (LaMotte, 1983)

5. Fixed effects are estimated using least squares (or, more generally, maximum likelihood) and random effects are estimated with shrinkage (“linear unbiased prediction” in the terminology of Robinson, 1991). This definition is standard in the multilevel modeling literature (see, for example, Snijders and Bosker, 1999, Section 4.2) and in econometrics.

[Gelman, 2004, Analysis of variance—why it is more important than ever. The Annals of Statistics.]

It is funny that Andrew Gelman is described as a "blogger" rather than as one of the foremost statisticians in the world today. Although he is, of course, a blogger, he probably should be called "Statistician Andrew Gelman" if any qualifier be used. – Brash Equilibrium – 2015-09-23T15:43:11.820

But as a statistician and not just a fancy blogger he should've put at least subjective relative frequencies of the five cases usage. When people talk about fixed effects vs random effects they most of the times mean: `(4) “If an effect is assumed to be a realized value of a random variable, it is called a random effect.” (LaMotte, 1983)`

– Ufos – 2016-07-19T09:17:45.323

My impression is that (1) and (5) are by far the most common uses in the social sciences and perhaps some medical fields as well (though I only get the latter impression from reading Gelman's blog on occasion). (4) might be the most defensible use, but especially from Gelman's perspective—he is interested in "applied" statistics a great deal—I would imagine the (4) case does not come up often and it's difficult to say which fields should be most important as we parse frequency of use. – commscho – 2017-02-23T18:15:22.397

It is also informative to read the Discussion and Rejoinder to this paper. In the discussion, Peter McCullagh wrote that he disagrees with a substantial portion of what Gelman wrote. My point is not to favor one or the other, but to note that there is substantial disagreement among experts and not to put too much weight on one paper. – julieth – 2012-07-22T01:19:58.333

Cool, I haven't seen that. Do you have a link to the paper(s) you're talking about? – John Salvatier – 2012-07-22T06:06:02.183

+1: very nice link! I guess the definition also varies depending on the field (e.g. #4 is very mathematical/statistical, but #1 and #2 are more "understandable" from a life science point of view) – nico – 2010-11-19T06:39:42.860


There are good books on this such as Gelman and Hill. What follows is essentially a summary of their perspective.

First of all, you should not get too caught up in the terminology. In statistics, jargon should never be used as a substitute for a mathematical understanding of the models themselves. That is especially true for random and mixed effects models. "Mixed" just means the model has both fixed and random effects, so let's focus on the difference between fixed and random.

Let's say you have a model with a categorical predictor, which divides your observations into groups according to the category values.* The model coefficients, or "effects", associated to that predictor can be either fixed or random. The most important practical difference between the two is this:

*Random effects are estimated with partial pooling, while fixed effects are not.*

Partial pooling means that, if you have few data points in a group, the group's effect estimate will be based partially on the more abundant data from other groups. This can be a nice compromise between estimating an effect by completely pooling all groups, which masks group-level variation, and estimating an effect for all groups completely separately, which could give poor estimates for low-sample groups.

Random effects are simply the extension of the partial pooling technique as a general-purpose statistical model. This enables principled application of the idea to a wide variety of situations, including multiple predictors, mixed continuous and categorical variables, and complex correlation structures. (But with great power comes great responsibility: the complexity of modeling and inference is substantially increased, and can give rise to subtle biases that require considerable sophistication to avoid.)

To motivate the random effects model, ask yourself: why would you partial pool? Probably because you think the little subgroups are part of some bigger group with a common mean effect. The subgroup means can deviate a bit from the big group mean, but not by an arbitrary amount. To formalize that idea, we posit that the deviations follow a distribution, typically Gaussian. That's where the "random" in random effects comes in: we're assuming the deviations of subgroups from a parent follow the distribution of a random variable. Once you have this idea in mind, the mixed-effects model equations follow naturally.
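Written out for the simplest case (a single categorical grouping variable, observation $i$ in group $j$; generic notation, not tied to any one field's conventions), this is the classic random-intercept model:

$$ y_{ij} = \mu + \alpha_j + \varepsilon_{ij}, \qquad \alpha_j \sim \mathcal{N}(0, \sigma_\alpha^2), \qquad \varepsilon_{ij} \sim \mathcal{N}(0, \sigma_y^2). $$

Complete pooling is the limit $\sigma_\alpha \to 0$ (all groups share one mean), no pooling is $\sigma_\alpha \to \infty$ (no constraint on the group deviations), and partial pooling is everything in between.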

Unfortunately, users of mixed effect models often have false preconceptions about what random effects are and how they differ from fixed effects. People hear "random" and think it means something very special about the system being modeled, like fixed effects have to be used when something is "fixed" while random effects have to be used when something is "randomly sampled". But there's nothing particularly random about assuming that model coefficients come from a distribution; it's just a soft constraint, similar to the $\ell_2$ penalty applied to model coefficients in ridge regression. There are many situations when you might or might not want to use random effects, and they don't necessarily have much to do with the distinction between "fixed" and "random" quantities.

Unfortunately, the concept confusion caused by these terms has led to a profusion of conflicting definitions. Of the five definitions at this link, only #4 is completely correct in the general case, but it's also completely uninformative. You have to read entire papers and books (or failing that, this post) to understand what that definition implies in practical work.

Let's look at a case where random effects modeling might be useful. Suppose you want to estimate average US household income by ZIP code. You have a large dataset containing observations of households' incomes and ZIP codes. Some ZIP codes are well represented in the dataset, but others have only a couple households.

For your initial model you would most likely take the mean income in each ZIP. This will work well when you have lots of data for a ZIP, but the estimates for your poorly sampled ZIPs will suffer from high variance. You can mitigate this by using a shrinkage estimator (aka partial pooling), which will push extreme values towards the mean income across all ZIP codes.
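A classic shrinkage estimator of this type (sketched here assuming the within-ZIP variance $\sigma_y^2$ and the between-ZIP variance $\sigma_\alpha^2$ are known; in practice the mixed model estimates them) replaces the raw mean $\bar{y}_j$ of the $n_j$ households in ZIP $j$ with a weighted average of $\bar{y}_j$ and the grand mean $\bar{y}$:

$$ \hat{\mu}_j = \lambda_j \bar{y}_j + (1-\lambda_j)\,\bar{y}, \qquad \lambda_j = \frac{\sigma_\alpha^2}{\sigma_\alpha^2 + \sigma_y^2/n_j}. $$

For a well-sampled ZIP, $\lambda_j \approx 1$ and the estimate is essentially the ZIP's own mean; for a sparsely sampled ZIP, $\lambda_j$ is small and the estimate is pulled strongly toward $\bar{y}$.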

But how much shrinkage/pooling should you do for a particular ZIP? Intuitively, it should depend on the following:

1. How many observations you have in that ZIP
2. How many observations you have overall
3. The *individual-level* mean and variance of household income across all ZIP codes
4. The *group-level* variance in mean household income across all ZIP codes

If you model ZIP code as a random effect, the mean income estimate in all ZIP codes will be subjected to a statistically well-founded shrinkage, taking into account all the factors above.
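To make the dependence on those quantities concrete, here is a small Python sketch (with made-up numbers; `partial_pool` is an illustrative helper, not part of any statistics package, and the two variances are taken as known rather than estimated from data):

```python
# Toy partial pooling: each ZIP's estimate is a precision-weighted average
# of its own sample mean and the grand mean. sigma_y is the (assumed known)
# within-ZIP sd of household income, sigma_a the between-ZIP sd of ZIP means;
# a real mixed-model fit would estimate both from the data.

def partial_pool(zip_means, zip_sizes, sigma_y=1.0, sigma_a=1.0):
    total = sum(zip_sizes)
    grand = sum(m * n for m, n in zip(zip_means, zip_sizes)) / total
    pooled = []
    for m, n in zip(zip_means, zip_sizes):
        w_data = n / sigma_y**2    # precision of this ZIP's own mean
        w_group = 1 / sigma_a**2   # precision of the between-ZIP distribution
        pooled.append((w_data * m + w_group * grand) / (w_data + w_group))
    return pooled

# One well-sampled ZIP (100 households) and one ZIP with a single household:
est = partial_pool(zip_means=[10.0, 2.0], zip_sizes=[100, 1])
# The first estimate stays near 10; the second is pulled toward the grand mean.
```

This weighting is roughly what a random-effects fit produces for a one-way grouping, up to how the grand mean and the two variances are themselves estimated.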

The best part is that random and mixed effects models automatically handle (4), the variability estimation, for all random effects in the model. This is harder than it seems at first glance: you could try the variance of the sample mean for each ZIP, but this will be biased high, because some of the variance between estimates for different ZIPs is just sampling variance. In a random effects model, the inference process accounts for sampling variance and shrinks the variance estimate accordingly.

Having accounted for (1)-(4), a random/mixed effects model is able to determine the appropriate shrinkage for low-sample groups. It can also handle much more complicated models with many different predictors.

If this sounds like hierarchical Bayesian modeling to you, you're right - it is a close relative but not identical. Mixed effects models are hierarchical in that they posit distributions for latent, unobserved parameters, but they are typically not fully Bayesian because the top-level hyperparameters will not be given proper priors. For example, in the above example we would most likely treat the mean income in a given ZIP as a sample from a normal distribution, with unknown mean and sigma to be estimated by the mixed-effects fitting process. However, a (non-Bayesian) mixed effects model will typically not have a prior on the unknown mean and sigma, so it's not fully Bayesian. That said, with a decent-sized data set, the standard mixed effects model and the fully Bayesian variant will often give very similar results.

*While many treatments of this topic focus on a narrow definition of "group", the concept is in fact very flexible: it is just a set of observations that share a common property. A group could be composed of multiple observations of a single person, or multiple people in a school, or multiple schools in a district, or multiple varieties of a single kind of fruit, or multiple kinds of vegetable from the same harvest, or multiple harvests of the same kind of vegetable, etc. Any categorical variable can be used as a grouping variable.

+6. I think this is currently the best answer in this thread and hopefully with time it will become the most upvoted one. One suggestion that I would make is to include some formulas: perhaps in your Example section you can provide formulas specifying the fixed- and the random-effects models (and perhaps also the "single-coefficient" model, i.e. the one with "complete pooling"). I think formulas will make your answer both clearer and more attractive/appealing (currently it looks a little bit like a wall of text). – amoeba – 2016-05-05T20:05:40.300

@amoeba thanks! You're right about coefficient being the wrong word, it's more like "model term" than coefficient. Formulas would help clear this and other questions up. I've been slowly tweaking this answer as time and inspiration hit, and will continue to do so until it gets where it needs to go! I will probably flesh out the formulas for "regression against a single categorical variable." Complete pooling = group coefficients are identical (delta prior, zero sigma), partial pooling = they can differ a bit (finite sigma), no pooling = no constraint (infinite sigma). – Paul – 2016-05-05T23:58:59.440

Thanks for the great answer! However, I lost you at "You can mitigate this by using a shrinkage estimator (aka partial pooling), which will push extreme values towards the mean income across all ZIP codes." What is partial pooling? Could you give an intuitive example? Also, how does the Wikipedia page on random effects agree with what you said? Their example of a "random effect" does not consider sample sizes whatsoever.

– AlphaOmega – 2016-09-26T14:43:17.260

Congratulations on passing 100 upvotes for this answer :-) – amoeba – 2017-04-26T15:15:37.870

I followed almost everything (I'm used to working with Hierarchical Bayesian models and the concept of pooling), but what *exactly* did you mean by **"The individual-level mean and variance of household income across all ZIP codes"** (i.e. what do you mean by individual *across* zip codes?). What's the formula here? – Josh – 2017-10-19T22:14:39.423

In the stochastic model that mixed effects methods are based on, each household's income is the sum of (a) the average income across the ZIP it's contained in, and (b) an extra bit particular to that individual household. (a) and (b) are random samples from distributions A and B, each with their own parameters. Here I'm referring to mean(B) and variance(B). Typically mean(B) would be zero for identifiability; variance(B) is the quantity of interest, especially as it contrasts with variance(A). – Paul – 2017-10-19T22:30:53.400


@Paul I'm really struggling with understanding how to merge this answer (e.g. "People...think...fixed effects have to be used when something is "fixed" while random effects have to be used when something is "randomly sampled") with what I see in the way that the standard errors turn out in mixed models, where the SEs with random effects seem to me only consistent with the assumption that they are randomly sampled, and the SEs with fixed effects only if they are fixed. See e.g. here . What am I missing? Any thoughts appreciated beyond words!!

– justme – 2017-12-18T11:55:01.760

This is a good question and I found your simulations interesting. Short answer: the standard error computed in lme4 is based on assumptions about what users want. Many users of mixed models want to think of their groups as randomly sampled. You don't have to, but you may need a different standard error in that case. – Paul – 2017-12-18T14:56:58.200

@Paul -- thanks so much! Finally mixed models are making some sense. I was wondering if you had any references for calculating standard errors in such cases? Either books/papers detailing the maths, or functions in R that would offer the functionality? I have to teach this stuff soon (ooops) so it's important to me that I understand all the tools available. Thanks again! – justme – 2017-12-18T21:22:27.870

(...though, obviously, as shown in the link, one option would be bootstrapping without simulating new `u` values) – justme – 2017-12-18T21:50:20.133

@justme That's a great question. I'm actually not much of an expert on mixed models. I wrote this post partly to help myself understand and push back against some misconceptions. I'm confident that what I told you above is correct, because my intuition about such things tends to be quite reliable, but I can't point you to any sources that take my viewpoint on this. – Paul – 2017-12-18T22:37:13.993


I have written about this in a book chapter on mixed models (chapter 13 in Fox, Negrete-Yankelevich, and Sosa 2014); the relevant pages (pp. 311-315) are available on Google Books. I think the question reduces to "what are the definitions of fixed and random effects?" (a "mixed model" is just a model that contains both). My discussion says a bit less about their formal definition (for which I would defer to the Gelman paper linked by @JohnSalvatier's answer above) and more about their practical properties and utility. Here are some excerpts:

The traditional view of random effects is as a way to do correct statistical tests when some observations are correlated.

We can also think of random effects as a way to combine information from different levels within a grouping variable.

Random effects are especially useful when we have (1) lots of levels (e.g., many species or blocks), (2) relatively little data on each level (although we need multiple samples from most of the levels), and (3) uneven sampling across levels (box 13.1).

Frequentists and Bayesians define random effects somewhat differently, which affects the way they use them. Frequentists define random effects as categorical variables whose levels are chosen at random from a larger population, e.g., species chosen at random from a list of endemic species. Bayesians define random effects as sets of variables whose parameters are [all] drawn from [the same] distribution. The frequentist definition is philosophically coherent, and you will encounter researchers (including reviewers and supervisors) who insist on it, but it can be practically problematic. For example, it implies that you can't use species as a random effect when you have observed all of the species at your field site (since the list of species is not a sample from a larger population) or use year as a random effect, since researchers rarely run an experiment in randomly sampled years; they usually use either a series of consecutive years, or the haphazard set of years when they could get into the field.

Random effects can also be described as predictor variables where you are interested in making inferences about the distribution of values (i.e., the variance among the values of the response at different levels) rather than in testing the differences of values between particular levels.

People sometimes say that random effects are “factors that you aren’t interested in.” This is not always true. While it is often the case in ecological experiments (where variation among sites is usually just a nuisance), it is sometimes of great interest, for example in evolutionary studies where the variation among genotypes is the raw material for natural selection, or in demographic studies where among-year variation lowers long-term growth rates. In some cases fixed effects are also used to control for uninteresting variation, e.g., using mass as a covariate to control for effects of body size.

You will also hear that “you can’t say anything about the (predicted) value of a conditional mode.” This is not true either—you can’t formally test a null hypothesis that the value is equal to zero, or that the values of two different levels are equal, but it is still perfectly sensible to look at the predicted value, and even to compute a standard error of the predicted value (e.g., see the error bars around the conditional modes in figure 13.1).

The Bayesian framework has a simpler definition of random effects. Under a Bayesian approach, a fixed effect is one where we estimate each parameter (e.g., the mean for each species within a genus) independently (with independently specified priors), while for a random effect the parameters for each level are modeled as being drawn from a distribution (usually Normal); in standard statistical notation, $\textrm{species\_mean} \sim {\cal N}(\textrm{genus\_mean}, \sigma^2_{\textrm{species}})$.

I said above that random effects are most useful when the grouping variable has many measured levels. Conversely, random effects are generally ineffective when the grouping variable has too few levels. You usually can’t use random effects when the grouping variable has fewer than five levels, and random effects variance estimates are unstable with fewer than eight levels, because you are trying to estimate a variance from a very small sample.

the preview presently shows no pages after 311, and misses p 310, which seems like it'd be very useful here... – flies – 2015-10-14T16:41:02.057

maybe it's a regional issue? thanks for the clear answer above, anyhow! – flies – 2015-10-23T20:41:15.727

I also don't have access to the Google Books result. Thanks for including the text here. – MichaelChirico – 2017-03-21T20:48:27.797

I really like this excerpt. This is maybe the clearest and most useful description on when and why to use random effects that I've seen. Wish I had it when I was teaching a couple years back. – Gregor – 2017-12-29T21:27:38.033


Fixed effect: Something the experimenter directly manipulates and is often repeatable, e.g., drug administration - one group gets drug, one group gets placebo.

Random effect: Source of random variation / experimental units, e.g., individuals drawn (at random) from a population for a clinical trial. Random effects estimate the variability among those units.

Mixed effect: Includes both. The fixed effects in these cases estimate the population-level coefficients, while the random effects can account for individual differences in response to an effect, e.g., each person receives both the drug and placebo on different occasions; the fixed effect estimates the effect of the drug, while the random effects terms allow each person to respond to the drug differently.

General categories of mixed effects models: repeated measures, longitudinal, hierarchical, split-plot.

@AndyW: Do I understand correctly that your understanding of what "fixed effect" is corresponds to the definition #1 as listed by Gelman and quoted in the JohnSalvatier's (accepted) answer in this thread? – amoeba – 2016-05-05T15:04:21.933

@amoeba, When someone says fixed effects in econometrics jargon they are typically referring to a model, whereas #1 refers to a single parameter. So in the Stata link I provided, in the example model let's swap out $bt$ with something that varies between the $i$ observations, $x_{it}$. So then we have $y_{it} = a_i + \beta x_{it}$. The *fixed* description comes from the fact that $a_i$ does not vary with $t$ - hence it is fixed in time (or whatever $t$ refers to). We do not need to observe (nor estimate) the individual $a_i$ to estimate $\beta$ in this model. Also the $a_i$ are not assumed to be random in this model, so I hesitate to say it is the same description. – Andy W – 2016-05-05T15:48:01.807

The $a_i$ are actually what most economists would think of when you say fixed effect I would guess - although you don't estimate them in the model. They are just nuisance terms you subtract out to get unbiased estimates for other parameters. (Just writing out the damn model is so much simpler than wading through inexact jargon.) – Andy W – 2016-05-05T15:52:39.747

Thank you, @Andy. As far as I understand, your description fits precisely to the biostatistics/mixed-models jargon, so I don't see any econometrics/biostatistics clash in this case. The $a_i$ terms in the model you wrote down would also be considered fixed effects in the mixed models lingo. I downvoted this answer, by the way, because the "definitions" given here are not helpful at all (and are actually not definitions but perhaps some rules of thumb for deciding when to use random and when to use fixed effects in a particular application field). – amoeba – 2016-05-05T15:57:31.300

@amoeba I agree this answer should be -1. It does not provide an accurate general explanation, nor does it specify the conditions in which this particular explanation would be valid. So who could possibly come across this answer and gain reliable, useful knowledge? – Paul – 2016-05-05T16:58:50.643

You're not wrong, but your definition of what a fixed effect is is not what I would think of when someone says fixed effect. Here is what I think of when someone says fixed effect: http://en.wikipedia.org/wiki/Difference_in_differences , or this http://www.stata.com/support/faqs/stat/xtreg2.html (particularly equation 3 on the Stata page). – Andy W – 2010-11-19T13:44:08.363

I came to this question from here, a possible duplicate.

There are several excellent answers already, but as stated in the accepted answer, there are many different (but related) uses of the term, so it might be valuable to give the perspective as employed in econometrics, which does not yet seem fully addressed here.

Consider a linear panel data model: $$ y_{it}=X_{it}\delta+\alpha_i+\eta_{it}, $$ the so-called error component model. Here, $\alpha_i$ is what is sometimes called individual-specific heterogeneity, the error component that is constant over time. The other error component $\eta_{it}$ is "idiosyncratic", varying both over units and over time.

A reason to use a random effects approach is that the presence of $\alpha_i$ will lead to an error covariance matrix that is not "spherical" (i.e., not a multiple of the identity matrix), so that a GLS-type approach like random effects will be more efficient than OLS.

If, however, the $\alpha_i$ correlate with the regressors $X_{it}$ - as will be the case in many typical applications - one of the underlying assumptions for consistency of the standard textbook (at least what is standard in econometric textbooks) random effects estimator, viz. $Cov(\alpha_i,X_{it})=0$, is violated. Then, a fixed effect approach which effectively fits such intercepts will be more convincing.
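The reason the fixed effects ("within") approach is robust to this correlation is worth spelling out: demeaning each unit's data over time wipes out $\alpha_i$ entirely. With $\bar{y}_i = T^{-1}\sum_t y_{it}$ (and similarly for $\bar{X}_i$ and $\bar{\eta}_i$), the model implies

$$ y_{it} - \bar{y}_i = (X_{it} - \bar{X}_i)\delta + (\eta_{it} - \bar{\eta}_i), $$

so OLS on the demeaned data identifies $\delta$ regardless of how $\alpha_i$ relates to $X_{it}$.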

The following figure aims to illustrate this point. The raw correlation between $y$ and $X$ is positive. But, the observations belonging to one unit (color) exhibit a negative relationship - this is what we would like to identify, because this is the reaction of $y_{it}$ to a change in $X_{it}$.

Also, there is correlation between the $\alpha_i$ and $X_{it}$: If the former are individual-specific intercepts (i.e., expected values for unit $i$ when $X_{it}=0$), we see that the intercept for, e.g., the lightblue panel unit is much smaller than that for the brown unit. At the same time, the lightblue panel unit has much smaller regressor values $X_{it}$.

So, pooled OLS would be the wrong strategy here, because it would result in a positive estimate of $\delta$, as this estimator basically ignores the colors. RE would also be biased, being a weighted version of FE and the between estimator, which regresses the "time"-averages over $t$ onto each other. The latter however also requires lack of correlation of $\alpha_i$ and $X_{it}$.

This bias however vanishes as $T$, the number of time periods per unit (`m` in the code below), increases, as the weight on FE then tends to one (see e.g. Hsiao, Analysis of Panel Data, Sec. 3.3.2).

Here is the code that generates the data and which produces a positive RE estimate and a "correct", negative FE estimate. (That said, the RE estimates will also often be negative for other seeds, see above.)

```
library(plm)           # panel-data estimators ("within" = FE, "random" = RE)
library(RColorBrewer)  # color palette for the plot
# FE illustration
set.seed(324)
m = 8      # time periods per unit
n = 12     # number of units
step = 5
# individual-specific intercepts alpha_i, increasing (on average) with i
alpha = runif(n, seq(0, step*n, by=step), seq(step, step*n+step, by=step))
beta = -1  # true within-unit slope
y = X = matrix(NA, nrow=m, ncol=n)
for (i in 1:n) {
  X[,i] = runif(m, i, i+1)  # overwritten below; kept only so the RNG stream (and the output) is unchanged
  X[,i] = rnorm(m, i)       # regressor mean grows with i, inducing Cov(alpha_i, X_it) > 0
  y[,i] = alpha[i] + X[,i]*beta + rnorm(m, sd=.75)
}
stackX = as.vector(X)
stackY = as.vector(y)
darkcols <- brewer.pal(12, "Paired")
plot(stackX, stackY, col=rep(darkcols, each=m), pch=19)
unit = rep(1:n, each=m)
# first two columns are for plm to understand the panel structure
paneldata = data.frame(unit, rep(1:m, n), stackY, stackX)
fe <- plm(stackY ~ stackX, data = paneldata, model = "within")
re <- plm(stackY ~ stackX, data = paneldata, model = "random")
```

The output:

```
> fe
Model Formula: stackY ~ stackX
Coefficients:
stackX
-1.0451
> re
Model Formula: stackY ~ stackX
Coefficients:
(Intercept) stackX
18.34586 0.77031
```

What is $\delta$? – adam – 2016-02-19T11:18:44.367

The regression coefficient, see the first display. – Christoph Hanck – 2016-02-19T11:38:55.070

quite good response. wish I could do more ups. – subhash c. davar – 2016-03-22T14:52:48.790

@Paul - see the update, which shows you the DGP. Indeed, my post is about random and fixed effects and not about mixed models (like others in this thread). I do not claim one should be using RE in this case, I just show what goes wrong if you do. Feel free to maintain the downvote if that does not address your concerns. – Christoph Hanck – 2016-05-03T09:19:09.550

With more thought, I can see why a random effects intercept could run into the rocks on this example. It's not really a good basis for a general fixed vs random distinction, but it's a helpful example anyways. If you edit your post again (just in some trivial manner to satisfy SE) I should be able to set an upvote... – Paul – 2016-05-03T10:37:25.207

You may find it interesting to look at what happens as m is increased, say to 100 or 1000 points. If random effects were "omitting" the individual-specific intercepts as your post implies, it should always estimate a positive $\delta$, no matter how big m is, correct? Now go and see what actually happens, and edit this post accordingly. – Paul – 2016-05-03T10:52:03.920


Also, it turns out it is possible to handle this example with mixed effects. Here's the paper that shows how: http://academiccommons.columbia.edu/download/fedora_content/download/ac:125244/CONTENT/Bafumi_Gelman_Midwest06.pdf

– Paul – 2016-05-03T11:58:55.133

No doubt about the last point, see my previous comment. As for the 2nd to last comment, yes, RE gives a weight of one on FE and zero on the between estimator as $T\to\infty$, but the asymptotics are typically taken with respect to $N$. See my edited answer for references. You have a point that my story about lack of controlling for intercepts indeed seems more useful for pooled OLS, and I revised along these lines. – Christoph Hanck – 2016-05-03T14:39:43.833

+1. I have just been pointed at a paper by Clark and Linzer 2012 which provides a very helpful range of simulations to explore under what conditions a fixed effects model outperforms a random effects model and vice versa. Your example fits nicely to what they discuss and presents a case of very high correlation between group intercept and $x$, together with large between-group variability compared to the within-group variability. This is exactly the situation when random effects are likely to go wrong.

– amoeba – 2016-05-04T21:33:32.543

In the foregoing discussion it would be more accurate to replace "random effects" with "the restricted version of random effects implemented in R's plm package". There are other random effects models which would handle the correlated predictor / group issue just fine, as in the paper cited in my previous comment. They are just not yet part of the econometrics packages/literature. It seems that econometrics definitions of fixed and random effects are very domain-specific and not really representative of their more fundamental general meanings from the statistical literature. – Paul – 2016-05-05T00:58:34.593

Fair point, I made a little edit. But imo, this is precisely what makes this thread so valuable: different fields mean different things by more or less the same terminology, and the various posts help spell out these differences. – Christoph Hanck – 2016-05-05T08:47:11.200

Absolutely. This is a good answer now and broadens the perspective on SE.stats. – Paul – 2016-05-05T15:30:27.483

Hi Christoph and @Paul. I am trying to understand this example better; for that I tried to fit these data with `lm`/`lmer`. When I run `lm(stackY~stackX+as.factor(unit), paneldata)` I get exactly the same slope as your "within" estimator. This makes sense: the within estimator corresponds to what biostatistics would call a fixed effect (fixed intercept) of unit. But when I run `lmer(stackY~stackX+(1|as.factor(unit)), paneldata)` (random effect of unit) I get slope = -1.03, which is very far from your "random" estimator obtained with `plm`. Why? I am very confused by this. – amoeba – 2016-10-02T22:57:18.150

Uh, I have never used the `lmer` package, so I would hope that somebody else would be able to weigh in. – Christoph Hanck – 2016-10-03T07:51:51.170


I see. I decided it might make sense to post it as a separate question: http://stats.stackexchange.com/questions/238214. CC to @Paul. – amoeba – 2016-10-03T16:10:33.970

@amoeba that's a great question. I don't know for sure, but my guess is that lmer is using a smarter estimation process than plm, more like the Bafumi-Gelman paper I referenced above. As I explained in my answer, "random effects" is more like an ingredient one can add to an estimator, like a ridge penalty, rather than a single well-defined estimator per se. Random effects do a great job in this problem if you do them carefully, and apparently lmer is doing so.

– Paul – 2016-10-03T17:18:05.847

@Paul Actually it does not look like that. I am pretty sure that lmer is not using the Bafumi-Gelman suggestion (no additional parameters are estimated); it appears that the mixed model does just fine here even without that trick. I don't think that `plm` is doing a "mixed model" at all; it seems to be using some GLS procedure instead. There is no definitive answer to my question so far, but several helpful ones that you might want to read. I am still not sure, but I am currently starting to think that "random effects" in econometrics simply bear no relation to "random effects" in the mixed model literature. – amoeba – 2016-10-03T21:35:40.413

11

The distinction is only meaningful in the context of non-Bayesian statistics. In Bayesian statistics, all model parameters are "random".

Interesting. But since fixed or random can be considered a condition of a given variable (a given column of data) rather than of a parameter associated with that variable, does your answer fully apply? – rolando2 – 2012-01-27T00:48:39.643

1@rolando2 In any case, this is simply false. Specifically, for Bayesians the parameters are whatever kind of thing the theory / likelihood says they are. Only one's *uncertainty about what values they take* is represented using probability distributions. Consequently sometimes the parameters are modeled as fixed and unknown ('fixed') and sometimes as coming from a distribution ('random') though the latter device is often motivated by an exchangeability judgement rather than a belief about a sampling process. – conjugateprior – 2016-05-02T17:18:15.977

This is in contrast to @ben's answer. I believe that answer is wrong. – SmallChess – 2017-04-02T08:41:28.923

8

Not really a formal definition, but I like the following slides: Mixed models and why sociolinguists should use them (mirror), from Daniel Ezra Johnson. A brief recap is offered on slide 4. Although it is mostly focused on psycholinguistic studies, it is very useful as a first step.

These slides are not useful. – flies – 2015-10-14T16:42:20.947

4While this link may answer the question, it is better to include the essential parts of the answer here and provide the link for reference. Link-only answers can become invalid if the linked page changes. – Ben Bolker – 2016-09-05T18:14:39.323

I think I'm going to need to see that presentation in person to get the full impact. – Andy W – 2010-11-19T13:36:07.997

7

In econometrics, the terms are typically applied in generalized linear models, where the model is of the form

$$y_{it} = g(x_{it} \beta + \alpha_i + u_{it}). $$

**Random effects:** when $\alpha_i \perp x_{it}$;

**Fixed effects:** when $\alpha_i \not \perp x_{it}$.

In **linear models**, the presence of a random effect does not result in inconsistency of the OLS estimator. However, using a random effects estimator (like feasible generalized least squares) will result in a more *efficient* estimator.

In **non-linear models**, such as probit, tobit, ..., the presence of a random effect will, in general, result in an inconsistent estimator. Using a random effects estimator will then restore consistency.

For both linear and non-linear models, fixed effects estimation results in a bias. However, in linear models there are transformations (such as first differencing or demeaning) under which OLS on the transformed data yields consistent estimates. For non-linear models, there are only a few exceptions where such transformations exist, *fixed effects logit* being one example.
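As a sketch of why demeaning works in the linear case, here is a minimal Python simulation (illustrative only; the data and variable names are made up, not from the thread). The individual effect is deliberately built to be correlated with the regressor, so pooled OLS is biased while the within (demeaned) estimator recovers the true slope:

```python
import numpy as np

rng = np.random.default_rng(0)
N, T = 200, 5  # individuals, time periods

# Individual effects alpha_i, deliberately correlated with x_it
alpha = rng.normal(size=(N, 1))
x = alpha + rng.normal(size=(N, T))            # cov(x_it, alpha_i) > 0
y = 2.0 * x + alpha + rng.normal(size=(N, T))  # true slope = 2

# Pooled OLS ignores alpha_i and is biased upward (toward roughly 2.5 in this design)
b_pooled = np.polyfit(x.ravel(), y.ravel(), 1)[0]

# Within transformation: subtracting individual means wipes out alpha_i
x_w = x - x.mean(axis=1, keepdims=True)
y_w = y - y.mean(axis=1, keepdims=True)
b_within = (x_w * y_w).sum() / (x_w**2).sum()

print(b_pooled, b_within)  # b_within is close to 2, b_pooled is not
```

Because $\alpha_i$ drops out of the demeaned equation entirely, the within estimator is consistent regardless of how $\alpha_i$ relates to $x_{it}$.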

**Example: Random effects probit.** Suppose

$$ y^*_{it} = x_{it} \beta + \alpha_i + u_{it}, \quad \alpha_i \sim \mathcal{N}(0,\sigma_\alpha^2), u_{it} \sim \mathcal{N}(0,1). $$

and the observed outcome is

$$ y_{it} = \mathbb{1}(y^*_{it} > 0). $$

The *pooled maximum likelihood estimator* maximizes the sample average of the per-individual log-likelihoods:

$$ \hat{\beta} = \arg \max_\beta N^{-1} \sum_{i=1}^N \log \prod_{t=1}^T [G(x_{it}\beta)]^{y_{it}} [1 - G(x_{it}\beta)] ^{1-y_{it}}. $$

Of course, the log and the product could be simplified here, but writing it this way makes the equation more directly comparable to the random effects estimator, which has the form

$$ \hat{\beta} = \arg \max_\beta N^{-1} \sum_{i=1}^N \log \int \prod_{t=1}^T [G(x_{it}\beta + \sigma_\alpha a)]^{y_{it}} [1 - G(x_{it}\beta + \sigma_\alpha a )] ^{1-y_{it}} \phi(a) \mathrm{d}a. $$

We can, for example, approximate the integral by simulation, taking $R$ draws of standard normals $a_r$ and averaging the likelihood across them:

$$ \hat{\beta} = \arg \max_\beta N^{-1} \sum_{i=1}^N \log R^{-1} \sum_{r=1}^R \prod_{t=1}^T [G(x_{it}\beta + \sigma_\alpha a_r)]^{y_{it}} [1 - G(x_{it}\beta + \sigma_\alpha a_r)] ^{1-y_{it}},\quad a_r \sim \mathcal{N}(0,1). $$

The intuition is the following: we don't know which type $\alpha_i$ each individual is. Instead, we evaluate the product of likelihoods over time for a sequence of draws. The most likely type for individual $i$ will have the highest likelihood in all periods and will therefore dominate the likelihood contribution for that $T$-sequence of observations.
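For concreteness, this simulated-likelihood idea can be sketched in Python (a toy illustration with made-up data; $\sigma_\alpha$ is treated as known to keep the sketch short, whereas in practice it is estimated jointly with $\beta$, and a proper optimizer would replace the coarse grid search):

```python
import math
import numpy as np

rng = np.random.default_rng(1)
N, T, R = 300, 4, 50          # individuals, periods, simulation draws
beta_true, sigma_a = 1.0, 1.0

# Standard normal CDF, vectorized over arrays
Phi = np.vectorize(lambda z: 0.5 * (1.0 + math.erf(z / math.sqrt(2.0))))

# Simulate a random effects probit panel: y*_it = x_it*beta + alpha_i + u_it
x = rng.normal(size=(N, T))
alpha = sigma_a * rng.normal(size=(N, 1))
y = (x * beta_true + alpha + rng.normal(size=(N, T)) > 0).astype(float)

a_r = rng.normal(size=R)  # draws for the integral, fixed across beta evaluations

def sim_loglik(beta):
    # shape (N, T, R): one likelihood "column" per draw a_r
    p = Phi(x[:, :, None] * beta + sigma_a * a_r[None, None, :])
    lik = np.where(y[:, :, None] == 1.0, p, 1.0 - p)   # per-period likelihood
    # product over t, average over draws, log, average over i
    return np.log(lik.prod(axis=1).mean(axis=1)).mean()

grid = np.linspace(0.5, 1.5, 11)
beta_hat = grid[np.argmax([sim_loglik(b) for b in grid])]
print(beta_hat)
```

Note how the product over $t$ is taken *inside* the average over draws, exactly as in the equation above: each draw $a_r$ is a candidate "type", and only types that fit the whole $T$-sequence contribute much to the average.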

2

Another very practical perspective on random and fixed effects models comes from econometrics when doing linear regressions on panel data. If you’re estimating the association between an explanatory variable and an outcome variable in a dataset with multiple samples per individual / group, this is the framework you want to use.

A good example of panel data is yearly measurements from a set of individuals of:

- $gender_i$ (gender of the $i$th person)
- ${\Delta}weight_{it}$ (weight change during year $t$ for person $i$)
- $exercise_{it}$ (average daily exercise during year $t$ for person $i$)

If we’re trying to understand the relationship between exercise and weight change, we’ll set up the following regression:

${\Delta}weight_{it} = \beta_0 exercise_{it} + \beta_1 gender_i + \alpha_i + \epsilon_{it}$

- $\beta_0$ is the quantity of interest
- $\beta_1$ is not interesting, we're just controlling for gender with it
- $\alpha_i$ is the per-individual intercept
- $\epsilon_{it}$ is the error term

In a setup like this there is a risk of endogeneity. This can happen when unmeasured variables (such as marital status) are associated with both exercise and weight change. As explained on p. 16 of this Princeton lecture, a random effects (AKA mixed effects) model is more efficient than a fixed effects model. However, it will incorrectly attribute some of the effect of the unmeasured variable on weight change to exercise, producing a biased estimate of $\beta_0$ and potentially overstated statistical significance. In this case the random effects model is not a consistent estimator of $\beta_0$.

A fixed effects model (in its most basic form) controls for any unmeasured variables that are constant over time but vary between individuals by explicitly including a separate intercept term for each individual ($\alpha_i$) in the regression equation. In our example, it will automatically control for confounding effects from gender, as well as any unmeasured confounders (marital status, socioeconomic status, educational attainment, etc…). In fact, gender cannot be included in the regression and $\beta_1$ cannot be estimated by a fixed effects model, since $gender_i$ is collinear with the $\alpha_i$'s.

So the key question is which model is appropriate, and the standard answer is the Hausman test. We run both the fixed and random effects regressions and then apply the Hausman test to see whether their coefficient estimates diverge significantly. If they diverge, endogeneity is at play and the fixed effects model is the better choice; otherwise, we go with random effects.
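The divergence the Hausman test looks for can be illustrated with a small Python sketch (hypothetical data; the random effects estimator is written here as the textbook quasi-demeaning transform with the variance components treated as known, whereas real packages such as `plm` estimate them). When the individual effect is correlated with the regressor, the fixed effects slope stays near the truth while the random effects slope drifts:

```python
import numpy as np

rng = np.random.default_rng(2)
N, T = 300, 5
sigma_a, sigma_u = 1.0, 1.0

alpha = sigma_a * rng.normal(size=(N, 1))
x = 0.8 * alpha + rng.normal(size=(N, T))  # endogeneity: x correlated with alpha
y = 1.0 * x + alpha + sigma_u * rng.normal(size=(N, T))  # true slope = 1

def slope(xm, ym):
    # OLS slope on (grand-)demeaned data
    xc, yc = xm - xm.mean(), ym - ym.mean()
    return (xc * yc).sum() / (xc**2).sum()

# Fixed effects: full within transformation (subtract individual means)
b_fe = slope(x - x.mean(1, keepdims=True), y - y.mean(1, keepdims=True))

# Random effects: quasi-demeaning with theta < 1 leaves some alpha_i in the error
theta = 1.0 - np.sqrt(sigma_u**2 / (sigma_u**2 + T * sigma_a**2))
b_re = slope(x - theta * x.mean(1, keepdims=True),
             y - theta * y.mean(1, keepdims=True))

print(b_fe, b_re)  # b_fe near 1; b_re pulled away by the correlated alpha_i
```

A gap between the two slopes like the one this produces is exactly the signal the Hausman test formalizes; with an exogenous $\alpha_i$ (drop the `0.8 * alpha` term from `x`) the two estimates would agree and random effects would be preferred for efficiency.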

2I also find that it is sometimes difficult to determine whether an effect should be considered fixed or random. Although there are some recommendations on this point, it is not always easy to make the right decision. – Manuel Ramón – 2010-11-19T10:29:35.280

2

I think that this link may be helpful in clarifying the underlying principles of mixed models: Fixed, Random, and Mixed Models (SAS documentation).

– pietrop – 2013-09-04T10:27:58.257

4

An extremely helpful answer can also be found here: What is a difference between random effects-, mixed effects- & marginal model?

– gung – 2014-11-19T20:21:40.497