Friday, August 10, 2012

Treating participants (or items) as random vs. fixed effects

Connoisseurs of multilevel regression will already be familiar with this issue, but it is the single most common topic for questions I receive about growth curve analysis (GCA), so it seems worth discussing. The core of the issue is that in our paper about using GCA for eye tracking data (Mirman, Dixon, & Magnuson, 2008) we treated participants as fixed effects. In contrast, multilevel regression in general, and specifically the approach described by Dale Barr (2008), which is nearly identical to ours, treats participants as random effects.

First, we should be clear about the conceptual distinction between "fixed" and "random" effects. This turns out not to be a simple matter because many, sometimes conflicting, definitions have been proposed (cf. Gelman, 2005). In the context of the kind of questions and data scenarios we typically face in cognitive neuroscience, I would say:
  • Fixed effects are the effects that we imagine to be constant in the population or group under study. As such, when we conduct a study, we would like to conclude that the observed fixed effects generalize to the whole population. So if I've run a word recognition study and found that uncommon (low frequency) words are processed more slowly than common (high frequency) words, I would like to conclude that this difference is true of all typical adults (or at least WEIRD adults: Henrich, Heine, & Norenzayan, 2010).
  • Random effects are the differences among the individual observational units in the sample, which we imagine are randomly sampled from the population. As such, these effects should conform to a specified distribution (typically a normal distribution) and have a mean of 0. So in my word recognition experiment, some participants showed a large word frequency effect and some showed a small effect, but I am going to assume that these differences reflect random, normally-distributed variability in the population.
Statistically, the difference is that fixed effect parameters are estimated independently and not constrained by a distribution. So, in the example, estimated recognition time for low and high frequency conditions can have whatever values best describe the data. Random effects are constrained to have a mean of 0 and follow a normal distribution, so estimated recognition time for a particular participant (or item, in a by-items analysis) reflects the recognition time for that individual as well as the pattern of recognition times across all other individuals in the sample. The consequence is that random effect estimates tend to be pulled toward their mean, which is called "shrinkage". So the trade-off is between independent estimation (fixed effects) and generalization (random effects).
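
To make shrinkage concrete, here is a minimal Python sketch rather than the procedure a multilevel regression package actually uses. It simulates a word frequency effect for a handful of participants, estimates each participant's effect independently (fixed-effect style), and then pulls those estimates toward the group mean with a simple method-of-moments weight (random-effect style). The function name shrinkage_demo and all of the simulation settings (a 50 ms effect, the standard deviations, the sample sizes) are hypothetical values chosen for illustration, and the balanced design is an assumption of the simple variance formula.

```python
import random
import statistics


def shrinkage_demo(n_participants=12, n_trials=8, seed=2012):
    """Compare independent (fixed-style) and shrunken (random-style) estimates."""
    rng = random.Random(seed)

    # Hypothetical simulation settings (not from the post): a 50 ms population
    # frequency effect, participant SD of 20 ms, trial-level noise SD of 60 ms.
    population_effect, participant_sd, trial_sd = 50.0, 20.0, 60.0

    # Each participant's true effect is a random draw around the population effect.
    true_effects = [rng.gauss(population_effect, participant_sd)
                    for _ in range(n_participants)]
    trials = [[rng.gauss(mu, trial_sd) for _ in range(n_trials)]
              for mu in true_effects]

    # Fixed-effect style: each estimate uses only that participant's own data.
    independent = [statistics.fmean(t) for t in trials]
    grand_mean = statistics.fmean(independent)

    # Method-of-moments variance components (balanced design assumed).
    noise_var = statistics.fmean([statistics.variance(t) for t in trials]) / n_trials
    between_var = max(statistics.variance(independent) - noise_var, 0.0)

    # Shrinkage weight in [0, 1]: the share of the observed spread attributed
    # to real participant differences rather than trial-level noise.
    total = between_var + noise_var
    weight = between_var / total if total > 0 else 0.0

    # Random-effect style: deviations from the grand mean are scaled by the weight,
    # so they still average to 0 but are pulled toward the group mean.
    shrunken = [grand_mean + weight * (m - grand_mean) for m in independent]
    return independent, shrunken, grand_mean, weight


if __name__ == "__main__":
    independent, shrunken, grand_mean, weight = shrinkage_demo()
    print(f"grand mean = {grand_mean:.1f} ms, shrinkage weight = {weight:.2f}")
    for ind, shr in zip(independent, shrunken):
        print(f"independent {ind:7.1f}   shrunken {shr:7.1f}")
```

In this sketch every shrunken estimate lies at least as close to the group mean as its independent counterpart. The weight gets smaller, and the pull stronger, when there are few trials per participant or the trial-level noise is large relative to the real differences among participants.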

Returning to the original question: should participants (or items) be treated as random or fixed effects? In experimental cognitive science/neuroscience, we usually think of participants (or items) as sampled observations from some population to which we would like to generalize -- our particular participants are assumed to be randomly sampled and representative of the population of possible participants, our particular test items are assumed to be representative of possible items. These assumptions mean that participants (and items) should be treated as random effects. 

However, we don't always make such assumptions. Especially in cognitive neuropsychology, we are sometimes interested in particular participants that we think can demonstrate something important about how the mind/brain works. These participants are "interesting in themselves" and comprise a "sample that exhausts the population" (two proposed definitions of fixed effects, see Gelman, 2005). In such cases, we may want to treat participants as fixed effects. The decision depends on the ultimate inferential goals of the researcher, not on the research domain or sample population. For example, studies of certain idiosyncratic words (onomatopoeia, some morphological inflections) may reasonably treat items as fixed effects if the statistical inferences are restricted to these particular items (although, as with case studies, the theoretical implications can be broader). On the other side, in our recent study examining the effect of left temporo-parietal cortex (TPC) lesions on thematic semantic processing (Mirman & Graziano, 2012), we treated participants as random effects because our goal was to make general inferences about the effect of TPC lesions (assuming, as always, that our small group constituted a random representative sample of individuals with left TPC lesions).

Additional discussion of this issue can be found in a tech report (LCDL Technical Report 2012.03) that we have just added to our growth curve analysis page. Also, it is only fair to note that this is just one aspect of determining the "right" random effect structure and there remain important unanswered questions in this domain (here are some recent developments and discussions).

Barr D.J. (2008). Analyzing ‘visual world’ eyetracking data using multilevel logistic regression. Journal of Memory and Language, 59(4), 457-474. DOI: 10.1016/j.jml.2007.09.002
Gelman A. (2005). Analysis of variance -- why it is more important than ever. Annals of Statistics, 33(1), 1-33. arXiv: math/0504499v2
Henrich J., Heine S.J., & Norenzayan A. (2010). The weirdest people in the world? The Behavioral and Brain Sciences, 33(2-3), 61-83. PMID: 20550733
Mirman D., Dixon J.A., & Magnuson J.S. (2008). Statistical and computational models of the visual world paradigm: Growth curves and individual differences. Journal of Memory and Language, 59(4), 475-494. PMID: 19060958
Mirman D., & Graziano K.M. (2012). Damage to temporo-parietal cortex decreases incidental activation of thematic relations during spoken word comprehension. Neuropsychologia, 50(8), 1990-1997. PMID: 22571932

1 comment:

  1. The distinction that random effects are constrained to follow a certain distribution in estimating parameters whereas fixed effects are not makes it all a lot clearer, and helps me to strip away some of the "black box" effect where it's not clear just what some statistical tests are doing. I'd be interested to read more posts like this that break down what some of the assumptions are or what the parameters really mean in the tests we commonly employ.