Wednesday, August 13, 2014

Plotting mixed-effects model results with effects package

As separate by-subjects and by-items analyses have been replaced by mixed-effects models with crossed random effects of subjects and items, I've often found myself wondering about the best way to plot data. The simple-minded means and SEs from trial-level data will be inaccurate because they won't take the nesting into account. If I compute subject means and plot those with by-subject SEs, then I'm plotting something different from what I analyzed, which is not always terrible, but definitely not ideal. It seems intuitive that the condition means and SEs are computable from the model's parameter estimates, but that computation is not trivial, particularly when you're dealing with interactions. Or, rather, that computation was not trivial until I discovered the effects package.

To show how this would work, I pillaged some data from a word-to-picture matching pilot study. Younger adults (college students) and older adults (mostly 50s and 60s) did a word-to-picture matching task in the presence of either cohort competitors (camera - camel) or semantic competitors (lion - tiger).

> summary(RT.demo)
    Subject         Target        Condition        ACC          RT       Group   
 2102   :  40   chicken:  36   Cohort  :690   Min.   :1   Min.   :1503   YC:701  
 2103   :  40   hat    :  36   Semantic:675   1st Qu.:1   1st Qu.:2131   OC:664  
 2104   :  40   penny  :  36                  Median :1   Median :2362           
 2106   :  40   potato :  36                  Mean   :1   Mean   :2442           
 2109   :  40   radio  :  36                  3rd Qu.:1   3rd Qu.:2684           
 2116   :  40   stool  :  36                  Max.   :1   Max.   :4847           
 (Other):1125   (Other):1149                                                     

> ggplot(RT.demo, aes(Condition, RT, fill=Group, color=Group)) + geom_violin() + theme_bw(base_size=12)
Not surprisingly, the response times for older adults are slower than for younger adults, but it looks like this might be particularly true in the presence of semantic competitors. Let's test that with a mixed model with crossed random effects of subjects and items.

> m <- lmer(RT ~ Condition*Group + (Condition | Subject) + (1 | Target), data=RT.demo)
> coef(summary(m))
                          Estimate Std. Error t value
(Intercept)               2230.057     64.749   34.44
ConditionSemantic           -7.881     68.565   -0.11
GroupOC                    413.287     65.097    6.35
ConditionSemantic:GroupOC  104.096     33.110    3.14

So it looks like the older adults are about 400ms slower than the younger adults in the cohort condition and another 100ms slower in the semantic condition. Now we can use the effects package to convert these parameter estimates into condition mean and SE estimates. The key function is effect(), which takes a term from the model and the model object. We can use summary() on the effect list object to get the information we need.
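As a sanity check, under treatment coding the four cell means can be assembled by hand from the coefficients above; this is the arithmetic that effect() automates (the vector names below are just labels I made up):

```r
# Reconstructing the cell means from the fixed-effect estimates above
# (treatment coding: YC/Cohort is the reference cell).
b <- c(intercept = 2230.057, condSem = -7.881,
       groupOC = 413.287, interaction = 104.096)
cohort_YC   <- b[["intercept"]]
semantic_YC <- b[["intercept"]] + b[["condSem"]]
cohort_OC   <- b[["intercept"]] + b[["groupOC"]]
semantic_OC <- b[["intercept"]] + b[["condSem"]] + b[["groupOC"]] + b[["interaction"]]
```

These sums match the fitted values that effect() reports (2230.057, 2222.176, 2643.344, 2739.559).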

> library(effects)
> ef <- effect("Condition:Group", m)
> summary(ef)

 Condition*Group effect
Condition        YC       OC
  Cohort   2230.057 2643.344
  Semantic 2222.176 2739.559

 Lower 95 Percent Confidence Limits
Condition        YC       OC
  Cohort   2103.037 2516.026
  Semantic 2088.161 2605.384

 Upper 95 Percent Confidence Limits
Condition        YC       OC
  Cohort   2357.076 2770.662
  Semantic 2356.190 2873.734

For the purposes of plotting, we want to convert the effect list object into a data frame. Conveniently, there is a function:

> x <-
> x
  Condition Group      fit       se    lower    upper
1    Cohort    YC 2230.057 64.74945 2103.037 2357.076
2  Semantic    YC 2222.176 68.31514 2088.161 2356.190
3    Cohort    OC 2643.344 64.90167 2516.026 2770.662
4  Semantic    OC 2739.559 68.39711 2605.384 2873.734

Now we can plot this:
> ggplot(x, aes(Condition, fit, color=Group)) + geom_point() + geom_errorbar(aes(ymin=fit-se, ymax=fit+se), width=0.4) + theme_bw(base_size=12)
Or for people who like dynamite plots:
> ggplot(x, aes(Group, fit, color=Condition, fill=Condition)) + geom_bar(stat="identity", position="dodge") + geom_errorbar(aes(ymin=fit-se, ymax=fit+se), width=0.4, position=position_dodge(width=0.9)) + theme_bw(base_size=12)
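Since the data frame also carries the lower and upper 95% confidence limits, you can swap the ±1 SE error bars for 95% CIs by using ymin=lower, ymax=upper in geom_errorbar. Those limits are approximately fit ± 1.96·se, which you can confirm numerically (values copied from the data frame printed above):

```r
# Values copied from the data frame above (x).
fit   <- c(2230.057, 2222.176, 2643.344, 2739.559)
se    <- c(64.74945, 68.31514, 64.90167, 68.39711)
lower <- c(2103.037, 2088.161, 2516.026, 2605.384)
# The lower CI limits are roughly fit - 1.96*se (normal approximation):
max(abs((fit - 1.96 * se) - lower))   # well under 1 ms
```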


  1. I've never used the effects package for this; thanks Dan. I was interested in how effect() would compare with predictSE() from the AICcmodavg package, which I've used for the same reasons you cite. I was happy to see that they matched.

    > mod <- lmer(Reaction ~ Days + (Days|Subject), data = sleepstudy)
    >"Days", mod, xlevels = list(Days = seq(1,8,by=1))))
    >, newdata = list(Days = seq(1,8,by=1))))

    I like that effect() gives 95% CIs.

  2. Hi, Dan

    I would like to ask you a question related to this post. It's about what to do when the pattern of by-subjects RTs is different from the pattern of the lsmeans from the LME model, and the lsmeans pattern is the one that was expected.

    I'm analysing inverse transformed RT data with LME.

    My model is : model = invRT ~ A*B + C + (A | Subject) + (B | Item)

    where A and B are categorical variables (treatment coded because I'm interested in the differences with respect to a baseline) and C is a continuous covariate.

    Let's say that the raw by subjects RT pattern of the condition A is as follows:

    Level 1. 650 ms
    Level 2. 625 ms
    Level 3. 623 ms
    Level 4. 624 ms

    But the back-transformed lsmeans of the model are:

    Level 1. 650 ms
    Level 2. 622 ms
    Level 3. 626 ms
    Level 4. 627 ms

    and the difference between levels 1 and 2 are significant (t > 2) whereas the difference 1 vs. 3 and 1 vs. 4 are not.

    This is actually the result I was expecting. Would it be OK to report the back-transformed lsmeans as the RT pattern of the data?

    Thanks in advance.

    1. Yes, I think it is OK to report the condition RTs estimated from the model by lsmeans (or effect). In fact, I think it might be better than reporting the raw by-subjects RTs because the lsmeans result better reflects your analysis model, which is (presumably) the basis of your inferences/conclusions. Having been trained in the pre-mixed-effects-models days, I initially found plotting model-estimated RTs somewhat strange, but I found it useful to remember that a table of by-subject condition means is also a model of the data, and it's *not* the model actually used to analyze the data.

      As a side note, be careful about drawing conclusions from the fact that the difference between levels 1 and 2 is significant, but the other differences are not. The difference between significant and not significant is not itself statistically significant (Gelman & Stern, 2006). Just looking at the condition means, it looks to me like levels 2-4 are pretty much the same and level 1 is slower. You can recode the contrasts to test level 1 against levels 2-4, though I don't know whether that would be informative for your hypothesis. In any case, I wouldn't buy a conclusion like "level 2 < level 1 = level 3 = level 4".
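      Recoding the contrasts to pit level 1 against the average of levels 2-4 might look like this (a sketch with hypothetical level names; adapt to your own factor):

```r
# Hypothetical sketch: custom contrasts for a 4-level factor A so that
# the first contrast compares level 1 against the mean of levels 2-4.
A <- factor(c("L1", "L2", "L3", "L4"))
cmat <- cbind(c(-3, 1, 1, 1) / 4,   # L1 vs mean(L2, L3, L4)
              c( 0, -1, 1, 0),      # L2 vs L3
              c( 0, -1, 0, 1))      # L2 vs L4
contrasts(A) <- cmat
```

      Each column sums to zero, so the intercept stays interpretable as the grand mean.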

    2. summary(glht(model, lsm(pairwise ~ A)))
      summary(glht(model, lsm(pairwise ~ A)), test=adjusted("bonferroni"))

      Fit: lme4::lmer(formula = RTinv ~ A * B + C +
      (A | subject) + (A | item), data = words,
      REML = FALSE, control = lmerControl(optimizer = "bobyqa"),
      subset = abs(scale(resid(words_1))) < 2.5)

      Linear Hypotheses:
      Estimate Std. Error z value Pr(>|z|)
      Level1 - Level2 == 0 0.035550 0.014152 2.512 0.0578 .
      (with test=adjusted("bonferroni"), the Level1 - Level2 p value is 0.072)
      Level1 - Level3 == 0 0.028610 0.013881 2.061 0.1656
      Level1 - Level4 == 0 0.020807 0.015122 1.376 0.5139
      Level2 - Level3 == 0 -0.006940 0.013601 -0.510 0.9566
      Level2 - Level4 == 0 -0.014743 0.014379 -1.025 0.7341
      Level3 - Level4 == 0 -0.007803 0.013767 -0.567 0.9418
      Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
      (Adjusted p values reported -- single-step method)

  3. How did you get the back transformed lsmeans ? Is there a script for that? Did you use partial derivatives?

    1. What I did is just:

      -1000/effect("condition", model.1)$fit

      But see Lo & Andrews (2015) and the supplementary material
      and you'll find this:

      The following R syntax was used to code the specific form of the inverse link function (-1000/μ̂), which was called when generating models assuming an inverse relationship between the predictors and RT. (Lo & Andrews, 2015)

      invfn <- function() {
        ## link
        linkfun <- function(y) -1000/y
        ## inverse link
        linkinv <- function(eta) -1000/eta
        ## derivative of invlink wrt eta
        mu.eta <- function(eta) 1000/(eta^2)
        valideta <- function(eta) TRUE
        link <- "-1000/y"
        structure(list(linkfun = linkfun, linkinv = linkinv,
                       mu.eta = mu.eta, valideta = valideta,
                       name = link),
                  class = "link-glm")
      }

      Just to check you can run a model like:

      model.1 <- glmer(RT ~ condition + (1|subject) + (1|item), data=DATA, family=inverse.gaussian(link=invfn()))

      And check that the output is in RTinv scale.

      effect("condition", model.1)

      and you'll see that the output is automatically back-transformed in RTs.

      When you do

      effect("condition", model.1)$fit

      you get the coefficients in RTinv scale. Then run

      -1000/effect("condition", model.1)$fit

      and you'll see that the results are the same as the back-transformed effect() output.

      (This is what I did, but may be I'm doing something wrong...)
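      One quick sanity check on the -1000/y link above: the transform is its own inverse, so linkfun and linkinv round-trip any RT (a standalone sketch with made-up RTs):

```r
# The -1000/y transform is an involution: applying it twice
# returns the original values, which is why linkfun and linkinv
# in invfn() are the same function.
linkfun <- function(y) -1000/y
linkinv <- function(eta) -1000/eta
rt <- c(500, 650, 800)                 # made-up RTs in ms
eta <- linkfun(rt)                     # RTinv scale
stopifnot(all.equal(linkinv(eta), rt)) # back in ms
```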

  4. Hi, your post is really useful. I used the effect function on some data from a research project of mine and found that the standard error values are always the same for each factor level (see output below). My data set is from an experiment where replication is balanced, and if I delete a few observations to make it unbalanced then it changes the standard errors and they are not all the same. How are the standard errors calculated and why are they equivalent when the data set is balanced? It seems that the exact same information is going into calculating the values for each factor level, but why would that be? If I calculate the SEs for the different groups using the raw data, obviously the SEs are very different from group to group.

    Depth fit se lower upper
    1 10-20cm 6528.954 1557.851 3359.482 9698.425
    2 30-40cm 5466.135 1557.851 2296.664 8635.607

    1. There are some good discussions online about how to compute confidence intervals (or SEs) for mixed-effects models. I recommend a recent one on ling-r-lang-lang and a slightly older one on CrossValidated.
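      Roughly speaking, the SE of a fitted cell mean c'β is sqrt(c' V c), where V is the fixed-effects covariance matrix (vcov(model)) and c is the vector of predictor values for that cell. In a balanced design the c vectors for the different levels hit V symmetrically, which is why the SEs come out identical. A toy sketch with a made-up 2x2 covariance matrix (not your data) that has the structure a balanced two-level design produces:

```r
# Sketch: effect()-style SEs come from SE = sqrt(t(c) %*% V %*% c),
# where V = vcov(model) and c picks out a cell mean.
# Made-up V with the balanced-design structure Cov(b0, b1) = -Var(b1)/2,
# under which the two cell SEs coincide.
V  <- matrix(c( 4, -1,
               -1,  2), 2, 2)
c1 <- c(1, 0)   # reference level: intercept only
c2 <- c(1, 1)   # other level: intercept + difference
se1 <- sqrt(drop(t(c1) %*% V %*% c1))
se2 <- sqrt(drop(t(c2) %*% V %*% c2))
```

      Deleting observations breaks that symmetry, so the SEs diverge, exactly as you observed.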