As separate by-subjects and by-items analyses have been replaced by mixed-effects models with crossed random effects of subjects and items, I've often found myself wondering about the best way to plot data. The simple-minded means and SEs from trial-level data will be inaccurate because they won't take the nesting into account. If I compute subject means and plot those with by-subject SEs, then I'm plotting something different from what I analyzed, which is not always terrible, but definitely not ideal. It seems intuitive that the condition means and SEs should be computable from the model's parameter estimates, but that computation is not trivial, particularly when you're dealing with interactions. Or, rather, that computation was not trivial until I discovered the effects package.
To show how this would work, I pillaged some data from a word-to-picture matching pilot study. Younger adults (college students) and older adults (mostly 50s and 60s) did a word-to-picture matching task in the presence of either cohort competitors (camera - camel) or semantic competitors (lion - tiger).
> summary(RT.demo)
    Subject        Target       Condition        ACC          RT        Group
 2102   :  40   chicken:  36   Cohort  :690   Min.   :1   Min.   :1503   YC:701
 2103   :  40   hat    :  36   Semantic:675   1st Qu.:1   1st Qu.:2131   OC:664
 2104   :  40   penny  :  36                  Median :1   Median :2362
 2106   :  40   potato :  36                  Mean   :1   Mean   :2442
 2109   :  40   radio  :  36                  3rd Qu.:1   3rd Qu.:2684
 2116   :  40   stool  :  36                  Max.   :1   Max.   :4847
 (Other):1125   (Other):1149
> ggplot(RT.demo, aes(Condition, RT, fill=Group, color=Group)) + geom_violin() + theme_bw(base_size=12)
Not surprisingly, the response times for older adults are slower than for younger adults, but it looks like this might be particularly true in the presence of semantic competitors. Let's test that with a mixed model with crossed random effects of subjects and items.
> m <- lmer(RT ~ Condition*Group + (Condition | Subject) + (1 | Target), data=RT.demo)
> coef(summary(m))
                          Estimate Std. Error t value
(Intercept)               2230.057     64.749   34.44
ConditionSemantic           -7.881     68.565   -0.11
GroupOC                    413.287     65.097    6.35
ConditionSemantic:GroupOC  104.096     33.110    3.14
So it looks like the older adults are about 400ms slower than the younger adults in the cohort condition and another 100ms slower in the semantic condition. Now we can use the effects package to convert these parameter estimates into condition mean and SE estimates. The key function is effect(), which takes a term from the model and the model object. We can use summary() on the effect list object to get the information we need.
> library(effects)
> ef <- effect("Condition:Group", m)
> summary(ef)
Condition*Group effect
Group
Condition YC OC
Cohort 2230.057 2643.344
Semantic 2222.176 2739.559
Lower 95 Percent Confidence Limits
Group
Condition YC OC
Cohort 2103.037 2516.026
Semantic 2088.161 2605.384
Upper 95 Percent Confidence Limits
Group
Condition YC OC
Cohort 2357.076 2770.662
Semantic 2356.190 2873.734
For the purposes of plotting, we want to convert the effect list object into a data frame. Conveniently, there is an as.data.frame() method:
> x <- as.data.frame(ef)
> x
Condition Group fit se lower upper
1 Cohort YC 2230.057 64.74945 2103.037 2357.076
2 Semantic YC 2222.176 68.31514 2088.161 2356.190
3 Cohort OC 2643.344 64.90167 2516.026 2770.662
4 Semantic OC 2739.559 68.39711 2605.384 2873.734
Now we can plot this:
> ggplot(x, aes(Condition, fit, color=Group)) + geom_point() + geom_errorbar(aes(ymin=fit-se, ymax=fit+se), width=0.4) + theme_bw(base_size=12)
Or, for people who like dynamite plots:
> ggplot(x, aes(Group, fit, color=Condition, fill=Condition)) + geom_bar(stat="identity", position="dodge") + geom_errorbar(aes(ymin=fit-se, ymax=fit+se), width=0.4, position=position_dodge(width=0.9)) + theme_bw(base_size=12)
I've never used the effects package for this; thanks Dan. I was interested in how effect() would compare with predictSE() from the AICcmodavg package, which I've used for the same reasons you cite. I was happy to see that they matched.
> library(AICcmodavg)
> mod <- lmer(Reaction ~ Days + (Days|Subject), data = sleepstudy)
> as.data.frame(effect("Days", mod, xlevels = list(Days = seq(1,8,by=1))))
> as.data.frame(predictSE(mod, newdata = list(Days = seq(1,8,by=1))))
I like that effect() gives 95% CIs.
Hi, Dan
I would like to ask you a question related to this post. It's about what to do when the pattern of by-subjects RTs is different from the lsmeans of the LME model, and the latter pattern is what was expected.
I'm analysing inverse-transformed RT data with an LME.
My model is: invRT ~ A*B + C + (A | Subject) + (B | Item)
where A and B are categorical variables (treatment coded because I'm interested in the differences with respect to a baseline) and C is a continuous covariate.
Let's say that the raw by subjects RT pattern of the condition A is as follows:
Level 1. 650 ms
Level 2. 625 ms
Level 3. 623 ms
Level 4. 624 ms
But the back-transformed lsmeans of the model are:
Level 1. 650 ms
Level 2. 622 ms
Level 3. 626 ms
Level 4. 627 ms
and the difference between levels 1 and 2 is significant (t > 2), whereas the differences 1 vs. 3 and 1 vs. 4 are not.
This is actually the result I was expecting. Would it be OK to report the back-transformed lsmeans as the RT pattern of the data?
Thanks in advance.
Yes, I think it is OK to report the condition RTs estimated from the model by lsmeans (or effect). In fact, I think it might be better than reporting the raw by-subjects RTs because the lsmeans result better reflects your analysis model, which is (presumably) the basis of your inferences/conclusions. Having been trained in the pre-mixed-effects-models days, I initially found plotting model-estimated RTs somewhat strange, but I found it useful to remember that computing by-subject condition means is also a model of the data, and it's *not* the model actually used to analyze the data.
As a side note, be careful about drawing conclusions from the fact that the difference between levels 1 and 2 is significant, but the other differences are not. The difference between significant and not significant is not itself statistically significant (Gelman & Stern, 2006: http://dx.doi.org/10.1198/000313006X152649). Just looking at the condition means, it looks to me like levels 2-4 are pretty much the same and level 1 is slower. You can recode the contrasts to test level 1 against levels 2-4, though I don't know whether that would be informative for your hypothesis. In any case, I wouldn't buy a conclusion like "level 2 < level 1 = level 3 = level 4".
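In case it's useful, here is a minimal sketch of that kind of recoding (the data frame words and the exact variable names are hypothetical, not taken from your model):

## Custom contrasts for a 4-level factor A: the first column tests level 1
## against the mean of levels 2-4, and the other two are orthogonal
## comparisons among levels 2-4.
contrasts(words$A) <- cbind(L1.vs.rest = c(3, -1, -1, -1)/4,
                            L2.vs.L34  = c(0,  2, -1, -1)/3,
                            L3.vs.L4   = c(0,  0,  1, -1)/2)
m.recode <- lmer(invRT ~ A*B + C + (A | Subject) + (B | Item), data = words)

With this coding, the coefficient for the first contrast estimates level 1 minus the average of levels 2-4 (at the reference level of B, given your treatment coding), so the test of interest can be read off the model summary.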
summary(glht(model, lsm(pairwise ~ A)))
summary(glht(model, lsm(pairwise ~ A)), test=adjusted("bonferroni"))
Fit: lme4::lmer(formula = RTinv ~ A * B + C +
(A | subject) + (A | item), data = words,
REML = FALSE, control = lmerControl(optimizer = "bobyqa"),
subset = abs(scale(resid(words_1))) < 2.5)
Linear Hypotheses:
Estimate Std. Error z value Pr(>|z|)
Level1 - Level2 == 0 0.035550 0.014152 2.512 0.0578 .
(With test=adjusted("bonferroni"), the Level1 - Level2 p-value is 0.072 instead.)
Level1 - Level3 == 0 0.028610 0.013881 2.061 0.1656
Level1 - Level4 == 0 0.020807 0.015122 1.376 0.5139
Level2 - Level3 == 0 -0.006940 0.013601 -0.510 0.9566
Level2 - Level4 == 0 -0.014743 0.014379 -1.025 0.7341
Level3 - Level4 == 0 -0.007803 0.013767 -0.567 0.9418
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Adjusted p values reported -- single-step method)
Thanks, Dan!
How did you get the back-transformed lsmeans? Is there a script for that? Did you use partial derivatives?
What I did is just:
library(effects)
-1000/effect("condition", model.1)$fit
But see Lo & Andrews (2015) and the supplementary material http://journal.frontiersin.org/article/10.3389/fpsyg.2015.01171/full
and you'll find this:
The following R syntax was used to code the specific form of the inverse link function (-1000/μ̂), which was called when generating models assuming an inverse relationship between the predictors and RT. (Lo & Andrews, 2015)
invfn <- function() {
  ## link
  linkfun <- function(y) -1000/y
  ## inverse link
  linkinv <- function(eta) -1000/eta
  ## derivative of invlink wrt eta
  mu.eta <- function(eta) { 1000/(eta^2) }
  valideta <- function(eta) TRUE
  link <- "-1000/y"
  structure(list(linkfun = linkfun, linkinv = linkinv,
                 mu.eta = mu.eta, valideta = valideta,
                 name = link),
            class = "link-glm")
}
Just to check, you can run a model like:
model.1 <- glmer(RT ~ condition + (1|subject) + (1|item), data=DATA, family=inverse.gaussian(link=invfn()))
And check that the output is in RTinv scale.
summary(model.1)
Then:
effect("condition", model.1)
and you'll see that the output is automatically back-transformed in RTs.
When you do
effect("condition", model.1)$fit
you get the coefficients in RTinv scale.
Then
-1000/effect("condition", model.1)$fit
and you'll see that the results are the same.
(This is what I did, but maybe I'm doing something wrong...)
Hi, your post is really useful. I used the effect function on some data from a research project of mine and found that the standard error values are always the same for each factor level (see output below). My data set is from an experiment where replication is balanced, and if I delete a few observations to make it unbalanced then it changes the standard errors and they are not all the same. How are the standard errors calculated and why are they equivalent when the data set is balanced? It seems that the exact same information is going into calculating the values for each factor level, but why would that be? If I calculate the SEs for the different groups using the raw data, obviously the SEs are very different from group to group.
Depth fit se lower upper
1 10-20cm 6528.954 1557.851 3359.482 9698.425
2 30-40cm 5466.135 1557.851 2296.664 8635.607
There are some good discussions online about how to compute confidence intervals (or SEs) for mixed-effects models. I recommend a recent one on the ling-r-lang-L list (https://mailman.ucsd.edu/pipermail/ling-r-lang-l/2017-March/000914.html) and a slightly older one on CrossValidated (http://stats.stackexchange.com/questions/117641/how-trustworthy-are-the-confidence-intervals-for-lmer-objects-through-effects-pa).
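For what it's worth, here's a rough sketch of where those SEs come from, using the model m from the post above (this is my back-of-the-envelope version, not necessarily exactly what the effects package does internally):

## The SE of each fitted cell mean is sqrt(x' V x), where x is that cell's row
## of the fixed-effects design matrix and V is the covariance matrix of the
## fixed-effect estimates. In a fully balanced design the quadratic form works
## out the same for every cell, so the SEs are identical; dropping a few
## observations breaks that symmetry.
V <- as.matrix(vcov(m))          ## covariance matrix of the fixed-effect estimates
X <- unique(model.matrix(m))     ## one distinct row per design cell (all predictors are categorical here)
sqrt(diag(X %*% V %*% t(X)))     ## one SE per cell

You can see the same thing in the post above: the SE for the reference cell (Cohort/YC) is identical to the standard error of the intercept.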
Hi Dan,
Thank you for this tutorial.
I tried your method and it works great for my smaller LMMs. However, with an extended model I get an error saying that the specified effect does not appear in the model when applying the effect() function. Have you encountered this bug?
I don't think I've gotten that error. At least not including times when I make a typo or something and the (incorrectly) specified effect actually does not appear in the model.
I can get effects to work when I run a model with lm(), but not with lmer(). I get the error "Error in as.data.frame.default(data, optional = TRUE) : cannot coerce class ‘"function"’ to a data.frame". Has anyone else run into that problem?
Sorry for slow reply. I've also started getting this error -- it looks like something has changed in more recent versions of effects that broke support for lmer, but I'm not sure what or why. If I figure it out, I'll post an update.
Hi, is there a way to plot the odds ratios from the glmer output using the example above (instead of log odds ratios)? Thank you!
I think for binomial models the effect() function returns proportions (the fitted() function does also), not log-odds. That works out pretty well for things that I deal with (accuracy, fixation proportions) but I guess it might not be convenient in contexts where people like to see odds ratios.
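In case it helps, here's a rough sketch of one way to get odds and odds ratios anyway (the model and data here are made up, not from the post):

## Hypothetical binomial model; acc.data and the variable names are invented.
m.bin <- glmer(acc ~ condition + (1 | subject) + (1 | item),
               data = acc.data, family = binomial)
exp(fixef(m.bin))      ## the coefficients are log odds ratios, so exp() gives odds ratios vs. the baseline
ef <- effect("condition", m.bin)
exp(ef$fit)            ## $fit is on the log-odds scale, so exp() gives the estimated odds per condition

(As noted in the exchange above, the printed effect() output is back-transformed to the response scale, but the stored $fit component stays on the link scale.)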