As separate by-subjects and by-items analyses have been replaced by mixed-effects models with crossed random effects of subjects and items, I've often found myself wondering about the best way to plot data. The simple-minded means and SE from trial-level data will be inaccurate because they won't take the nesting into account. If I compute subject means and plot those with by-subject SE, then I'm plotting something different from what I analyzed, which is not always terrible, but definitely not ideal. It seems intuitive that the condition means and SE's are computable from the model's parameter estimates, but that computation is not trivial, particularly when you're dealing with interactions. Or, rather, that computation was not trivial until I discovered the

To show how this would work, I pillaged some data from a word-to-picture matching pilot study. Younger adults (college students) and older adults (mostly 50s and 60s) did a word-to-picture matching task in the presence of either cohort competitors (camera - camel) or semantic competitors (lion - tiger).

> summary(RT.demo)

Subject Target Condition ACC RT Group

2102 : 40 chicken: 36 Cohort :690 Min. :1 Min. :1503 YC:701

2103 : 40 hat : 36 Semantic:675 1st Qu.:1 1st Qu.:2131 OC:664

2104 : 40 penny : 36 Median :1 Median :2362

2106 : 40 potato : 36 Mean :1 Mean :2442

2109 : 40 radio : 36 3rd Qu.:1 3rd Qu.:2684

2116 : 40 stool : 36 Max. :1 Max. :4847

(Other):1125 (Other):1149

> ggplot(x, aes(Group, fit, color=Condition, fill=Condition)) + geom_bar(stat="identity", position="dodge") + geom_errorbar(aes(ymin=fit-se, ymax=fit+se), width=0.4, position=position_dodge(width=0.9)) + theme_bw(base_size=12)

**effects**package.To show how this would work, I pillaged some data from a word-to-picture matching pilot study. Younger adults (college students) and older adults (mostly 50s and 60s) did a word-to-picture matching task in the presence of either cohort competitors (camera - camel) or semantic competitors (lion - tiger).

> summary(RT.demo)

Subject Target Condition ACC RT Group

2102 : 40 chicken: 36 Cohort :690 Min. :1 Min. :1503 YC:701

2103 : 40 hat : 36 Semantic:675 1st Qu.:1 1st Qu.:2131 OC:664

2104 : 40 penny : 36 Median :1 Median :2362

2106 : 40 potato : 36 Mean :1 Mean :2442

2109 : 40 radio : 36 3rd Qu.:1 3rd Qu.:2684

2116 : 40 stool : 36 Max. :1 Max. :4847

(Other):1125 (Other):1149

> ggplot(RT.demo, aes(Condition, RT, fill=Group, color=Group)) + geom_violin() + theme_bw(base_size=12)

Not surprisingly, the response times for older adults are slower than for younger adults, but it looks like this might be particularly true in the presence of semantic competitors. Let's test that with a mixed model with crossed random effects of subjects and items.

> m <- lmer(RT ~ Condition*Group + (Condition | Subject) + (1 | Target), data=RT.demo)

> coef(summary(m))

Estimate Std. Error t value

(Intercept) 2230.057 64.749 34.44

ConditionSemantic -7.881 68.565 -0.11

GroupOC 413.287 65.097 6.35

ConditionSemantic:GroupOC 104.096 33.110 3.14

So it looks like the older adults are about 400ms slower than the younger adults in the cohort condition and another 100ms slower in the semantic condition. Now we can use the

**effects**package to convert these parameter estimates into condition mean and SE estimates. The key function is effect(), which takes a term from the model and the model object. We can use summary() on the effect list object to get the information we need.
> library(effects)

> ef <- effect("Condition:Group", m)

> summary(ef)

Condition*Group effect

Group

Condition YC OC

Cohort 2230.057 2643.344

Semantic 2222.176 2739.559

Lower 95 Percent Confidence Limits

Group

Condition YC OC

Cohort 2103.037 2516.026

Semantic 2088.161 2605.384

Upper 95 Percent Confidence Limits

Group

Condition YC OC

Cohort 2357.076 2770.662

Semantic 2356.190 2873.734

For the purposes of plotting, we want to convert the effect list object into a data frame. Conveniently, there is a as.data.frame() function:

> x <- as.data.frame(ef)

> x

Condition Group fit se lower upper

1 Cohort YC 2230.057 64.74945 2103.037 2357.076

2 Semantic YC 2222.176 68.31514 2088.161 2356.190

3 Cohort OC 2643.344 64.90167 2516.026 2770.662

4 Semantic OC 2739.559 68.39711 2605.384 2873.734

Now we can plot this:

> ggplot(x, aes(Condition, fit, color=Group)) + geom_point() + geom_errorbar(aes(ymin=fit-se, ymax=fit+se), width=0.4) + theme_bw(base_size=12)

Or for people who like dynamite plots:> ggplot(x, aes(Group, fit, color=Condition, fill=Condition)) + geom_bar(stat="identity", position="dodge") + geom_errorbar(aes(ymin=fit-se, ymax=fit+se), width=0.4, position=position_dodge(width=0.9)) + theme_bw(base_size=12)

I've never used the effects package for this; thanks Dan. I was interested in how effect() would compare with predictSE() from the AICcmodavg package, which I've used for the same reasons you cite. I was happy to see that they matched.

ReplyDelete> mod <- lmer(Reaction ~ Days + (Days|Subject), data = sleepstudy)

> as.data.frame(effect("Days", mod, xlevels = list(Days = seq(1,8,by=1))))

> as.data.frame(predictSE(mod, newdata = list(Days = seq(1,8,by=1))))

I like that effect() gives 95% CIs.

Hi, Dan

ReplyDeleteI would like to ask you a question related to this post. It's about what to do when the pattern of by subjecs RTs are different from the lsmeans of the LME model and this last pattern is what was expected.

I'm analysing inverse transformed RT data with LME.

My model is : model = invRT ~ A*B + C + (A | Subject) + (B | Item)

where A and B are categorical variables (treatment coded because I'm interested in the differences respect to a baseline) and C is a continuous covariate.

Let's say that the raw by subjects RT pattern of the condition A is as follows:

Level 1. 650 ms

Level 2. 625 ms

Level 3. 623 ms

Level 4. 624 ms

But the back-transformed lsmeans of the model are:

Level 1. 650 ms

Level 2. 622 ms

Level 3. 626 ms

Level 4. 627 ms

and the difference between levels 1 and 2 are significant (t > 2) whereas the difference 1 vs. 3 and 1 vs. 4 are not.

This is actually the result I was expecting. Would it be OK to report the back-transformed lsmeans as the RT pattern of the data?

Thanks in advance.

Yes, I think it is OK to report the condition RTs estimated from the model by lsmeans (or effect). In fact, I think it might be better than reporting the raw by-subjects RTs because the lsmeans result better reflects your analysis model, which is (presumably) the basis of your inferences/conclusions. Having been trained in the pre-mixed-effects-models days, I initially found plotting model-estimated RTs somewhat strange, but I found it useful to remember that by-subject condition means is also a model of the data, and it's *not* the model actually used to analyze the data.

DeleteAs a side note, be careful about drawing conclusions from the fact that the difference between levels 1 and 2 is significant, but the other differences are not. The difference between significant and not significant is not itself statistically significant (Gelman & Stern, 2006: http://dx.doi.org/10.1198/000313006X152649). Just looking at the condition means, it looks to me like levels 2-4 are pretty much the same and level 1 is slower. You can recode the contrasts to test level 1 against levels 2-4, though I don't know whether that would be informative for your hypothesis. In any case, I wouldn't buy a conclusion like "level 2 < level 1 = level 3 = level 4".

summary(glht(model, lsm(pairwise ~ A)))

Deletesummary(glht(model, lsm(pairwise ~ A)), test=adjusted("bonferroni))

Fit: lme4::lmer(formula = RTinv ~ A * B + C +

(A | subject) + (A | item), data = words,

REML = FALSE, control = lmerControl(optimizer = "bobyqa"),

subset = abs(scale(resid(words_1))) < 2.5)

Linear Hypotheses:

Estimate Std. Error z value Pr(>|z|)

Level1 - Level2 == 0 0.035550 0.014152 2.512 0.0578 .

(Level1 - Level2 == 0 0.035550 0.014152 2.512 0.072 (test=adjusted("bonferroni"))

Level1 - Level3 == 0 0.028610 0.013881 2.061 0.1656

Level1 - Level4 == 0 0.020807 0.015122 1.376 0.5139

Level2 - Level3 == 0 -0.006940 0.013601 -0.510 0.9566

Level2 - Level4 == 0 -0.014743 0.014379 -1.025 0.7341

Level3 - Level4 == 0 -0.007803 0.013767 -0.567 0.9418

---

Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Adjusted p values reported -- single-step method)

Thanks, Dan!

DeleteHow did you get the back transformed lsmeans ? Is there a script for that? Did you use partial derivatives?

ReplyDeleteWhat I did is just:

Deletelibrary(effects)

-1000/effect("condition", model.1)$fit

But see Lo & Andrews (2015) and the supplementary material http://journal.frontiersin.org/article/10.3389/fpsyg.2015.01171/full

and you'll find this:

The following R syntax was used to code the specific form of the inverse link function (-1000/μ ̂), which was called when generating models assuming an inverse relationship between the predictors and RT. (Lo & Andrews, 2015)

invfn <- function() {

## link

linkfun <- function(y) -1000/y

## inverse link

linkinv <- function(eta) -1000/eta

## derivative of invlink wrt eta

mu.eta <- function(eta) { 1000/(eta^2) }

valideta <- function(eta) TRUE

link <- "-1000/y"

structure(list(linkfun = linkfun, linkinv = linkinv,

mu.eta = mu.eta, valideta = valideta,

name = link),

class = "link-glm")

}

Just to check you can run a model like:

model.1<-glmer(RT ~ condition +(1|subject) +(1|item), data=DATA, family=inverse.gaussian(link=invfn()))

And check that the output is in RTinv scale.

summary(model.1)

Then:

effect("condition", model.1)

and you'll see that the output is automatically back-transformed in RTs.

When you do

effect("condition", model.1)$fit

you get the coefficients in RTinv scale.

Then

-1000/effect("condition", model.1)$fit

and you'll see that the results are the same.

(This is what I did, but may be I'm doing something wrong...)

Hi, your post is really useful. I used the effect function on some data from a research project of mine and found that the standard error values are always the same for each factor level (see output below). My data set is from an experiment where replication is balanced, and if I delete a few observations to make it unbalanced then it changes the standard errors and they are not all the same. How are the standard errors calculated and why are they equivalent when the data set is balanced? It seems that the exact same information is going into calculating the values for each factor level, but why would that be? If I calculate the SEs for the different groups using the raw data, obviously the SEs are very different from group to group.

ReplyDeleteDepth fit se lower upper

1 10-20cm 6528.954 1557.851 3359.482 9698.425

2 30-40cm 5466.135 1557.851 2296.664 8635.607

There are some good discussions online about how to compute confidence intervals (or SE) for mixed-effects models. I recommend a recent one on ling-r-lang-lang (https://mailman.ucsd.edu/pipermail/ling-r-lang-l/2017-March/000914.html) and a slightly older one on CrossValidated (http://stats.stackexchange.com/questions/117641/how-trustworthy-are-the-confidence-intervals-for-lmer-objects-through-effects-pa).

DeleteHi Dan,

ReplyDeleteThank you for this tutorial.

I tried your method and it works great on my lmm for small models. However in an extended model I get the error that the specified effect does not appear in the model", wehen applying the effect() function. Have you encountered this bug?

I don't think I've gotten that error. At least no including times when I make a typo or something and the (incorrectly) specified effect actually does not appear in the model.

DeleteI can get effects to work when I run a model with lm(), but not with lmer(). I get the error "Error in as.data.frame.default(data, optional = TRUE) : cannot coerce class ‘"function"’ to a data.frame" Has anyone else run into that problem?

ReplyDelete