Wednesday, August 13, 2014

Plotting mixed-effects model results with effects package

As separate by-subjects and by-items analyses have been replaced by mixed-effects models with crossed random effects of subjects and items, I've often found myself wondering about the best way to plot data. The simple-minded means and SE from trial-level data will be inaccurate because they won't take the nesting into account. If I compute subject means and plot those with by-subject SE, then I'm plotting something different from what I analyzed, which is not always terrible, but definitely not ideal. It seems intuitive that the condition means and SE's are computable from the model's parameter estimates, but that computation is not trivial, particularly when you're dealing with interactions. Or, rather, that computation was not trivial until I discovered the effects package.

Tuesday, August 5, 2014

Visualizing Components of Growth Curve Analysis

This is a guest post by Matthew Winn:

One of the more useful skills I’ve learned in the past couple years is growth curve analysis (GCA), which helps me analyze eye-tracking data and other kinds of data that take a functional form. Like some other advanced statistical techniques, it is a procedure that can be done without complete understanding, and is likely to demand more than one explanation before you really “get it”. In this post, I will illustrate the way that I think about it, in hopes that it can “click” for some more people. The objective is to break down a complex curve into individual components.

Friday, April 4, 2014

Flip the script, or, the joys of coord_flip()

Has this ever happened to you?

I hate it when the labels on the x-axis overlap, but this can be hard to avoid. I can stretch the figure out, but then the data become farther apart and the space where I want to put the figure (either in a talk or a paper) may not accommodate that. I've never liked turning the labels diagonally, so recently I've started using coord_flip() to switch the x- and y-axes:
ggplot(chickwts, aes(feed, weight)) + stat_summary(fun.data=mean_se, geom="pointrange") + coord_flip()

It took a little getting used to, but I think this works well. It's especially good for factor analyses (where you have many labeled items):
library(psych)
pc <- principal(Harman74.cor$cov, 4, rotate="varimax")
loadings <- as.data.frame(pc$loadings[, 1:ncol(pc$loadings)])
loadings$Test <- rownames(loadings)

ggplot(loadings, aes(Test, RC1)) + geom_bar() + coord_flip() + theme_bw(base_size=10)
It also works well if you want to plot parameter estimates from a regression model (where the parameter names can get long):
library(lme4)
m <- lmer(weight ~ Time * Diet + (Time | Chick), data=ChickWeight, REML=F)
coefs <- as.data.frame(coef(summary(m)))
colnames(coefs) <- c("Estimate", "SE", "tval")
coefs$Label <- rownames(coefs)


ggplot(coefs, aes(Label, Estimate)) + geom_pointrange(aes(ymin = Estimate - SE, ymax = Estimate + SE)) + geom_hline(yintercept=0) + coord_flip() + theme_bw(base_size=10)

Monday, March 3, 2014

Guidebook for growth curve analysis

I don't usually like to use complex statistical methods, but every once in a while I encounter a method that is so useful that I can't avoid using it. Around the time I started doing eye-tracking research (as a post-doc with Jim Magnuson), people were starting recognize the value of using longitudinal data analysis techniques to analyze fixation time course data. Jim was ahead of most in this regard (Magnuson et al., 2007) and a special issue of the Journal of Memory and Language on data analysis methods gave as a great opportunity to describe how to apply "Growth Curve Analysis" (GCA) - a type of multilevel regression - to fixation time course data (Mirman, Dixon, & Magnuson, 2008). Unbeknownst to us, Dale Barr was working on very similar methods, though for somewhat different reasons, and our articles ended up neighbors in the special issue (Barr, 2008).

Growth Curve Analysis and Visualization Using R
In the several years since those papers came out, it has become clear to me that other researchers would like to use GCA, but reading our paper and downloading our code examples was often not enough for them to be able to apply GCA to their own data. There are excellent multilevel regression textbooks out there, but I think it is safe to say that it's a rare cognitive or behavioral scientist who has the time and inclination to work through a 600-page advanced regression textbook. It seemed like a more practical guidebook to implementing GCA was needed, so I wrote one and it has just been published by Chapman & Hall / CRC Press as part of their R Series.

My idea was to write a relatively easy-to-understand book that dealt with the practical issues of implementing GCA using R. I assumed basic knowledge of behavioral statistics (standard coursework in graduate behavioral science programs) and minimal familiarity with R, but no expertise in computer programming or the specific R packages required for implementation (primarily lme4 and ggplot2). In addition to the core issues of fitting growth curve models and interpreting the results, the book covers plotting time course data and model fits and analyzing individual differences. Example data sets and solutions to the exercises in the book are available on my GCA website.

Obviously, the main point of this book is to help other cognitive and behavioral scientists to use GCA, but I hope it will also encourage them to make better graphs and to analyze individual differences. I think individual differences are very important to cognitive science, but most statistical methods treat them as just noise, so maybe having better methods will lead to better science, though this might be a subject for a different post. Comments and feedback about the book are, of course, most welcome.  

Tuesday, February 11, 2014

Three ways to get parameter-specific p-values from lmer


How to get parameter-specific p-values is one of the most commonly asked questions about multilevel regression. The key issue is that the degrees of freedom are not trivial to compute for multilevel regression. Various detailed discussions can be found on the R-wiki and R-help mailing list post by Doug Bates. I have experimented with three methods that I think are reasonable.

Monday, January 27, 2014

Graduate school advice

Since it is the season for graduate school recruitment interviews, I thought I would share some of my thoughts. This is also partly prompted by two recent articles in the journal Neuron. If you're unfamiliar with it, Neuron is a very high-profile neuroscience journal, so the advice is aimed at graduate students in neuroscience, though I think the advice broadly applies to students in the cognitive sciences (and perhaps other sciences as well). The first of these articles deals with what makes a good graduate mentor and how to pick a graduate advisor; the second article has some good advice on how to be a good graduate advisee.

I broadly agree with the advice in those articles and here are a few things I would add:

Monday, December 9, 2013

Language in developmental and acquired disorders

As I mentioned in an earlier post, last June I had the great pleasure and honor of participating in a discussion meeting on Language in Developmental and Acquired Disorders hosted by the Royal Society and organized by Dorothy Bishop, Kate Nation, and Karalyn Patterson. Among the many wonderful things about this meeting was that it brought together people who study similar kinds of language deficit issues but in very different populations -- children with developmental language deficits such as dyslexia and older adults with acquired language deficits such as aphasia. Today, the special issue of Philosophical Transactions of the Royal Society B: Biological Sciences containing articles written by the meeting's speakers was published online (Table of Contents).