Monday, April 16, 2018

Correcting for multiple comparisons in lesion-symptom mapping

We recently wrote a paper about correcting for multiple comparisons in voxel-based lesion-symptom mapping (VLSM; Mirman et al., in press). Two methods did not perform well: (1) setting a minimum cluster size based on permutations produced too much spillover beyond the true region, and (2) false discovery rate (FDR) correction was anti-conservative at smaller sample sizes (N = 30–60). We developed an alternative solution by generalizing the standard permutation-based family-wise error correction approach, which provides a principled way to balance false positives and false negatives.

For that paper, we focused on standard "mass univariate" VLSM, where the multiple comparisons are a clear problem. The multiple comparisons problem plays out differently in multivariate lesion-symptom mapping methods such as support vector regression LSM (SVR-LSM; Zhang et al., 2014; a slightly updated version of the code is available from our github repo). Multivariate LSM methods consider all voxels simultaneously, so there is not a simple relationship between voxel-level test statistics and p-values. In SVR-LSM, the voxel-level statistic is an SVR beta value and the p-values for those betas are calculated by permutation. I've been trying to work out how to deal with multiple comparisons in SVR-LSM.
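For readers who haven't worked with permutation-based family-wise error (FWE) correction, here is a minimal sketch of the generic maximum-statistic idea that the standard approach is built on in the mass-univariate setting. This is background only, not the generalization from our paper and not an SVR-LSM solution, and the lesion matrix, behavioral scores, and sizes are all simulated purely for illustration:

set.seed(1)
n_sub <- 40; n_vox <- 200
lesion <- matrix(rbinom(n_sub * n_vox, 1, 0.3), n_sub, n_vox)  # simulated binary lesion status
score <- rnorm(n_sub)                                          # simulated behavioral scores

# voxelwise t statistics: lesioned vs. spared participants at each voxel
vox_t <- function(y, X) {
  apply(X, 2, function(x) {
    if (length(unique(x)) < 2) return(0)  # voxel lesioned in everyone or no one
    t.test(y[x == 1], y[x == 0])$statistic
  })
}
obs_t <- vox_t(score, lesion)

# build the permutation distribution of the maximum |t| across voxels
n_perm <- 500
max_t <- replicate(n_perm, max(abs(vox_t(sample(score), lesion))))

# FWE-corrected threshold: 95th percentile of the max-statistic distribution
thresh <- quantile(max_t, 0.95)
sig_vox <- which(abs(obs_t) > thresh)  # voxels surviving correction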

Friday, March 23, 2018

Growth curve analysis workshop slides

Earlier this month I taught a two-day workshop on growth curve analysis at Georg-Elias-Müller Institute for Psychology in Göttingen, Germany. The purpose of the workshop was to provide a hands-on introduction to using GCA to analyze longitudinal or time course data, with a particular focus on eye-tracking data. All of the materials for the workshop are now available online (http://dmirman.github.io/GCA2018.html), including slides, examples, exercises, and exercise solutions. In addition to standard packages (ggplot2, lme4, etc.), we used my psy811 package for example data sets and helper functions.
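If you want a quick taste of what the workshop covers before downloading the slides, here is a minimal GCA sketch. It is not workshop code: it uses the built-in ChickWeight data rather than the psy811 example data sets, fits second-order orthogonal polynomial time terms with lme4, and overlays the model fit with ggplot2.

library(lme4)
library(ggplot2)
d <- ChickWeight
# second-order orthogonal polynomial of time
t_poly <- poly(d$Time, 2)
d$ot1 <- t_poly[, 1]
d$ot2 <- t_poly[, 2]
# linear and quadratic growth by diet, with by-chick random time terms
m <- lmer(weight ~ (ot1 + ot2) * Diet + (ot1 + ot2 | Chick), data = d, REML = FALSE)
# observed means (points) with model-predicted means (lines)
ggplot(d, aes(Time, weight, color = Diet)) +
  stat_summary(fun = mean, geom = "point") +
  stat_summary(aes(y = fitted(m)), fun = mean, geom = "line")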

Monday, June 8, 2015

A little growth curve analysis Q&A

I had an email exchange with Jeff Malins, who asked several questions about growth curve analysis. I often get questions of this sort and Jeff agreed to let me post excerpts from our (email) conversation. The following has been lightly edited for clarity and to be more concise.

Wednesday, August 13, 2014

Plotting mixed-effects model results with effects package

As separate by-subjects and by-items analyses have been replaced by mixed-effects models with crossed random effects of subjects and items, I've often found myself wondering about the best way to plot the results. Simple-minded means and SE computed from trial-level data will be inaccurate because they don't take the nesting into account. If I compute subject means and plot those with by-subject SE, then I'm plotting something different from what I analyzed, which is not always terrible, but definitely not ideal. It seems intuitive that the condition means and SEs should be computable from the model's parameter estimates, but that computation is not trivial, particularly when you're dealing with interactions. Or, rather, that computation was not trivial until I discovered the effects package.
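Here is roughly what that looks like, as a minimal sketch using the sleepstudy data from lme4 instead of real subjects-and-items data (so the model and variable names are just placeholders):

library(lme4)
library(effects)
library(ggplot2)
m <- lmer(Reaction ~ Days + (Days | Subject), data = sleepstudy)
# effect() returns model-based estimates with standard errors and confidence limits
ef <- as.data.frame(effect("Days", m))
head(ef)  # columns: Days, fit, se, lower, upper
# plot the model-based estimates rather than raw condition means
ggplot(ef, aes(Days, fit)) +
  geom_ribbon(aes(ymin = lower, ymax = upper), alpha = 0.3) +
  geom_line()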

Friday, April 4, 2014

Flip the script, or, the joys of coord_flip()

Has this ever happened to you?

I hate it when the labels on the x-axis overlap, but this can be hard to avoid. I can stretch the figure out, but then the data points get spread farther apart and the space where I want to put the figure (in a talk or a paper) may not accommodate the wider plot. I've never liked turning the labels diagonally, so recently I've started using coord_flip() to switch the x- and y-axes:
library(ggplot2)
# mean and SE of chick weight for each feed type, with the category axis flipped to horizontal
ggplot(chickwts, aes(feed, weight)) + stat_summary(fun.data=mean_se, geom="pointrange") + coord_flip()

It took a little getting used to, but I think this works well. It's especially good for factor analyses (where you have many labeled items):
library(psych)
# four-component varimax-rotated solution for the Harman 24-tests correlation matrix
pc <- principal(Harman74.cor$cov, 4, rotate="varimax")
loadings <- as.data.frame(pc$loadings[, 1:ncol(pc$loadings)])  # loadings matrix as a data frame
loadings$Test <- rownames(loadings)

# geom_col() plots the loading values directly (geom_bar() would try to count rows)
ggplot(loadings, aes(Test, RC1)) + geom_col() + coord_flip() + theme_bw(base_size=10)
It also works well if you want to plot parameter estimates from a regression model (where the parameter names can get long):
library(lme4)
# random intercept and Time slope for each Chick; fit with maximum likelihood (REML=FALSE)
m <- lmer(weight ~ Time * Diet + (Time | Chick), data=ChickWeight, REML=FALSE)
coefs <- as.data.frame(coef(summary(m)))
colnames(coefs) <- c("Estimate", "SE", "tval")
coefs$Label <- rownames(coefs)

ggplot(coefs, aes(Label, Estimate)) + geom_pointrange(aes(ymin = Estimate - SE, ymax = Estimate + SE)) + geom_hline(yintercept=0) + coord_flip() + theme_bw(base_size=10)

Tuesday, February 11, 2014

Three ways to get parameter-specific p-values from lmer


How to get parameter-specific p-values is one of the most commonly asked questions about multilevel regression. The key issue is that the degrees of freedom are not trivial to compute for multilevel regression models. Various detailed discussions can be found on the R-wiki and in an R-help mailing list post by Doug Bates. I have experimented with three methods that I think are reasonable.
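As a concrete starting point, here is a brief sketch of two options that come up a lot (not necessarily the same three I discuss below), using the sleepstudy data as a stand-in: the normal approximation to the t statistics, and Satterthwaite degrees of freedom via the lmerTest package.

library(lme4)
m <- lmer(Reaction ~ Days + (Days | Subject), data = sleepstudy, REML = FALSE)
# Option 1: normal approximation -- treat the t values as z values
# (simple, but somewhat anti-conservative for small samples)
coefs <- data.frame(coef(summary(m)))
coefs$p.normal <- 2 * (1 - pnorm(abs(coefs$t.value)))
coefs
# Option 2: Satterthwaite-approximated degrees of freedom;
# refitting with lmerTest loaded adds a Pr(>|t|) column to summary()
library(lmerTest)
m2 <- lmer(Reaction ~ Days + (Days | Subject), data = sleepstudy, REML = FALSE)
summary(m2)$coefficients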

Tuesday, September 18, 2012

Aggregating data across trials of different durations

Note: This post is a summary of a more detailed technical report.

In a typical "visual world paradigm" (VWP) eye-tracking study, a trial ends when the participant responds, which naturally means that some trials are shorter than others. So when computing fixation proportions at later time points, should terminated trials be included or not? Based on informal discussions with other VWP researchers, I think three approaches are currently in use: (1) for each time bin, include all trials and count post-response frames as non-object fixations (i.e., the participant is done fixating all objects from this trial); (2) include all trials and count post-response frames as target fixations (i.e., if the participant selected the correct object, then consider all subsequent fixations to be on that object; note that trials on which the participant made an incorrect response are typically excluded from analysis anyway); (3) include only trials that are still ongoing and ignore terminated trials, since there are no data for them.

The problem with the third approach is that it is a form of selection bias: trials do not terminate at random, so as the time series progresses through the analysis window, the data move further and further from the complete, unbiased set of trials to a biased subset of only those trials that required additional processing time. This bias operates both between conditions (i.e., more trials remain from a condition with difficult stimuli than from a condition with easy stimuli) and within conditions (i.e., more of the difficult trials than the easy trials remain within a condition).

Here's an analogy to clarify this selection bias: imagine that we want to evaluate the response rate over time to a drug for a deadly disease. We enroll 100 participants in the trial and administer the drug. At first, only 50% of the participants respond to the drug. As the trial progresses, the non-responders begin to, unfortunately, die. After 6 months, only 75 participants are alive and participating in the trial and the same 50 are responding to the treatment. At this point, is the response rate the same 50% or has it risen to 67%? Would it be accurate to conclude that responsiveness to the treatment increases after 6 months?

Returning to eye-tracking data, the effect of this selection bias is to make differences appear more static. So, for target fixation data, you get the pattern below: considering only on-going trials makes it look like there is an asymptote difference between conditions, but "padding" the post-response frames with Target fixations correctly captures the processing speed difference. (These data are from a Monte Carlo simulation, so we know that the Target method is correct.)
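To make the "padding" concrete, here is a minimal sketch of counting post-response frames as Target fixations before computing fixation proportions. The data frame layout and column names (Subject, Trial, Condition, TimeBin, Object) are hypothetical stand-ins for however your fixation data happen to be coded.

# assume one row per Subject x Trial x TimeBin giving the fixated Object,
# with rows simply stopping at the time bin where the trial ended
pad_target <- function(d, max_bin) {
  trials <- unique(d[, c("Subject", "Trial", "Condition")])
  grid <- merge(trials, data.frame(TimeBin = 1:max_bin))  # full set of time bins for every trial
  padded <- merge(grid, d, all.x = TRUE)                  # post-response bins come back as NA
  padded$Object[is.na(padded$Object)] <- "Target"         # count them as target fixations
  padded
}
# toy example: two trials, one ending at bin 3 and one at bin 5
d <- data.frame(Subject = 1, Trial = rep(1:2, times = c(3, 5)), Condition = "A",
                TimeBin = c(1:3, 1:5),
                Object = c("Other", "Target", "Target", "Other", "Other", "Target", "Target", "Target"))
padded <- pad_target(d, max_bin = 5)
padded$isTarget <- padded$Object == "Target"
aggregate(isTarget ~ Condition + TimeBin, data = padded, FUN = mean)  # fixation proportions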

For competitor fixations, ignoring terminated trials makes the competition effects look longer-lasting, as in the figure on the left. These data come from our recent study of taxonomic and thematic semantic competition, so you can see the selection bias play out in real VWP data. We also randomly dropped 10% and 20% of the data points to show that the effect of ignoring terminated trials is not just a matter of having fewer data points.

Whether post-response data are counted as "Target" or "Non-object" fixations does not seem to have biasing effects, though it does affect how the data look, in much the same way that probability distribution curves and cumulative distribution curves show the same underlying data in different ways. More details on all of this are available in our technical report.

Monday, August 6, 2012

Crawford-Howell (1998) t-test for case-control comparisons

Cognitive neuropsychologists (like me) often need to compare a single case to a small control group, but the standard two-sample t-test does not work for this because the case is only one observation. Several different approaches have been proposed and in a new paper just published in Cortex, Crawford and Garthwaite (2012) demonstrate that the Crawford-Howell (1998) t-test is a better approach (in terms of controlling Type I error rate) than other commonly-used alternatives. As I understand it, the core issue is that with a typical t-test, you're testing whether two means are different (or, for a one-sample t-test, whether one mean is different from some value), so the more observations you have, the better your estimate of the mean(s). In a case-control comparison you want to know how likely it is that the case value came from the distribution of the control data, so even if your control group is very large, the variability is still important -- knowing that your case is below the control mean is not enough, you want to know that it is below 95% (for example) of the controls. That is why, as Crawford and Garthwaite show, Type I error increases with control sample size for the other tests, but not for the Crawford-Howell test.

It is nice to have this method validated by Monte Carlo simulation and I intend to use it next time the need arises. I’ve put together a simple R implementation of it (it takes a single value as case and a vector of values for control and returns a data frame containing the t-value, degrees of freedom, and p-value):
CrawfordHowell <- function(case, control){
  # Crawford-Howell (1998) modified t-test: compares a single case to a control sample,
  # treating the case as a sample of size 1
  tval <- (case - mean(control)) / (sd(control)*sqrt((length(control)+1) / length(control)))
  degfree <- length(control)-1
  pval <- 2*(1-pt(abs(tval), df=degfree)) #two-tailed p-value
  result <- data.frame(t = tval, df = degfree, p=pval)
  return(result)
}
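Hypothetical usage (the control scores are simulated, just to show the call and the output format):

set.seed(10)
controls <- rnorm(20, mean = 100, sd = 15)  # simulated control scores
patient <- 60                               # single case score
CrawfordHowell(case = patient, control = controls)
# returns a one-row data frame with t, df (19 here, i.e., number of controls minus 1), and the two-tailed p-value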

Crawford, J. R., & Howell, D. C. (1998). Comparing an individual's test score against norms derived from small samples. The Clinical Neuropsychologist, 12(4), 482-486. DOI: 10.1076/clin.12.4.482.7241
Crawford, J. R., & Garthwaite, P. H. (2012). Single-case research in neuropsychology: A comparison of five forms of t-test for comparing a case to controls. Cortex, 48(8), 1009-1016. DOI: 10.1016/j.cortex.2011.06.021

Thursday, August 2, 2012

Statistical models vs. cognitive models


My undergraduate and graduate training in psychology and cognitive neuroscience focused on computational modeling and behavioral experimentation: implementing concrete models to test cognitive theories by simulation and evaluating predictions from those models with behavioral experiments. During this time, the good ol' t-test was enough statistics for me. I continued this sort of work during my post-doctoral fellowship, but as I became more interested in studying the time course of cognitive processing, I had to learn about statistical modeling, specifically, growth curve analysis (multilevel regression) for time series data. These two kinds of modeling – computational/cognitive and statistical – are often conflated, but I believe they are very different and serve complementary purposes in cognitive science and cognitive neuroscience.

It will help to have some examples of what I mean when I say that statistical and cognitive models are sometimes conflated. I have found that computational modeling talks sometimes provoke a certain kind of skeptic to ask “With a sufficient number of free parameters it is possible to fit any data set, so how many parameters does your model have?” The first part of that question is true in a strictly mathematical sense: for example, a Taylor series polynomial can be used to approximate any function with arbitrary precision. But this is not how cognitive modeling works. Cognitive models are meant to implement theoretical principles, not arbitrary mathematical functions, and although they always have some flexible parameters, these parameters are not “free” in the way that the coefficients of a Taylor series are free.

On the other hand, when analyzing behavioral data, it can be tempting to use a statistical model with parameters that map in some simple way onto theoretical constructs. For example, assuming Weber's law holds (a power law relationship between physical stimulus magnitude and perceived intensity), one can collect data in some domain of interest, fit a power law function, and compute the Weber constant for that domain. However, if you happen to be studying a domain where Weber's law does not quite hold, your Weber constant will not be very informative.
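As a toy illustration of that statistical-model move (with simulated data, and with the same caveat that the fitted constant is only as meaningful as the assumed functional form): a power law is linear on log-log axes, so a simple regression recovers the exponent.

set.seed(1)
stimulus <- seq(1, 100, length.out = 40)
perceived <- 3 * stimulus^0.6 * exp(rnorm(40, sd = 0.1))  # simulated magnitude-estimation data
# log-log regression: the slope estimates the exponent, the intercept the log of the scaling constant
fit <- lm(log(perceived) ~ log(stimulus))
coef(fit)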

In other words, statistical and computational models have different, complementary goals. The point of statistical models is to describe or quantify the observed data. This is immensely useful because extracting key effects or patterns allows us to talk about large data sets in terms of a small number of "effects" or differences between conditions. Such descriptions are best when they focus on the data themselves and are independent of any particular theory – this allows researchers to evaluate any and all theories against the data. Statistical models do need to worry about the number of free parameters, which is captured by standard goodness-of-fit statistics such as AIC, BIC, and log-likelihood.

In contrast, cognitive models are meant to test a specific theory, so fidelity to the theory is more important than counting the number of parameters. Ideally, the cognitive model’s output can be compared directly to the observed behavioral data, using more or less the same model comparison techniques (R-squared, log-likelihood, etc.). However, because cognitive models are usually simplified, that kind of quantitative fit is not always possible (or even advisable) and a qualitative comparison of model and behavioral data must suffice. This qualitative comparison critically depends on an accurate – and theory-neutral – description of the behavioral data, which is provided by the statistical model. (A nice summary of different methods of evaluating computational models against behavioral data is provided by Pitt et al., 2006).

Jim Magnuson, J. Dixon, and I advocated this kind of two-pronged approach – using statistical models to describe the data and computational models to evaluate theories – when we adapted growth curve analysis to eye-tracking data (Mirman et al., 2008). Then, working with Eiling Yee and Sheila Blumstein, we used this approach to study phonological competition in spoken word recognition in aphasia (Mirman et al., 2011). To my mind, this is the optimal way to simultaneously maximize accurate description of behavioral data and theoretical impact of the research.