Showing posts with label computational modeling.

Thursday, February 4, 2016

15th Neural Computation and Psychology Workshop

NCPW15 – August 8-9, 2016 – Philadelphia, PA, USA

Contemporary Neural Network Models:
Machine Learning, Artificial Intelligence, and Cognition

Tuesday, March 3, 2015

When lexical competition becomes lexical cooperation

Lexical neighborhood effects are one of the most robust findings in spoken word recognition: words with many similar-sounding words ("neighbors") are recognized more slowly and less accurately than words with few neighbors. About 10 years ago, when I was just starting my post-doc training with Jim Magnuson, we wondered about semantic neighborhood effects. We found that things were less straightforward in semantics: near semantic neighbors slowed down visual word recognition, but distant semantic neighbors sped up visual word recognition (Mirman & Magnuson, 2008). I later found the same pattern in spoken word production (Mirman, 2011). Working with Whit Tabor, we developed a preliminary computational account. Later, when Qi Chen joined my lab at MRRI, we expanded this computational model to capture orthographic, phonological, and semantic neighborhood density effects in visual and spoken word recognition and spoken word production (Chen & Mirman, 2012). The key insight from our model was that neighbors exert both inhibitory and facilitative effects on target word processing, with the inhibitory effect dominating for strongly active neighbors and the facilitative effect dominating for weakly active neighbors.
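
To make that qualitative principle concrete, here is a toy numerical sketch (an illustration of the idea only, not our actual model or its parameters): if a neighbor's facilitative contribution grows linearly with its activation while its inhibitory contribution grows quadratically, the net effect is positive when the neighbor is weakly active and negative when it is strongly active.

```python
# Toy illustration only -- not the Chen & Mirman (2012) model or its parameters.
# Assume a neighbor's facilitative input to the target grows linearly with its
# activation, while its inhibitory input grows quadratically (faster for strong activation).
def net_neighbor_effect(activation, facilitation=1.0, inhibition=2.0):
    return facilitation * activation - inhibition * activation ** 2

for a in (0.1, 0.25, 0.5, 0.75):
    print(f"activation {a:.2f} -> net effect {net_neighbor_effect(a):+.3f}")
# activation 0.10 -> net effect +0.080  (weakly active neighbor: net facilitation)
# activation 0.25 -> net effect +0.125
# activation 0.50 -> net effect +0.000  (crossover)
# activation 0.75 -> net effect -0.375  (strongly active neighbor: net inhibition)
```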

In a new paper soon to be published in Cognitive Science (Chen & Mirman, in press), we test a unique prediction of our model. The idea is that phonological neighborhood effects in spoken word recognition are so robust because phonological neighbors are consistently strongly activated during spoken word recognition. If we can reduce their activation by creating a context in which they are not among the likely targets, then their inhibitory effect will not just get smaller; it will become smaller than the facilitative effect, so the net effect will flip from inhibitory to facilitative. We tested this using spoken word-to-picture matching with eye-tracking, more commonly known as the "visual world paradigm". When four (phonologically unrelated) pictures appear on the screen, they provide some semantic information about the likely target word. The longer they are on-screen before the spoken word begins, the more this semantic context will influence which lexical candidates are activated. At one extreme, without any semantic context, we should see the standard inhibitory effect of phonological neighbors; at the other extreme, if only the pictured items are viable candidates, there should be no effect of phonological neighbors. Here is the cool part (if I may say so): at an intermediate point, the semantic context reduces phonological neighbor activation but doesn't eliminate it, so the neighbors will be weakly active and will produce a facilitative effect.

We report simulations of our model concretely demonstrating this prediction, and an experiment in which we manipulate the preview duration (how long the pictures are displayed before the spoken word starts) to vary the strength of the semantic context. The results were (mostly) consistent with this prediction.
At the 500 ms preview, there is a clear facilitative effect of neighborhood density: target fixation proportions rise faster for high density targets than for low density targets. This did not happen at either the shorter or the longer preview duration, and it is not expected unless the preview provides semantic input that weakens the activation of phonological neighbors, thus making their net effect facilitative rather than inhibitory.

I'm excited about this paper because "lexical competition" is such a core concept in spoken word recognition that it is hard to imagine neighborhood density having a facilitative effect, but that's what our model predicted and the eye-tracking results bore it out. This is one of those full-cycle cases where behavioral data led to a theory, which led to a computational model, which made new predictions, which were tested in a behavioral experiment. That's what I was trained to do and it feels good to have actually pulled it off.

As a final meta comment: we owe a big "Thank You" to Keith Apfelbaum, Sheila Blumstein, and Bob McMurray, whose 2011 paper was part of the inspiration for this study. Even more importantly, Keith and Bob first shared their data for our follow-up analyses, and then their study materials to help us run our experiment. I think this kind of sharing is hugely important for having a science that truly builds and moves forward in a replicable way, but it is all too rare. Apfelbaum, Blumstein, and McMurray not only ran a good study, they also helped other people build on it, which multiplied their positive contribution to the field. I hope one day we can make this kind of sharing the standard in the field, but until then, I'll just appreciate the people who do it.


Apfelbaum, K. S., Blumstein, S. E., & McMurray, B. (2011). Semantic priming is affected by real-time phonological competition: Evidence for continuous cascading systems. Psychonomic Bulletin & Review, 18(1), 141-149. PMID: 21327343
Chen, Q., & Mirman, D. (2012). Competition and cooperation among similar representations: Toward a unified account of facilitative and inhibitory effects of lexical neighbors. Psychological Review, 119(2), 417-430. PMID: 22352357
Chen, Q., & Mirman, D. (2015). Interaction between phonological and semantic representations: Time matters. Cognitive Science (in press). PMID: 25155249
Mirman, D. (2011). Effects of near and distant semantic neighbors on word production. Cognitive, Affective & Behavioral Neuroscience, 11(1), 32-43. PMID: 21264640
Mirman, D., & Magnuson, J. S. (2008). Attractor dynamics and semantic neighborhood density: Processing is slowed by near neighbors and speeded by distant neighbors. Journal of Experimental Psychology: Learning, Memory, and Cognition, 34(1), 65-79. PMID: 18194055

Monday, October 21, 2013

The mind is not a (digital) computer

The "mind as computer" has been a dominant and powerful metaphor in cognitive science at least since the middle of the 20th century. Throughout this time, many of us have chafed against this metaphor because it has a tendency to be taken too literally. Framing mental and neural processes in terms of computation or information processing can be extremely useful, but this approach can turn into the extremely misleading notion that our minds work kind of like our desktop or laptop computers. There are two particular notions that have continued to hold sway despite mountains of evidence against them and I think their perseverance might be, at least in part, due to the computer analogy.

The first is modularity or autonomy: the idea that the mind/brain is made up of (semi-)independent components. Decades of research on interactive processing (including my own) and emergence have shown that this is not the case (e.g., McClelland, Mirman, & Holt, 2006; McClelland, 2010; Dixon, Holden, Mirman, & Stephen, 2012), but components remain a key part of the default description of cognitive systems, perhaps with some caveat that these components interact.

The second is the idea that the mind engages in symbolic or rule-based computation, much like the if-then procedures that form the core of computer programs. This idea is widely associated with the popular science writing of Steven Pinker and is a central feature of classic models of cognition, such as ACT-R. In a new paper just published in the journal Cognition, Gary Lupyan reports 13 experiments showing just how bad human minds are at executing simple rule-based algorithms (full disclosure: Gary and I are friends and have collaborated on a few projects). In particular, he tested parity judgments (is a number odd or even?), triangle judgments (is a figure a triangle?), and grandmother judgments (is a person a grandmother?). Each of these is a simple, rule-based judgment, and the participants knew the rule (last digit is even; polygon with three sides; has at least one grandchild), but they were nevertheless biased by typicality: numbers with more even digits were judged to be more even, equilateral triangles were judged to be more triangular, and older women with more grandchildren were judged to be more grandmotherly. A range of control conditions and experiments ruled out alternative explanations of these results. The bottom line is that, as he puts it, "human algorithms, unlike conventional computer algorithms, only approximate rule-based classification and never fully abstract from the specifics of the input."
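
To see how trivially a conventional computer handles these judgments, here is what the strict rules look like as code (a minimal sketch with simplified representations, e.g., treating a figure as just its number of sides; the specific examples are illustrative, not Lupyan's stimuli):

```python
def is_even(n: int) -> bool:
    # Parity rule: n mod 2 settles it; one even number is no "more even" than another.
    return n % 2 == 0

def is_triangle(num_sides: int) -> bool:
    # Triangle rule: a closed figure with exactly three sides, equilateral or not.
    return num_sides == 3

def is_grandmother(num_grandchildren: int) -> bool:
    # Grandmother rule: has at least one grandchild; age is irrelevant.
    return num_grandchildren >= 1

print(is_even(798), is_even(500))             # True True -- equally even
print(is_triangle(3), is_triangle(4))         # True False
print(is_grandmother(1), is_grandmother(12))  # True True -- equally grandmotherly
```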

It's probably too much to hope that this paper will end the misuse of the computer metaphor, but I think it will be a nice reminder of the limitations of this metaphor.

Dixon, J. A., Holden, J. G., Mirman, D., & Stephen, D. G. (2012). Multifractal dynamics in the emergence of cognitive structure. Topics in Cognitive Science, 4(1), 51-62. PMID: 22253177
Lupyan, G. (2013). The difficulties of executing simple algorithms: Why brains make mistakes computers don't. Cognition, 129(3), 615-636. DOI: 10.1016/j.cognition.2013.08.015
McClelland, J. L. (2010). Emergence in cognitive science. Topics in Cognitive Science, 2(4), 751-770. DOI: 10.1111/j.1756-8765.2010.01116.x
McClelland, J. L., Mirman, D., & Holt, L. L. (2006). Are there interactive processes in speech perception? Trends in Cognitive Sciences, 10(8), 363-369. PMID: 16843037

Monday, June 17, 2013

Models are experiments

I spent last week at a two-part meeting on language in developmental and acquired disorders, hosted by the Royal Society. The organizers (Dorothy Bishop, Kate Nation, and Karalyn Patterson) devised a meeting structure that stimulated – and made room for – a lot of discussion, and computational modeling was one of the major topics throughout the meeting. A major highlight for me was David Plaut’s aphorism “Models are experiments”. The idea is that models are sometimes taken to be the theory, but they are better thought of as experiments designed to test the theory. In other words, just as a theory predicts some behavioral phenomena, it also predicts that a model implementation of that theory should exhibit those phenomena. This point of view has several important, and I think useful, consequences.

Thursday, August 2, 2012

Statistical models vs. cognitive models


My undergraduate and graduate training in psychology and cognitive neuroscience focused on computational modeling and behavioral experimentation: implementing concrete models to test cognitive theories by simulation and evaluating predictions from those models with behavioral experiments. During this time, the good ol’ t-test was enough statistics for me. I continued this sort of work during my post-doctoral fellowship, but as I became more interested in studying the time course of cognitive processing, I had to learn about statistical modeling, specifically growth curve analysis (multilevel regression) for time series data. These two kinds of modeling – computational/cognitive and statistical – are often conflated, but I believe they are very different and serve complementary purposes in cognitive science and cognitive neuroscience.

It will help to have some examples of what I mean when I say that statistical and cognitive models are sometimes conflated. I have found that computational modeling talks sometimes provoke a certain kind of skeptic to ask “With a sufficient number of free parameters it is possible to fit any data set, so how many parameters does your model have?” The first part of that question is true in a strictly mathematical sense: for example, the coefficients of a sufficiently high-degree polynomial (a truncated Taylor series, say) can be tuned to approximate any smooth function with arbitrary precision. But this is not how cognitive modeling works. Cognitive models are meant to implement theoretical principles, not arbitrary mathematical functions, and although they always have some flexible parameters, these parameters are not “free” in the way that the coefficients of such a polynomial are free.
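
To make the “enough free parameters can fit anything” point concrete, here is a minimal sketch (purely illustrative, not any cognitive model): a polynomial with as many coefficients as data points reproduces the data exactly, even when the “data” are pure noise, which is precisely why parameter counting matters for curve-fitting but says little about a principled model.

```python
import numpy as np

# Hypothetical "data set": 8 observations with no systematic structure at all
rng = np.random.default_rng(1)
x = np.linspace(0, 1, 8)
y = rng.normal(size=8)

# A degree-7 polynomial has 8 free coefficients, so it fits the 8 points exactly...
poly = np.polynomial.Polynomial.fit(x, y, deg=7)
print(np.allclose(poly(x), y))   # True: zero residual error

# ...but between the fitted points it typically swings far outside the observed range
x_dense = np.linspace(0, 1, 200)
print(y.min(), y.max(), poly(x_dense).min(), poly(x_dense).max())
```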

On the other hand, when analyzing behavioral data, it can be tempting to use a statistical model with parameters that map in some simple way onto theoretical constructs. For example, assuming Weber’s Law holds (the just-noticeable difference between two stimuli is a constant proportion of the stimulus magnitude), one can collect discrimination data in some domain of interest, fit that proportional relationship, and compute the Weber constant for that domain. However, if you happen to be studying a domain where Weber’s Law does not quite hold, your Weber constant will not be very informative.
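
Here is a minimal sketch of that scenario with made-up numbers (illustrative only): when Weber’s Law holds, the fitted Weber constant is a faithful one-number summary; when there is even a modest additive offset, that same one number systematically misdescribes the low-intensity data.

```python
import numpy as np

# Hypothetical just-noticeable-difference (JND) data at several stimulus intensities
I = np.array([1.0, 2.0, 4.0, 8.0, 16.0])
jnd_weber = 0.1 * I          # Weber's Law holds exactly: JND = k * I, with k = 0.1
jnd_offset = 0.3 + 0.1 * I   # a "near miss": additive offset that Weber's Law ignores

def weber_constant(I, jnd):
    # Least-squares slope of a line through the origin, i.e., k in JND = k * I
    return np.sum(I * jnd) / np.sum(I * I)

print(weber_constant(I, jnd_weber))   # ~0.10 -- a faithful summary
print(weber_constant(I, jnd_offset))  # ~0.13 -- and it badly underestimates the JND at low intensities
```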

In other words, statistical and computational models have different, complementary goals. The point of statistical models is to describe or quantify the observed data. This is immensely useful because extracting key effects or patterns allows us to talk about large data sets in terms of a small number of “effects” or differences between conditions. Such descriptions are best when they focus on the data themselves and are independent of any particular theory – this allows researchers to evaluate any and all theories against the data. Statistical models do need to worry about the number of free parameters, which is why standard model comparison criteria such as AIC and BIC combine a log-likelihood measure of goodness of fit with a penalty for the number of parameters.
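
For reference, the standard definitions, where k is the number of free parameters, n the number of observations, and L̂ the maximized likelihood (lower values are better; BIC’s penalty grows with sample size, so it favors simpler models more strongly):

```latex
\mathrm{AIC} = 2k - 2\ln\hat{L}, \qquad \mathrm{BIC} = k\ln n - 2\ln\hat{L}
```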

In contrast, cognitive models are meant to test a specific theory, so fidelity to the theory is more important than counting the number of parameters. Ideally, the cognitive model’s output can be compared directly to the observed behavioral data, using more or less the same model comparison techniques (R-squared, log-likelihood, etc.). However, because cognitive models are usually simplified, that kind of quantitative fit is not always possible (or even advisable) and a qualitative comparison of model and behavioral data must suffice. This qualitative comparison critically depends on an accurate – and theory-neutral – description of the behavioral data, which is provided by the statistical model. (A nice summary of different methods of evaluating computational models against behavioral data is provided by Pitt et al., 2006).

Jim Magnuson, J. Dixon, and I advocated this kind of two-pronged approach – using statistical models to describe the data and computational models to evaluate theories – when we adapted growth curve analysis to eye-tracking data (Mirman et al., 2008). Then, working with Eiling Yee and Sheila Blumstein, we used this approach to study phonological competition in spoken word recognition in aphasia (Mirman et al., 2011). To my mind, this is the optimal way to simultaneously maximize accurate description of behavioral data and theoretical impact of the research.
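
For readers who want a sense of what the statistical side of that looks like in practice, here is a minimal growth-curve-style sketch in Python (the published analyses used multilevel regression in other software; the data, column names, and effect sizes below are all made up, so this conveys the flavor of the approach rather than the original analysis code):

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic stand-in for eye-tracking data: target fixation proportion over time in two conditions
rng = np.random.default_rng(0)
rows = []
for subject in range(12):
    for condition, rate in [("high", 0.004), ("low", 0.003)]:   # "high" condition rises faster
        for t in np.arange(0, 1000, 50):
            p = 1 / (1 + np.exp(-rate * (t - 500)))             # logistic rise in fixation proportion
            rows.append({"Subject": subject, "Condition": condition, "Time": t,
                         "FixProp": p + rng.normal(0, 0.05)})
df = pd.DataFrame(rows)

# Orthogonal polynomial time terms (linear and quadratic), built with a QR decomposition
t = np.sort(df["Time"].unique())
q, _ = np.linalg.qr(np.vander(t, 3, increasing=True))           # columns: constant, linear, quadratic
df = df.merge(pd.DataFrame({"Time": t, "ot1": q[:, 1], "ot2": q[:, 2]}), on="Time")

# Growth curve analysis: time course (ot1, ot2) by condition, with by-subject random time slopes
model = smf.mixedlm("FixProp ~ (ot1 + ot2) * C(Condition)",
                    data=df, groups=df["Subject"], re_formula="~ot1 + ot2")
print(model.fit().summary())
```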