I was disappointed to read a draft of a forthcoming APS Observer article by Susan Fiske in which she complains about how new media have allowed "unmoderated attacks" on individuals and their research programs. Other bloggers have written about this at some length (Andrew Gelman, Chris Chambers, Uri Simonsohn); I particularly recommend the longer and very thoughtful post by Tal Yarkoni. A few points have emerged as the most salient to me:
First, scientific criticism should be evaluated on its accuracy and constructiveness. Our goal should be accurate critiques that offer constructive ideas about how to do better. Efforts to improve the peer review process often focus on those same factors, along with timeliness. As it happens, blogs are actually great for this: posts can be written quickly and immediately followed by comments that allow for back-and-forth, so inaccuracies can be corrected and constructive ideas can emerge. Providing critiques politely is a nice goal, but it is secondary. (Tal Yarkoni's post discusses this issue very well.)
Second, APS is the publisher of Psychological Science, a journal that was once prominent and prestigious, but has gradually become a pop psychology punchline. Perhaps I should not have been surprised that they're publishing an unmoderated attack on new media.
Third, things have changed very rapidly (this is the main point of Andrew Gelman's post). When I was in graduate school (2000-2005), I don't remember hearing concerns about replication, and standard operating procedures included lots of things that I would now consider "garden of forking paths"/"p-hacking". 2011 was a major turning point: Daryl Bem reported his evidence of ESP (side note: he had been working on that since at least the mid-to-late 90s, when I was an undergrad at Cornell and heard him speak about it). At the time, the flaws in that paper were not at all clear. That was also the year the "False-positive psychology" paper was published (in Psychological Science), showing that "researcher degrees of freedom" (or "p-hacking") make actual false positive rates much higher than the nominal p < 0.05 level. The year after that, in 2012, Greg Francis's paper ("Too good to be true") came out, showing that multi-experiment papers reporting consistent replications of small effects are themselves statistically improbable and may reflect selection bias, p-hacking, or other problems. 2012 was also the year I was contacted by the Open Science Collaboration to contribute to their large-scale replication effort, which eventually led to a major report on the reproducibility of psychological research.
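To make the "researcher degrees of freedom" point concrete, here is a minimal simulation sketch in the spirit of the "False-positive psychology" demonstrations. The specific liberties I simulate (testing a second, correlated outcome measure and one round of optional stopping) are my own illustrative assumptions, not the paper's exact procedure; numpy and scipy are assumed.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_sims, n_per_group, alpha = 10_000, 20, 0.05
false_positives = 0

for _ in range(n_sims):
    # Both groups come from the SAME distribution, so any "effect" is a false positive.
    a = rng.normal(size=n_per_group)
    b = rng.normal(size=n_per_group)

    # Degree of freedom 1: also test a second, correlated outcome measure
    # and keep whichever p-value is smaller.
    a2 = a + rng.normal(scale=0.5, size=n_per_group)
    b2 = b + rng.normal(scale=0.5, size=n_per_group)
    p = min(stats.ttest_ind(a, b).pvalue, stats.ttest_ind(a2, b2).pvalue)

    # Degree of freedom 2: if not significant, run 10 more subjects per group
    # and test again (optional stopping).
    if p >= alpha:
        a = np.concatenate([a, rng.normal(size=10)])
        b = np.concatenate([b, rng.normal(size=10)])
        p = stats.ttest_ind(a, b).pvalue

    false_positives += p < alpha

print(f"empirical false positive rate: {false_positives / n_sims:.3f} (nominal alpha = {alpha})")
```

Even these two modest liberties push the empirical false positive rate well above the nominal 5%; the exact figure depends on the assumed correlation between measures and the stopping rule, but the qualitative conclusion does not.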
My point is that these issues, which are a huge deal now, were not very widely known even 5-6 years ago and almost nobody was talking about them 10 years ago. To put it another way, just about all tenured Psychology professors were trained before the term "p-hacking" even existed. So, maybe we should admit that all this rapid change can be a bit alarming and disorienting. But we're scientists, we're in the business of drawing conclusions from data, and the data clearly show that our old way of doing business has some flaws, so we should try to fix those flaws. Lots of good ideas are being implemented and tested -- transparency (sharing data and analysis code), post-publication peer review, new impact metrics for hiring/tenure/promotion that reward transparency and reproducibility. And many of those ideas came from those unmoderated new media discussions.