Saturday, September 22, 2012

The power to see the future is exciting and terrifying

In a recent comment in Nature, Daniel Acuna, Stefano Allesina, and Konrad Kording describe a statistical model for predicting h-index. In case you are not familiar with it, h-index is a citation-based measure of scientific impact. An h-index of n means that you have n publications with at least n citations each (and n is the largest number for which that is true). I only learned about h-index relatively recently and I think it is quite an elegant measure -- simple to compute, not too biased by a single highly-cited paper or by many low-impact (uncited) papers. Acuna, Allesina, and Kording took publicly available data and developed a model for predicting future h-index based on number of articles, current h-index, years since first publication, number of distinct journals published in, and number of articles in the very top journals in the field (Nature, Science, PNAS, and Neuron). Their model accounted for about 66% of the variance in future h-index among neuroscientists, which I think is pretty impressive. Perhaps the coolest thing about this project is the accompanying website that allows users to predict their own h-index.
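To make the definition of h-index concrete, here is a minimal Python sketch of how it could be computed from a list of per-paper citation counts. The function name `h_index` and the citation counts are my own made-up illustration; this is not code from Acuna et al. and it has nothing to do with their prediction model.

```python
def h_index(citations):
    """Return the largest n such that n papers have at least n citations each."""
    # Rank papers from most cited to least cited.
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            # The top `rank` papers all have at least `rank` citations.
            h = rank
        else:
            break
    return h


# Made-up citation counts, purely for illustration.
papers = [25, 8, 5, 3, 3, 1, 0]
print(h_index(papers))  # prints 3
```

With these counts, three papers have at least 3 citations each, but there are not four papers with at least 4, so the h-index is 3. Note that one very highly cited paper on its own (say, `[1000]`) gives an h-index of only 1, and uncited papers do not change the result at all, which is the robustness I mentioned above.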

Since hiring and tenure decisions are intended to reflect both past accomplishments and expectations of future success, this prediction model is potentially quite useful. Acuna et al. are appropriately circumspect about relying on a single measure for making such important decisions and they are aware that over-reliance on a single metric can produce "gaming" behavior. So the following is not meant as a criticism of their work, but two examples jumped to my mind: (1) Because number of distinct journals is positively associated with future h-index (presumably it is an indicator of breadth of impact), researchers may choose to send their manuscripts to less appropriate journals in order to increase the number of journals in which their work has appeared. Those journals, in turn, would be less able to provide appropriate peer review and the articles would be less visible to the relevant audience, so their impact would actually be lower. (2) The prestige of those top journals already leads them to be targets for falsified data -- Nature, Science, and PNAS are among the leading publishers of retractions (e.g., Liu, 2006). Formalizing and quantifying that prestige factor can only serve to increase the motivation for unethical scientific behavior.

That said, I enjoyed playing around with the simple prediction calculator on their website. I'd be wary if my employer wanted to use this model to evaluate me, but I think it's kind of a fun way to set goals for myself: the website gave me a statistical prediction for how my h-index will increase over the next 10 years, and now I'm going to try to beat that prediction. Since h-index is (I think) relatively hard to "game", this seems like a reasonably challenging goal.

Acuna, D. E., Allesina, S., & Kording, K. P. (2012). Predicting scientific success. Nature, 489 (7415), 201-202. DOI: 10.1038/489201a
Liu, S. V. (2006). Top Journals’ Top Retraction Rates. Scientific Ethics, 1 (2), 91-93.
