Tuesday, August 7, 2012

Customizing ggplot graphs

There are many things I love about the R package ggplot2. For the most part, they fall into two categories:

  1. The "grammar of graphics" approach builds a hierarchical relationship between the data and the graphic, which creates a consistent, intuitive (once you learn it), and easy-to-manipulate system for statistical visualization. Briefly, the user defines a set of mappings ("aesthetics", in the parlance of ggplot) between variables in the data and graph properties (e.g., x = variable1, y = variable2, color = variable3, ...), chooses the visual realizations of those mappings (points, lines, bars, etc.), and ggplot does the rest. This is great, especially for exploratory graphing, because I can visualize the data in lots of different ways with just minor edits to the aesthetics.
  2. Summary statistics can be computed "on the fly". I don't need to pre-compute sample means and standard errors; I can just tell ggplot that this is what I want to see and it will compute them for me. And if something doesn't look right, I can easily visualize individual participant data, or look at sample means excluding some outliers, etc., all without creating separate copies of the data tailored to each graph.
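
To make those two points concrete, here is a minimal sketch. It assumes the built-in ToothGrowth data set as a stand-in for real experimental data, and the variable choices and the outlier cutoff are purely illustrative, not anything from my own analyses.

```r
library(ggplot2)

# Assumption: ToothGrowth (shipped with R) stands in for real data
data(ToothGrowth)

# Define the aesthetic mappings once: dose on x, length on y, supplement as color
p <- ggplot(ToothGrowth, aes(x = factor(dose), y = len, color = supp))

# Visual realization 1: every raw observation as a (horizontally jittered) point
p_raw <- p + geom_point(position = position_jitter(width = 0.1, height = 0))

# Visual realization 2: group means +/- 1 standard error, computed by ggplot
# when the plot is built, with no pre-computed summary table
p_summary <- p +
  stat_summary(fun.data = mean_se, geom = "pointrange",
               position = position_dodge(width = 0.3))

# Same summary excluding some observations; the cutoff of 30 is arbitrary
p_trimmed <- ggplot(subset(ToothGrowth, len < 30),
                    aes(x = factor(dose), y = len, color = supp)) +
  stat_summary(fun.data = mean_se, geom = "pointrange",
               position = position_dodge(width = 0.3))
```

Printing `p_raw`, `p_summary`, or `p_trimmed` draws the corresponding graph. The mappings stay the same throughout; only the layer (or the subset of data passed in) changes.
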

This great functionality comes at a price: customizing graphs can be hard. In addition to the ggplot documentation, the R Cookbook is a great resource (its section on legends saved me today) and StackOverflow is a fantastic Q&A site. Today I also stumbled onto a very detailed page showing how to generate the kinds of graphs that are typical for psychology and neuroscience papers. These are quite far from the ggplot defaults, and my hat is off to the author for figuring all this out and sharing it with the web.
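
For a flavor of what that customization involves, here is a rough sketch of moving a plot away from the defaults. It is not taken from any of the resources mentioned above; the data set (built-in ToothGrowth), labels, colors, and theme settings are all my own illustrative assumptions.

```r
library(ggplot2)

# Assumption: ToothGrowth stands in for real data; all styling choices are illustrative
p_custom <- ggplot(ToothGrowth, aes(x = factor(dose), y = len, color = supp)) +
  stat_summary(fun.data = mean_se, geom = "pointrange",
               position = position_dodge(width = 0.3)) +
  # Readable axis labels and legend title instead of raw variable names
  labs(x = "Dose (mg/day)", y = "Tooth length", color = "Supplement") +
  # Grayscale colors and spelled-out legend entries (in factor-level order: OJ, VC)
  scale_color_manual(values = c(OJ = "black", VC = "grey50"),
                     labels = c("Orange juice", "Ascorbic acid")) +
  # White background, no grid lines, legend above the plot
  theme_bw() +
  theme(panel.grid = element_blank(),
        legend.position = "top")
```

Each added line overrides one default, which is roughly why customization takes more effort than the initial plot: the defaults come for free, and every departure from them has to be spelled out.
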
