Various ways to show variability
Aug 13, 2013
Reader Doeke W. sends me to this chart.
I like many aspects of this exercise. This chart displays the results of an experiment conducted by a computer games company to show that the new build ("249") renders frames faster than the older build ("248"). The messages of the chart are clear: the 249 build (blue bars) is substantially faster; over 80% of the frames render in 7 milliseconds or fewer under 249, compared to less than 40% under 248; and, less obviously, the variance of frame times is also significantly smaller.
The slight problem is that readers probably have to read the text to grasp most of the above.
***
Using lines (or areas) improves the readability.
In the text, the author explains how to turn time per frame (in milliseconds) into frames per second, the more common way of measuring rendering speed. The formula is 1000 divided by time per frame. Wouldn't it be better if the chart plotted fps directly?
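The conversion is a one-liner. Here is a minimal sketch in Python; the function name ms_to_fps is my own, and the three inputs are simply the thresholds quoted in this post.

```python
def ms_to_fps(frame_time_ms: float) -> float:
    """Convert time per frame (milliseconds) to frames per second."""
    if frame_time_ms <= 0:
        raise ValueError("frame time must be positive")
    return 1000.0 / frame_time_ms


# Frame-time thresholds mentioned in the post, expressed as fps.
for ms in (5, 7, 10.5):
    print(f"{ms} ms per frame = {ms_to_fps(ms):.1f} fps")
```

Note that the transformation reverses the axis: shorter frame times become higher fps, so the faster build would sit to the right on an fps chart.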
When it comes to presenting distributions (or variability), the cumulative chart is more useful but it also is harder for readers to comprehend. For example:
The beauty of this chart is that one can take any point on the vertical axis, say, the 80% level, and read off the comparative values of 7 milliseconds for the blue line (249) and 10.5 ms for the red (248). That means 80% of the 249 frames were rendered in under 7 ms, while it takes 10.5 ms to cover 80% of the 248 frames.
Alternatively, taking a point on the horizontal axis, say 5 milliseconds, one can see that about 8% of 248 frames were rendered within that threshold, compared with 30% of 249 frames.
The steeper the ascent of the S-curve, the less variable the frame times; the farther left the curve sits, the faster the rendering.
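Both ways of reading the cumulative chart can be sketched in a few lines of Python. The frame times below are made up for illustration and are not the data from the experiment; the function names are my own.

```python
import math
from bisect import bisect_right


def fraction_at_or_below(sorted_times, threshold_ms):
    """Horizontal-axis read: share of frames rendered in threshold_ms or less."""
    return bisect_right(sorted_times, threshold_ms) / len(sorted_times)


def time_at_level(sorted_times, level_pct):
    """Vertical-axis read: smallest observed frame time such that at least
    level_pct percent of frames were rendered in that time or less."""
    k = math.ceil(level_pct * len(sorted_times) / 100)
    return sorted_times[max(k, 1) - 1]


# Hypothetical frame times in milliseconds (NOT the experiment's data).
build_248 = sorted([4.8, 6.5, 7.9, 8.4, 9.0, 9.6, 10.1, 10.4, 11.2, 12.5])
build_249 = sorted([4.2, 4.6, 4.9, 5.5, 5.9, 6.2, 6.6, 6.9, 7.8, 8.5])

for name, times in (("248", build_248), ("249", build_249)):
    print(
        f"build {name}: {fraction_at_or_below(times, 5):.0%} of frames within 5 ms, "
        f"80% of frames within {time_at_level(times, 80)} ms"
    )
```

The first function picks a time and returns a percentage; the second picks a percentage and returns a time. The cumulative chart answers both questions from the same curve.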
And make the graphics even better by labelling the lines in the chart directly instead of having to look at the legend twice* before knowing which line is for which build.
*twice: the blue line in the graphic is higher than the red one, but in the legend, it is below the red line...
Posted by: Rclickhandbuch.wordpress.com | Aug 13, 2013 at 09:17 AM
Comparing distributions has a rich history. You mention overlaying histograms (a variation of the first plot) and overlaying densities or CDFs (the second and third plots), but in some fields such as Quality Control, comparative (paneled) displays are used instead of overlays. There is also a graphical display called the spread plot that was promoted by Chambers, Tukey, and especially Cleveland in the 80s and 90s. The spread plot has fallen out of use, probably because it requires some skill to interpret it, but I saw Chambers use it in his JSM 2013 presentation last week. For a comparison of strengths and weaknesses of these approaches, see my blog post about visually comparing distributions.
Posted by: Rick Wicklin | Aug 13, 2013 at 02:14 PM
Rick: Am I mistaken in thinking that this cumulative spread chart is just the CDF turned sideways and upside down?
For a truly sophisticated audience, I think nothing beats the qq-plot. The problem is such a plot works only if the reader has the patience to read an essay about how to read the chart, before reading the chart!
Posted by: Kaiser | Aug 13, 2013 at 11:17 PM
Rclickhandbuch: Good points. Elsewhere on the blog, I do mention those tips. It's not the point of this post and I didn't bother as this is not real work.
Posted by: Kaiser | Aug 13, 2013 at 11:28 PM
Boxplots, anyone?
Posted by: Doug Gabbard | Aug 14, 2013 at 12:08 AM
It seems to me that there's something wrong in the second line chart. Areas under the curve are not the same.
Posted by: Antonio | Aug 16, 2013 at 02:48 AM
Doug: Boxplots would be great for this purpose. They would show the lower median and lower variability clearly. I do find, though, that nontechnical people often have trouble processing a boxplot. Not sure if others have similar experience.
Posted by: Kaiser | Aug 19, 2013 at 01:41 PM