Structuring a chart
Sep 17, 2007
This chart from the NYT was intended to show how the EPA has moved the bar on vehicle mileage ratings: 2008 estimates were lower than 2007 estimates across the board, regardless of manufacturer, model and city/highway.
The chart was built from one basic component, repeated for each model. I like the discreet gridlines (the white ticks) which enable readers to count off the mileage ratings.
The data are rich: ratings were given along three dimensions (model, year of estimate and city/highway). Readers would benefit from stronger guidance on where to look for the most pertinent information. As the chart stands, it is merely a container for the data. It fails our self-sufficiency test: all the data were printed on the chart, and the bars add little.
In the junkart version, I use knowledge of the data to structure the chart. First, noting that sedans, hybrids and trucks/SUVs/minivans have different levels of mileage ratings, I clustered the models into three groups. Second, the city and highway ratings were separated into two columns, as I consider the between-model comparisons more important than city-highway comparisons. The chart is a dot plot, with a vertical tick for 2007 estimates and a dot for 2008 estimates. It's easy to see that all dots sit to the left of the vertical ticks.
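For readers who want to try this structure themselves, here is a rough text-only sketch of one column of such a dot plot. It is not the code behind the junkart version; the model names and mpg values are made up for illustration, and `dot_row` and `render` are hypothetical helper names.

```python
# Hypothetical illustration values, NOT the EPA figures from the NYT chart.
# Each row: (group, model, 2007 estimate, 2008 estimate), all in mpg.
ROWS = [
    ("Sedans", "Sedan A", 30, 27),
    ("Sedans", "Sedan B", 26, 24),
    ("Hybrids", "Hybrid A", 55, 46),
    ("Trucks/SUVs/minivans", "Truck A", 18, 16),
]


def dot_row(old, new, lo=10, hi=60):
    """Draw one row: '|' marks the 2007 estimate, 'o' the 2008 estimate."""
    cells = [" "] * (hi - lo + 1)
    cells[old - lo] = "|"
    cells[new - lo] = "o"  # drawn last, so the dot wins if the two coincide
    return "".join(cells).rstrip()


def render(rows):
    """Print a group heading whenever the group changes, then each model."""
    lines, current = [], None
    for group, model, old, new in rows:
        if group != current:
            lines.append(group)
            current = group
        lines.append(f"  {model:<10}{dot_row(old, new)}")
    return "\n".join(lines)


print(render(ROWS))
```

The full chart would repeat this once for city and once for highway ratings, side by side, so that the eye travels down a column to compare models.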
More subtly, we can also see that the hybrids appeared to have been penalized more. Or perhaps, the higher the rating, the larger the downward adjustment...
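One way to tell these two readings apart is to look at the adjustment in absolute and in relative terms. The sketch below uses made-up (2007, 2008) pairs, not the published ratings, and `adjustment` is a hypothetical helper. If the percentage drop is roughly constant across models, then larger absolute drops for hybrids simply follow from their higher ratings.

```python
import math

# Hypothetical (2007, 2008) estimates in mpg, for illustration only.
PAIRS = {"Hybrid A": (55, 46), "Sedan A": (30, 27), "Truck A": (18, 16)}


def adjustment(old, new):
    """Return the absolute drop, the percentage drop, and the log ratio."""
    drop = old - new
    pct = 100.0 * drop / old
    log_ratio = math.log(new / old)  # the same for every model if the ratio is constant
    return drop, pct, log_ratio


# List models from the highest 2007 rating to the lowest.
for model, (old, new) in sorted(PAIRS.items(), key=lambda kv: -kv[1][0]):
    drop, pct, log_ratio = adjustment(old, new)
    print(f"{model:<10} drop={drop:>2} mpg  pct={pct:5.1f}%  log ratio={log_ratio:+.3f}")
```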
Source: "Mileage Ratings Are Still Estimates, Though Closer to Reality", New York Times, Sept 16 2007.
Why not sort the y-axis by mpg? Then the three groups would naturally fall out, and there wouldn't be the large "jumps" between the different groups.
Posted by: Hadley Wickham | Sep 17, 2007 at 01:56 PM
Hadley: I was thinking from a consumer perspective, you're either in the market for a sedan or a truck. I'd have put clear dividers between the three groups.
Posted by: Kaiser | Sep 17, 2007 at 07:34 PM
It would be interesting to see this expressed and visualised as a percentage reduction. I guess this must be how they calculate the changes.
Posted by: Jens | Sep 18, 2007 at 05:26 AM
Jens, or plotted on an exponential scale, which will show up the same thing: whether there is a constant ratio between "before" and "after".
The exponential scale will have the advantage of not destroying the original values, as a percentage operation does.
(The mischievous side of me wants to find the size of the fuel tanks in gallons, plot that as a log-log scatter graph, and draw diagonal lines for the nominal range of the cars on a single full tank :-)
Posted by: derek | Sep 18, 2007 at 07:12 AM
I agree with Hadley. Why?
Because the whole point of having hybrid cars is that people will choose them over cars with high (and unknown future) running costs. So they are 'sedans' really.
If you want an ordering variable I would use the product of wheelbase & distance between front & back wheels. This is a measure of usable space for people or load into the vehicle. You might sqrt that product too of course.
If you sorted by mileage you could colour by vehicle 'group'.
I do not think miles per full tank will work well because manufacturers put bigger tanks in vehicles that use more petrol. Miles per $100 might work better.
Sorry, I am late to the commenting party...
Nice blog though. :-)
Posted by: DaveG | Oct 05, 2007 at 01:58 PM
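To make the two measures raised in the comments concrete, here is a small sketch comparing miles per full tank with miles per $100. The vehicles, tank sizes and the fuel price are all assumptions chosen for illustration, and the function names are hypothetical.

```python
def miles_per_tank(mpg, tank_gallons):
    """Nominal range on one full tank."""
    return mpg * tank_gallons


def miles_per_dollars(mpg, price_per_gallon, budget=100.0):
    """Distance covered for a fixed fuel budget (default $100)."""
    return mpg * budget / price_per_gallon


# Hypothetical vehicles: (mpg, tank size in gallons). Not real data.
VEHICLES = {"Truck A": (16, 26), "Sedan A": (27, 15)}
PRICE = 3.0  # assumed price in dollars per gallon

for name, (mpg, tank) in VEHICLES.items():
    print(
        f"{name}: {miles_per_tank(mpg, tank)} miles per tank, "
        f"{miles_per_dollars(mpg, PRICE):.0f} miles per $100"
    )
```

With these invented numbers the two vehicles have almost the same range per tank, because the thirstier one carries a bigger tank, while miles per $100 separates them clearly.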