Felix Salmon spoke highly of this Wall Street Journal chart, and I agree.
Why do I like this? Although it's a basic chart, they did many little things well.
They are brave enough not to print any of the actual data on the chart. In other words, no loss aversion.
The legend is integrated into the chart, not banished to some corner or border where readers would have to stray from the graph to find it. For added effect, the A, B, C labels imitate the actual signs posted outside the restaurants.
Sensible scales. It would be even better if they thinned out the horizontal scale for the C rating, say using 10-point intervals instead of 5-point intervals. Although this is hard to accomplish using conventional software, an axis with different intervals in different regions is surprisingly effective.
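Here is a minimal sketch of how such tick positions could be generated, assuming integer scores and a single breakpoint where the spacing switches from fine to coarse. The function name piecewise_ticks and the example breakpoint of 30 are hypothetical, not taken from the WSJ chart.

```python
def piecewise_ticks(start, stop, breakpoint, fine_step, coarse_step):
    """Tick positions with fine_step spacing below breakpoint and
    coarse_step spacing from breakpoint up to stop (inclusive).

    Assumes integer arguments; for evenly spaced labels, breakpoint should
    be reachable from start in fine_step increments.
    """
    fine = list(range(start, breakpoint, fine_step))
    coarse = list(range(breakpoint, stop + 1, coarse_step))
    return fine + coarse


# Hypothetical example: 5-point ticks up to 30, 10-point ticks after that.
print(piecewise_ticks(0, 60, 30, 5, 10))  # → [0, 5, 10, 15, 20, 25, 30, 40, 50, 60]
```

The positions stay on an ordinary linear scale; only the density of labels changes in the coarse region.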
Using pencil-thin columns. The same chart with thicker columns would be both uglier and less effective.
(I'm not sure I like the up and right arrows on the axis titles. Is it better to remove the arrows and center the text?)
Reader Brad E. reminds me about the USDA's attempt to "improve" the visual presentation of dietary standards. As reported here, the food pyramid failed its mission and has been retired. Here comes MyPlate!
According to this report, the government wants to impart these key points:
MyPlate offers a visual reminder to make healthy food choices when you choose your next meal.
It can help prioritize food choices and remind us to make fruits and vegetables half of our plates each meal.
On the other side of the plate – and beside it – we see the other important food groups for a healthy meal: whole grains, lean proteins, and low fat dairy.
We have been warned not to think of this as a pie chart.
What do we call a circular chart in which the area of the circle is partitioned into separate regions?
How does one say "return MyPlate to the kitchen" fast enough? The biggest problem here is that the key points are out of sync with the chart details. The MyPlate diet, as depicted, has less than the recommended amounts of fruits and vegetables! Since those two food groups together take up only as much of the plate as grains and proteins do, the presence of dairy means that fruits and vegetables make up less than half of this diet.
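The arithmetic can be made explicit. Write f, v, g, p and d for the shares of the meal taken by fruits, vegetables, grains, proteins and dairy, assuming (as the critique does) that each region's area represents its share of the meal and that the fruit and vegetable regions together equal the grain and protein regions:

```latex
\begin{aligned}
f + v + g + p + d &= 1, \qquad f + v = g + p \\
\Rightarrow\; 2(f + v) + d &= 1 \\
\Rightarrow\; f + v &= \frac{1 - d}{2} < \frac{1}{2} \quad \text{whenever } d > 0.
\end{aligned}
```

So any nonzero dairy portion pushes fruits and vegetables below half of the meal.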
The core message is that one should split one's diet in half, with fruits and veggies on the one side and grains, proteins and dairy on the other. If this is so, the following chart gets this point across with minimal probability of confusion:
If, on the other hand, the above chart is deemed too simple, and the message really does require proportions of each of the five food categories, then the sad truth is that a pie chart would have conveyed the message better.
MyPlate serves up strange portions that cannot be properly sized. How big is the "dairy" circle compared to any of the quadrants? How does one judge the irregularly sized quadrants (grains and vegetables)? If there is any use for the pie chart, it is to display simple concepts with limited dimensions.
Felix Salmon, a blogger and foodie, investigated whether a restaurant changes its pricing based on the number of stars it gets from Sam Sifton, the New York Times' food critic. His conclusion is that "price hikes happen all over the place, from the worst-reviewed restaurants to the best." This plot was used in the post.
His message doesn't jump out of his chart. Readers have to recognize that the dark green pieces are the ones to focus on, and specifically their relative heights within each stacked column. I was also misdirected by the two axis labels: number of stars and number of reviews aren't the primary dimensions. So I thought a better alternative could be found.
These data turned out to be harder to plot than expected. The problem is that the sample size is small, and because of this, the data have ragged edges. We are better at reading patterns from smooth objects.
Here is what I ended up with: a small multiples chart with grouped columns. I adopted Felix's color scheme, although no differentiation of color is really necessary in this version. Relative percentages are plotted instead of raw numbers of reviews. Each set of four columns can be viewed as a histogram or probability distribution. (Again, with more samples, the histograms would look smoother, revealing the pattern more clearly.)
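Converting raw counts into within-group percentages is what turns each set of columns into a distribution that can be compared across star ratings. A minimal sketch, assuming the counts are stored as nested dictionaries; the category labels and counts below are hypothetical placeholders, not Felix's data.

```python
def to_percentages(counts_by_group):
    """Convert raw counts into within-group percentages.

    counts_by_group maps a group label (e.g. a star rating) to a dict of
    category -> count. Each group's percentages sum to 100, so groups with
    different numbers of reviews can be compared directly.
    """
    result = {}
    for group, counts in counts_by_group.items():
        total = sum(counts.values())
        result[group] = {
            # Guard against an empty group to avoid dividing by zero.
            category: (100.0 * n / total if total else 0.0)
            for category, n in counts.items()
        }
    return result


# Hypothetical counts for one star rating, four outcome categories.
example = {"2 stars": {"raised": 5, "unchanged": 3, "lowered": 1, "closed": 1}}
print(to_percentages(example))
```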
I agree with Felix that there is not much correlation between star rating and pricing. However, this truly applies only to the middle three categories. At the edges, there are a couple of notable observations: all of the 4-star restaurants hiked their prices, while the only restaurant that has closed since being reviewed had received zero stars.
I'm a fan of annotating charts and so I'd recommend sticking a note on the 4 stars column, another note on the single gray column, and a third note bracketing the middle three categories, telling readers that there is nothing to see here.
It must be a slow news day when the media spend hundreds of words discussing a chart that has yet to be unveiled. But the New York Times writer surely got it right with the opener: "Whatever you do, don't call it a pie chart."
We are being told that the government will replace the "food pyramid" with a food pie chart, although they will call it something else. This thing, which is not a pie chart, has not been revealed yet, but because chart purists have such political clout, it was thought necessary to release a trial balloon on a holiday weekend to gauge what the response might be.
I think we should reserve our judgment till we see this thing.
In any case, the focus on encouraging people to eat the right proportions of foods is wrong-headed. Firstly, it is next to impossible for anyone to keep track of the distribution of foods consumed in any given day without keeping a diary. Secondly, nutritionists know that the biggest contributor to obesity is the quantity of food being eaten. Thus, a much more effective approach is encouraging smaller portions, or knowing when to stop eating. This method also happens to be much easier to put into practice.
Dan at Eye Heart New York has a fantastic post relating to the recent release of restaurant health inspection data by New York City. This has caused a furor among the restaurant owners because they are now required to wear their A/B/C badges front and center. Dan collected some data (which he also posted), made some charts, and reported some interesting insights.
Here is an overview chart that shows the distribution of scores (the higher the score, the lower the grade). He called it a "scatter plot" but it is really a histogram where the bucket size is 1 except for the rightmost bucket.
I like the use of green, yellow and red colors to indicate (without words) the conversion scale from scores (violation points) to grades (A/B/C). The legend "Count" is an Excel monstrosity. I'd have used a bucket size of at least 5, which would smooth out the gyrations in the green zone.
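Both suggestions are easy to sketch. The first function below groups scores into wider buckets, with an optional open-ended top bucket like the rightmost one in Dan's chart; the second maps a score to a letter grade. The cutoffs used (A for 0 to 13 points, B for 14 to 27, C for 28 or more) are my assumption about the NYC scheme, not figures from Dan's post, and the sample scores are made up.

```python
from collections import Counter


def bucket_counts(scores, width=5, cap=None):
    """Histogram counts keyed by each bucket's lower edge.

    Buckets are [0, width), [width, 2*width), ... If cap is given, every
    score >= cap goes into a single open-ended top bucket keyed by cap
    (cap is assumed to be a multiple of width).
    """
    counts = Counter()
    for s in scores:
        if cap is not None and s >= cap:
            key = cap
        else:
            key = (s // width) * width
        counts[key] += 1
    return dict(sorted(counts.items()))


def grade(score):
    """Map violation points to a letter grade (assumed NYC cutoffs)."""
    if score <= 13:
        return "A"
    if score <= 27:
        return "B"
    return "C"


scores = [0, 3, 7, 12, 13, 14, 22, 28, 45, 111]  # made-up scores
print(bucket_counts(scores, width=5, cap=40))
print([grade(s) for s in scores])
```

With these assumed cutoffs, 5-point buckets starting at 0 straddle the A/B and B/C boundaries (the 10 to 14 and 25 to 29 buckets), so if buckets are colored by grade, it may be cleaner to align bucket edges with the cutoffs.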
A more typical way to summarize numeric data in groups is Tukey's boxplot, as shown below.
I used Dan's raw data for this chart, with grades coded as 1 = A, 2 = B, 3 = C. What is group 4?
It turns out Dan removed this group from all of his analysis. A little research shows that group 4 consists of restaurants that have been closed by the Department of Health (DOH). Interestingly, the scores of these restaurants are widely spread, so the DOH appears to be closing restaurants not just for health violations. (In the rest of this post, I have removed group 4.)
For those not familiar with boxplots, the box contains the middle 50% of the data (in this case, the scores of the middle half of the restaurants in the respective group); the line inside the box is the median score; the dots beyond the ends of the vertical lines (the whiskers) are outliers, which can fall above or below the box, though none fall below here. As Dan pointed out, group C has lots of outliers on the high end of the score.
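The conventional Tukey rule behind those dots can be sketched in a few lines of Python. This assumes the chart follows the usual convention of flagging points more than 1.5 times the interquartile range beyond the box; statistical packages differ slightly in how they compute quartiles, so the numbers may not match a given plot exactly.

```python
import statistics


def tukey_summary(values, k=1.5):
    """Summary statistics drawn by a Tukey boxplot.

    Quartiles come from statistics.quantiles(method="inclusive"); points
    more than k * IQR beyond the box are flagged as outliers, and the
    whiskers end at the most extreme points that are not outliers.
    """
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence, high_fence = q1 - k * iqr, q3 + k * iqr
    inside = [v for v in values if low_fence <= v <= high_fence]
    outliers = sorted(v for v in values if v < low_fence or v > high_fence)
    return {
        "q1": q1, "median": median, "q3": q3,
        "whisker_low": min(inside), "whisker_high": max(inside),
        "outliers": outliers,
    }


# Made-up scores with one extreme value.
print(tukey_summary([1, 2, 3, 4, 5, 6, 7, 8, 9, 100]))
```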
Just for fun, I pulled the violations of the highest-scoring restaurant (111 violation points). What I find intriguing is the huge fluctuation in its scores over the last 5 inspections. Does this happen to other restaurants too? What does that say about the grading system?
Next, Dan attempted to address two questions: did scores vary across the 5 boroughs, and did scores vary across cuisine groups? This is the concept covered in Chapter 1 of my book: always look at the variation around averages; that's where the most interesting stuff is.
He calculated the means and standard deviations of different subgroups. It is simpler to visualize the data, again using boxplots.
Here's one dealing with boroughs, and it is clear that there is not much to pick between them. You could possibly say Staten Island is better than the other 4 boroughs.
Here's one dealing with cuisine groups, using Dan's definitions.
The order of the cuisine groups is by median score from lowest on the left to highest on the right. Again, there is no drastic difference. It is certainly not the case that Asian/Latin American restaurants are worse than, say, European or American ones.
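For readers who want to reproduce this kind of comparison, here is a sketch that computes Dan's means and standard deviations alongside the median and group size, then orders groups by median as in the chart. The input format, a list of (group, score) pairs, is an assumption, and the sample rows are invented.

```python
import statistics


def summarize_by_group(rows):
    """rows: iterable of (group, score) pairs.

    Returns one dict per group with n, mean, standard deviation and median,
    ordered from lowest to highest median score.
    """
    groups = {}
    for group, score in rows:
        groups.setdefault(group, []).append(score)
    summary = []
    for group, scores in groups.items():
        summary.append({
            "group": group,
            "n": len(scores),
            "mean": statistics.mean(scores),
            # Sample standard deviation needs at least two values.
            "sd": statistics.stdev(scores) if len(scores) > 1 else 0.0,
            "median": statistics.median(scores),
        })
    return sorted(summary, key=lambda r: r["median"])


rows = [("X", 10), ("X", 20), ("X", 30), ("Y", 5), ("Y", 7), ("Z", 40)]
for r in summarize_by_group(rows):
    print(r)
```

Keeping n in the output matters for the small groups discussed below.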
About half of the restaurants under Desserts, Drinks, Misc., African, and Others received As, while a bit less than half of those in the other cuisine groups got As. Some of the cuisine groups had few egregious violators (African, Middle East), but these data are perhaps skewed by the removal of the "closed" restaurants.
One shortcoming of the traditional boxplot is the omission of how large each group is. For groups that are too small, it is difficult to draw any statistical conclusions. We know from Dan's table, for instance, that there were only 17 restaurants classified as "African".
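One known remedy is the variable-width boxplot proposed by McGill, Tukey and Larsen, in which each box's width is proportional to the square root of the group size. A minimal sketch of the width calculation follows; the scaling to a maximum width of 0.8 and the size of the comparison group are hypothetical choices, and only the count of 17 African restaurants comes from Dan's table.

```python
import math


def box_widths(group_sizes, max_width=0.8):
    """Variable-width boxplot: each box's width is proportional to
    sqrt(n), scaled so the largest group gets max_width."""
    largest = max(group_sizes.values())
    return {g: max_width * math.sqrt(n / largest) for g, n in group_sizes.items()}


# 17 African restaurants (from Dan's table) vs. a hypothetical group of 1,700.
print(box_widths({"African": 17, "Hypothetical large group": 1700}))
```

A simpler alternative is to print n beneath each group label.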
(Unfortunately, Excel does not have built-in capability for generating boxplots.)
Lots of ideas from readers have been gathering dust in my mailbox. Here are a bunch of links, with a few comments of mine.
This first link I'm not sure what to make of. I think the architects and graphic designers amongst you may be able to make sense of it. Not me. It came with this description: "dr. dr. crash and dr. trash of m-a-u-s-e-r analyzed worlds most junk magazines and visualized their data." For the intrepid (and I claim no liability):
"Jetistics: The Analysis of Junk. The Junk of Analysis?"
This is yet another example of a map adding little or no value to the data. The presence of geographic data is not an excuse to give a lesson on maps.
It would be one thing if the geographic location helped readers understand the data, but in most such charts, the map merely says "Reader, I presume you are map illiterate, so let me tell you South Africa is at the southern tip of the African continent..."
Also notice that the bar charts are sorted by average size of invoices, which is definitely less meaningful than total amount invoiced. This, I suspect, stems from a failure to ask the pertinent question, which sits at the top of the Trifecta checkup.
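A toy example shows how the two sort orders can disagree. The function and the invoice amounts are hypothetical; the point is only that a place with one large invoice can top an average-based ranking while contributing less in total.

```python
def rank(invoices, by="total"):
    """Sort places by total or average invoice amount, largest first.

    invoices maps a place name to a list of invoice amounts.
    """
    if by not in ("total", "average"):
        raise ValueError("by must be 'total' or 'average'")

    def key(place):
        amounts = invoices[place]
        total = sum(amounts)
        return total if by == "total" else total / len(amounts)

    return sorted(invoices, key=key, reverse=True)


# Hypothetical data: P has many modest invoices, Q has one large invoice.
invoices = {"P": [100, 100, 100, 100], "Q": [300]}
print(rank(invoices, by="total"))    # → ['P', 'Q']
print(rank(invoices, by="average"))  # → ['Q', 'P']
```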
#2 on this list is a chart (of rather old data) on GM food, an issue of concern to me. In terms of the Trifecta checkup, it addresses an important question and displays very relevant data but uses a poor chart: too many colors, colors that carry no meaning, and hard-to-read labels.
Of the other links, these are more interesting: #10, #12, #17, #19, #8, #9.