Dan at Eye Heart New York has a fantastic post relating to the recent release of restaurant health inspection data by New York City. This has caused a furor among the restaurant owners because they are now required to wear their A/B/C badges front and center. Dan collected some data (which he also posted), made some charts, and reported some interesting insights.

Here is an overview chart that shows the distribution of scores (the higher the score, the lower the grade). He called it a "scatter plot" but it is really a histogram where the bucket size is 1 except for the rightmost bucket.

I like the use of green, yellow and red colors to indicate (without words) the conversion scale from scores (violation points) to grades (A/B/C). The legend "Count" is an Excel monstrosity. I'd have used a bucket size of at least 5, which would smooth out the gyrations in the green zone.
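To make the bucket-size point concrete, here is a minimal sketch of re-binning scores into buckets of width 5. The scores are made up for illustration and are not Dan's data; the `rebin` helper and the `WIDTH` constant are names I am introducing here.

```python
from collections import Counter

WIDTH = 5  # bucket size suggested in the text

def rebin(scores, width=WIDTH):
    """Count scores per bucket; keys are the left edge of each bucket."""
    counts = Counter((s // width) * width for s in scores)
    return dict(sorted(counts.items()))

# Made-up violation-point scores for illustration only (not Dan's data).
scores = [0, 2, 2, 4, 7, 9, 10, 12, 12, 13, 13, 13, 14, 20, 27, 28, 35]

binned = rebin(scores)
for left, n in binned.items():
    # One text bar per non-empty bucket, e.g. " 10-14  #######"
    print(f"{left:>3}-{left + WIDTH - 1:<3} {'#' * n}")
```

Empty buckets are simply skipped in this sketch; a real chart would want to show them as zero-height bars.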

A more typical way to summarize numeric data in groups is Tukey's boxplot, as shown below.

I use Dan's raw data on this chart. 1 = A, 2 = B, 3 = C. What is group 4?

It turns out Dan has removed this group from all of his analysis. A little research shows that group 4 consists of restaurants that have been closed by the Dept of Health. Interestingly, the scores of these restaurants are spread widely, so the DOH appears to be closing restaurants not just for health violations. (In the rest of this post, I have removed group 4.)

For those not familiar with box plots, the box contains the middle 50% of the data (in this case, the scores of the middle half of the restaurants in the respective group); the line inside the box is the median score; the dots above (or below, though nonexistent here) the vertical lines are outliers. As Dan pointed out, group C has lots of outliers on the high end of the score.
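For readers who want to see the arithmetic, the sketch below computes the box (quartiles and median) and flags outliers for a single group. The outlier rule used here, 1.5 times the interquartile range beyond the box, is the common Tukey convention and is an assumption; the charts in this post may have been drawn with different settings. The scores are invented, apart from the 111 mentioned in the post, and `box_stats` is a name I made up.

```python
import statistics

def box_stats(values):
    """Return quartiles, median, and outliers under the 1.5 * IQR rule."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr   # assumed convention, see note above
    high_fence = q3 + 1.5 * iqr
    outliers = sorted(v for v in values if v < low_fence or v > high_fence)
    return {"q1": q1, "median": median, "q3": q3, "outliers": outliers}

# Invented scores for a "C" group; only the 111 comes from the post.
group_c = [28, 29, 30, 31, 32, 34, 36, 40, 111]

stats = box_stats(group_c)
print(stats)
```

Different software packages compute quartiles in slightly different ways, so the box edges can shift a little from tool to tool.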

Just for fun, I pulled the violations of the highest scoring restaurant (111 violation points). What I find intriguing is the huge fluctuation in scores over the last 5 inspections. Does this happen to other restaurants too? What does that say about the grading system?

***

Next, Dan attempted to address two questions: did scores vary across the 5 boroughs, and did scores vary across cuisine groups? This is the concept covered in Chapter 1 of my book: always look at the variation around averages, because that's where the most interesting stuff is.

He calculated the means and standard deviations of different subgroups. It is simpler to visualize the data, again using boxplots.
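For reference, the kind of subgroup table Dan computed can be sketched in a few lines. The rows below are made-up (borough, score) pairs rather than his data, and `summarize` is a hypothetical helper name.

```python
import statistics
from collections import defaultdict

def summarize(rows):
    """Map each group to (mean, sample standard deviation) of its scores."""
    by_group = defaultdict(list)
    for group, score in rows:
        by_group[group].append(score)
    return {
        group: (statistics.mean(scores), statistics.stdev(scores))
        for group, scores in by_group.items()
    }

# Made-up (borough, score) pairs for illustration only.
rows = [
    ("Manhattan", 10), ("Manhattan", 14), ("Manhattan", 30),
    ("Staten Island", 8), ("Staten Island", 12), ("Staten Island", 13),
]

summary = summarize(rows)
for group, (mean, sd) in summary.items():
    print(f"{group:<14} mean={mean:.1f} sd={sd:.1f}")
```

A table like this hides the shape of each distribution, which is why I prefer the boxplots.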

Here's one dealing with boroughs, and it is clear that there is not much to pick between them. You could possibly say Staten Island is better than the other 4 boroughs.

Here's one dealing with cuisine groups, using Dan's definitions.

The cuisine groups are ordered by median score, from lowest on the left to highest on the right. Again, there is no drastic difference. It is certainly not the case that Asian/Latin American restaurants are worse than, say, European or American ones.

About half of the restaurants under Desserts, Drinks, Misc., African, and Others received As, while a bit less than half of those in the other cuisine groups got As. Some of the cuisine groups (African, Middle East) had few egregious violators, but this data is perhaps skewed by the removal of the "closed" restaurants.

One shortcoming of the traditional boxplot is the omission of how large each group is. For groups that are too small, it is difficult to draw any statistical conclusions. We know from Dan's table, for instance, that there were only 17 restaurants classified as "African".
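One simple workaround is to put the group size into each category label, so the reader sees it next to the box. Below is a minimal sketch; the count of 17 for "African" comes from Dan's table, while the "American" count and the cutoff of 30 for flagging small groups are arbitrary assumptions of mine, as is the helper name `labels_with_n`.

```python
from collections import Counter

def labels_with_n(groups, min_n=30):
    """Build axis labels like 'African (n=17) *'; '*' marks small groups."""
    counts = Counter(groups)
    return {
        group: f"{group} (n={n})" + (" *" if n < min_n else "")
        for group, n in counts.items()
    }

# One entry per restaurant. 17 is from Dan's table; 40 is a placeholder.
cuisines = ["African"] * 17 + ["American"] * 40

labels = labels_with_n(cuisines)
for label in labels.values():
    print(label)
```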

(Unfortunately, Excel does not have built-in capability for generating boxplots.)

The wide variation in scores over time for high scoring restaurants is fine, when you think about the generating mechanism. There is a series of possible offences, and the high scoring restaurants will be committing many of them at some time, but not necessarily at every visit. So for each offence there will be a binomial probability at each visit. There will also be different points for each, building in further variability, so the variance is probably a function of the mean at least, and probably of the mean squared. It would be interesting to fit the data and see what the relationship is.

Posted by: Ken | Aug 12, 2010 at 04:45 AM

Excluding the closed restaurants might make sense, if your goal is assessing the likelihood that an open restaurant will have a poor inspection record. It's like properly ignoring Monty's opened door in the Monty Hall problem.

Posted by: derek | Aug 13, 2010 at 06:35 AM

I'm not sure I agree with using box plots for your first chart (though I can't tell if you're recommending it for this situation, or just pointing it out as an example).

All it tells me is that all the restaurants graded A have scores grouped towards the low end of the range, which is the definition of a grade A. Hence, you're showing the correlation and distribution that is implicit in the scoring mechanism. The more interesting question is what is the distribution and proportion of restaurants with each score/grade. (That said, group "4" is interesting to compare here.)

But I do agree with the use of box plots for the comparison of other categorizations.

Posted by: Sage | Aug 15, 2010 at 09:24 PM

Sage: great comment. I'd not have put up the first boxplot if Group 4 weren't there. It serves to show that Group 4 is a mixture of everything else. Also, I want to say that data analysts should always prepare this chart to check for any data errors; if something is miscoded, you can see right away there is an outlier on this chart.

Derek: One misstep in the original article is to have removed Group 4 without comment. It is a best practice to always state (in a footnote) if the data has been altered.

If we can distinguish the restaurants closed for health reasons from those closed for other reasons, then it is better to include the former group in the analysis, as you're implying.

Ken: that's an interesting way of thinking about a model for this. I was getting at a more basic point, which is that any grading system that fluctuates so much from year to year is bound to be quite worthless... is it measuring some transient thing (as you suggested) or something fundamental about each restaurant?

Posted by: Kaiser | Aug 15, 2010 at 11:30 PM

In your explanation of box plots (I think) you've omitted to explain what the whiskers represent. Are they min & max in this case?

Posted by: Phy2sll | Aug 17, 2010 at 08:40 AM

Which software did you use to generate the box-plot charts at the bottom?

Posted by: Stef | Aug 17, 2010 at 09:33 AM