The above chart is another one in the NYT series on the NFL playoffs. It evaluates the mix of passing and rushing attempts by offense. The convoluted way in which the caption strains to tell a story signals trouble ahead:
Of the three playoff teams that threw the ball the most, two of them come from cities known for cold weather. Conversely, of the three teams that ran the most, two of them play their home games in milder weather.
The implication is that teams from cold-weather cities supposedly want to rush more, and vice versa. Yet the data (a total of six samples) point to the opposite.
This presentation suffers from low data-to-ink ratio: too much ink is spilled over not much data. The designer arbitrarily picks one of the two variables (passing attempts, rushing attempts) as the primary, sorting variable -- trace the orderly green diamonds on the right chart. This makes it hard to see a pattern in the brown diamonds. As usual, a scatter plot works much better with two data series.
In the junkart version, the raw numbers of attempts are converted into the proportion of attempts that were passing versus rushing. This easy move immediately collapses the two dimensions into one. Now we have room to include an extra variable that matters: the average amount of snowfall in these cities.
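The conversion described above can be sketched in a few lines of Python. The team names and attempt counts below are hypothetical, for illustration only, not the actual playoff data:

```python
# Collapse two raw counts (passing, rushing attempts) into one proportion.
# Teams and counts are made up for illustration.
teams = {
    "Team A": (520, 380),  # (passing attempts, rushing attempts)
    "Team B": (430, 470),
}

for team, (passes, rushes) in teams.items():
    pass_pct = passes / (passes + rushes)
    print(f"{team}: {pass_pct:.1%} passing")
```

With the two counts reduced to one percentage, the freed-up dimension of the chart can carry the snowfall variable.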
So what does the data say about the relationship between the propensity to pass and cold weather? There appears to be very little relationship, as the dots are scattered all over the chart. In particular, the teams playing in cities with the highest snowfall span the range of passing percentages; similarly, those playing in the lowest-snowfall cities also span the range of passing percentages.
The caption ignores all the blue dots, focusing only on the gray ones. A more direct examination of the relationship reveals the folly of the so-called "not so conventional wisdom".
In this NYT article, we are told that "the most likely result when a policeman discharges a gun is that he or she will miss the target completely." That's a shocker for those of us conditioned by Hollywood movies to think anyone who picks up a gun for the first time hits the villain right on the temple. The following graphic attempts to tell the story.
The one hit here is how the distances are visually presented. The elliptical lines remind us of the neglected variable of direction; they also mean the scale is correct only along one direction.
The dot matrix construct highlights the absolute numbers of shots, hits and misses but barely addresses the key issue of hit rates (accuracy). Specifically, this data set was presumably collected to explore the relationship between hit rates and distances from the target. The use of different widths clouds our judgement of proportions. To wit, it is not obvious that the 10-wide block and the 40-wide block shown at left depict roughly equal hit rates (23% vs. 29%).
The junkart version adopts a different approach. This is the Lorenz curve, often used to show income inequality (see also here and here). Here, the shots were ordered from closest to furthest from target, then summed up by distance segments. For example, shots from 0 to 6 feet accounted for 60% of all shots but 72% of all hits.
If distance does not affect hit rates, we'd expect 60% of all shots to result in 60% of all hits. This data point would show up on the 45-degree diagonal on the chart, labelled "totally unpredictable". Any data appearing above the diagonal indicates that closer shots are more accurate, accounting for more than their fair share of hits.
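The Lorenz-curve construction can be sketched as follows. The shot and hit counts below are invented for illustration (they are not the report's actual figures), but the ordering-and-accumulating logic is exactly the one described above:

```python
# Lorenz-style curve for hit rates by distance: order the shot groups
# from closest to furthest, then accumulate shares of shots and of hits.
# Counts are hypothetical; only the construction matters.
groups = [  # (distance label, shots, hits), ordered closest to furthest
    ("0-6 ft",   300, 90),
    ("7-15 ft",  120, 25),
    ("16-20 ft",  50,  8),
    ("21+ ft",    30,  3),
]

total_shots = sum(s for _, s, _ in groups)
total_hits = sum(h for _, _, h in groups)

cum_shots = cum_hits = 0
for label, shots, hits in groups:
    cum_shots += shots
    cum_hits += hits
    print(f"up to {label}: {cum_shots/total_shots:.0%} of shots, "
          f"{cum_hits/total_hits:.0%} of hits")
```

Plotting the cumulative share of hits against the cumulative share of shots gives the curve; points above the diagonal mean closer shots claim more than their fair share of hits.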
Comparing the fitted blue line and the diagonal, one sees that distance is a weak predictor of hit rate. The police commissioner explains this in the article; many other variables also affect accuracy, including "the adrenaline flow, the movement of the target, the movement of the shooter, the officer, the lighting conditions, the weather..."
Note that the shots with "unknown" distances were removed from the analysis. Also, the categories of 21-45 and 45-above were combined: the rates were similar and with only three hits, it does not make sense to treat these as separate categories.
Of course, this version would not work well in the mass media. For that, one can just plot hit rates against the distance categories.
Source: "A Hail of Bullets, a Heap of Uncertainty", New York Times, Dec 9 2007; New York Firearms Discharge Report 2006.
Graduation rates at 47 new small public high schools that have opened since 2002 are substantially higher than the citywide average, an indication that the Bloomberg administration’s decision to break up many large failing high schools has achieved some early success. Most of the schools have made considerable advances over the low-performing large high schools they replaced. Eight schools out of the 47 small schools graduated more than 90 percent of their students.
This graphic included in the NYT article lent support to the "small schools movement". In particular, note the last sentence of the above quotation: it incorporates the oft-used device of subgroup support of a hypothesis, in this case, the subgroup of eight top-performing schools.
Such analysis is "dangerous", according to Howard Wainer, who discusses this and other examples of misapplication in a recent article in American Scientist, entitled "The Most Dangerous Equation". He alleged that billions have been wasted in the pursuit of small schools.
The issue concerns sample size. Dr. Wainer and associates analyzed math scores from Pennsylvania public schools. Average scores for smaller schools are based on a smaller number of students, and are therefore less stable (more variable). More variability means more extremes. Thus, by chance alone, we expect to find more smaller schools among the top performers. Similarly, by chance alone, we also expect to find more smaller schools among the worst performers.
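De Moivre's equation says the standard error of a mean is σ/√n, so averages over fewer students swing more widely. A quick simulation (entirely hypothetical schools; every student draws from the same score distribution, so any size effect among the top schools is pure chance) illustrates the point:

```python
import random

# De Moivre's equation: SE of a mean = sigma / sqrt(n).
# Small schools average over fewer students, so their means vary more
# and crowd both extremes, even with no real size effect.
random.seed(1)

schools = []
for i in range(200):
    n = random.choice([25, 100, 400])            # hypothetical school size
    scores = [random.gauss(500, 100) for _ in range(n)]
    schools.append((n, sum(scores) / n))

top = sorted(schools, key=lambda s: s[1], reverse=True)[:20]
small_in_top = sum(1 for n, _ in top if n == 25)
print(f"small schools among the top 20: {small_in_top} of 20")
```

Run this a few times with different seeds: the size-25 schools are persistently overrepresented among both the best and worst averages, exactly as the Pennsylvania analysis predicts.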
The scatter plot lays out their argument. Focusing only on the top performers (blue dots), one might conclude that smaller schools do better. However, when the bottom performers (green) are also considered, the story no longer holds. Indeed, the regression line is essentially flat, indicating that scores are not correlated with school size.
This is all nicely explained via the standard error formula (De Moivre's equation) in Dr. Wainer's article. Here is a NYT article from the mid 1990s describing this same phenomenon.
File this as another comparability problem. Because estimates based on smaller samples are less reliable, one must take extra care when comparing small samples to large samples.
Dr. Wainer is publishing a new book next year, called "The Second Watch: navigating the uncertain world". I'm eagerly looking forward to it. His previous books, such as Graphic Discovery and Visual Revelations, are both part of the Junk Charts collection.
Sources: "The Most Dangerous Equation", American Scientist, November 2007; "Small Schools Are Ahead in Graduation", New York Times, June 30 2007.
P.S. Referring back to the NYT chart above, one might wonder at the impossible feat of raising graduation rates across the board simply by breaking up large schools into smaller ones. This topic was taken up here, here and here. When evaluating the "small schools" policy, it is a mistake to discuss only the performance of small schools; any responsible analysis must look at improvement over all schools. Otherwise, it's a simple matter of letting small schools skim off the cream from larger schools.
[I'm back from vacation. Will provide my reaction to the responses to the Gelman challenge, and for those who have sent me email, I will work through them soon.]
The NYT commented on a trend among marketers to shift their advertising spending from so-called "measured" media like print and TV to so-called "unmeasured" media like product placements, contests, etc. The following chart accompanied the article:
This construct is akin to a population pyramid; it's great for comparing two groups along one metric, say age groups between males and females. Here, the two halves aren't comparable groups but two different metrics. The main metric, that is, the proportion of unmeasured, is not directly depicted: the reader must figure out mentally how much of each bar the black part covers. Also, the companies are sorted by unmeasured media spending but this leaves the measured spending with a jagged profile, confusing matters.
As for the little white slits on the gray bars, they are admittedly cute but it is difficult to compare the detailed breakdown between print, TV and other media among companies.
The following dot plot gives the two halves equal weight. (Pink dots are measured, blue unmeasured.) It's not a very interesting graphic though. The sense of proportion is still missing.
I settled on a scatter plot which relates the proportion spent on unmeasured to the total amount of spending. It appears that the largest advertisers had the lowest proportional unmeasured spend while the smallest (among the majors) had the highest. (It's only a weak correlation: a linear fit yields only 16% R-squared.)
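For readers who want to check such a fit themselves, here is a from-scratch least-squares fit and R-squared computation. The (total spend, proportion unmeasured) pairs below are made up for illustration; they are not the article's data, though they mimic the same weak negative pattern:

```python
# Simple linear regression and R-squared, computed from scratch.
# Data pairs (total spend in $bn, proportion unmeasured) are hypothetical.
data = [(3.0, 0.20), (2.5, 0.30), (2.2, 0.25), (1.8, 0.45),
        (1.5, 0.35), (1.2, 0.55), (1.0, 0.40), (0.8, 0.60)]

n = len(data)
mx = sum(x for x, _ in data) / n
my = sum(y for _, y in data) / n
sxx = sum((x - mx) ** 2 for x, _ in data)
sxy = sum((x - mx) * (y - my) for x, y in data)
slope = sxy / sxx
intercept = my - slope * mx

ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in data)
ss_tot = sum((y - my) ** 2 for _, y in data)
r_squared = 1 - ss_res / ss_tot
print(f"slope: {slope:.3f}, R-squared: {r_squared:.2f}")
```

A negative slope with a modest R-squared is the signature of the weak correlation described above.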
Source: "The New Advertising Outlet: Your Life", New York Times, Oct 14, 2007.
One of the many gratifications of blogging is to connect with others who have similar interests; so it has been fantastic to receive user submissions (though admittedly I don't check my inbox frequently enough). The thoughtfulness of these nominations continues to impress me.
Evan sent in 254 charts he created after looking at the post on baby names. An example is shown on the right.
He is particularly interested in the question of names that are given to both males and females.
For example, the bottom chart shows that Jordan is primarily a male name, and saw a period of growth followed by decline, although the decline has been more severe on the male side than the female side.
It's a nice touch to label the most recent year. I'd also label the values for the most recent year on the axes.
Evan also offers the following solution to the scaling problem we identified in the original WSJ chart:
My solution was just to put two charts on each chart. One at a fixed scale for every chart to give a sense of size and one at a variable scale to better show the shape of the plot.
In other words, for less popular names, the top chart would look much more compressed.
There are many more charts to sift through on his site. Evan welcomes suggestions.
This chart from a Wall Street Journal editorial has been making the rounds lately, being ridiculed left and right. A number of you have been leaving comments here so I'm putting it up and center as our light entertainment for the week.
The chart is being used to justify the economic concept called the "Laffer Curve", which claims that lowering tax rates can increase total tax receipts (for example, because fewer people will cheat the government). As far as I know, it is dogma, and has never been proven empirically.
I also agree with Prof. Gelman's skepticism about using countries as experimental units to inform domestic policy.
... illegal border crossings subsequently plummeted. Between 1953 and 1959, they fell by some 95%. In 1960, mainly in response to complaints from labor unions, the program was scaled back and eventually phased out.
Long-time readers may recall Friedman's Crossover Law of Petropolitics, where the opportune criss-crossing of lines plotted along double axes was taken as proof of causality. Friedman's Law lurked here, right in the 1953-1959 range.
The NFAP went one better: in their original version, they blew up the 1953-1959 period to show us the criss-crossing lines!
We see trouble right from the start. The "subsequent" effect that proved the case occurred in 1953, over 10 years after the program started. During that first decade, the number of apprehensions rose 4388%, in spite of the guest worker program.
A scatter plot (below left) now shows the lack of any meaningful relationship between these two variables. While high admissions appeared together with low apprehensions, any level of admissions had historically been paired with low apprehensions.
On the right, I connected the dots in chronological order. Any claim of a negative relationship between admissions and apprehensions has been debunked. From 1942 on (as we trace the line clockwise from lower left), first the nation experienced stepwise increasing admissions coupled with stepwise increasing apprehensions; then it witnessed sharply dropping apprehensions with relatively stable admissions; and finally it saw plummeting admissions while apprehensions remained low. Three separate episodes, three distinct patterns. There was no association, let alone causation.
A reader asked how to create an elegant graph for Web visitor traffic statistics that shows both how many views a page gets and how many people click that page to go further ("conversion rate"). Part of the problem is that conversion rates vary widely, from, say, 0.3% to 50%.
Let's work with this sample data set. I ordered it from highest to lowest click rate, which is the primary metric of interest. The number of page views is of interest too, as rarely-visited pages sometimes have high click rates.
At this point, it's important to know the context. Specifically, who controls the allocation of pages? Did the data come from a randomized experiment? Or did they get a self-selected sample (e.g. web surfers deciding which section of the site to visit)?
The first construct I tried is the "lift curve" often used in marketing. It's the same thing as the Lorenz curve used by demographers but interpreted differently. Here, we see that Guitar pages accounted for 26% of the page views but 37% of the clicks; House pages accounted for an incremental 44% of the pages and 59% of the clicks; etc. The relative click rates are immediately clear from the steepness of the line segments. The lift curve is appropriate for the self-selected case, in which we can take the allocation of page views as fixed.
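The accumulation behind the lift curve can be sketched as follows. The raw view and click counts are hypothetical, chosen so that the cumulative shares match the percentages quoted above:

```python
# Lift (Lorenz) curve for click rates: order pages from highest to lowest
# click rate, then accumulate shares of page views and of clicks.
# Counts are hypothetical; shares mimic the example in the post.
pages = [  # (section, page views, clicks), sorted by click rate descending
    ("Guitar",   2600, 370),
    ("House",    4400, 590),
    ("Bicycles", 3000,  40),
]

total_views = sum(v for _, v, _ in pages)
total_clicks = sum(c for _, _, c in pages)

cum_views = cum_clicks = 0
for section, views, clicks in pages:
    cum_views += views
    cum_clicks += clicks
    print(f"through {section}: {cum_views/total_views:.0%} of views, "
          f"{cum_clicks/total_clicks:.0%} of clicks")
```

The steeper a segment rises above the diagonal, the higher that section's click rate relative to the site average.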
If the allocation of page views is a decision to be made, then it doesn't make much sense to accumulate page views. The second construct is the "scatter plot" of % clicks versus % page views. The steepness of the line through the origin helps us compare the click rates. Bicycles is clearly inferior in generating clicks.
Both these constructs are highly efficient; adding new data does not expand the chart at all.
Keen readers will observe that the slope of the line is not the click rate but rather a click rate index (relative to the overall click rate). This means that any data point above the diagonal has above-average click rate.
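This equivalence is easy to verify: the slope % clicks / % views algebraically equals the page's click rate divided by the overall click rate. A short check with hypothetical numbers:

```python
# The slope through the origin in the % clicks vs. % views scatter is a
# click-rate index: the page's click rate divided by the overall rate.
# All numbers below are hypothetical.
views, clicks = 2600, 370                  # one page
total_views, total_clicks = 10000, 1000    # whole site

pct_views = views / total_views
pct_clicks = clicks / total_clicks
index = pct_clicks / pct_views                            # slope from origin
ratio = (clicks / views) / (total_clicks / total_views)   # rate / overall rate

print(f"index: {index:.2f}, rate ratio: {ratio:.2f}")
assert abs(index - ratio) < 1e-9  # the two are algebraically identical
```

An index above 1 puts the point above the diagonal, i.e. an above-average click rate.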
The previous two posts indicated that CNN, TWC and Intellicast had the best on-line weather forecasting accuracy by looking at the median and mean error in predicting daily low and high temperatures over 41 days. Is it possible to differentiate between those three?
For that, we need more data, so I switched from summary statistics back to the raw data. In this new chart, the day-by-day errors were plotted. The gridlines mark errors within 5 degrees, an arbitrary guideline separating acceptable from unacceptable forecasts. The three scatters looked remarkably similar, although CNN appeared to hit the bull's eye (the middle square) with less bias (errors more evenly distributed) but not much better accuracy overall (a similar number of unacceptable errors).