Playthings in an unreal world

It's been said that the economic models used by many mainstream economists this decade suffer from a fatal flaw: they rest on many unrealistic assumptions (such as no speculative bubbles) that are often needed to "make the math work"; sometimes, these are not direct assumptions but consequences of other assumptions.  See, for example, Willem Buiter, Paul Krugman, and Scientific American.


Another favorite plaything of economists is the so-called "Big Mac index".  The Economist magazine, which seems to own this toy, proclaims it "the most accurate financial indicator to be based on a fast-food item", and the sub-title of the page is "Exchange-rate theory".  It is the cost in US$ of a Big Mac overseas divided by the cost of a Big Mac in the U.S., and is an indicator of whether the current foreign exchange rate (vs. US$) is over-valued or under-valued.

Econ_bigmac

I saw this chart on the Business Insider site, where a claim is made about the Chinese currency being undervalued by 50%.

Sadly, the Economist has gone the way of USA Today in embellishing its graphics with distracting, loud, uninformative images.  Besides the chartjunk, my first reaction was that we should always place the zero line in the middle of the chart for this sort of scale, where the data could theoretically lean in either direction, so that readers can mentally judge the magnitude of the differentials.  As pointed out by zbicylist, that statement isn't appropriate to this particular scale.  The scale chosen has the peculiar range of -100% to +infinity: to help readers appreciate this, I would set the left edge of the scale to -100% and let the right edge expand to cover the actual data.
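To see where the -100% floor comes from, here is a minimal sketch of the calculation as described above, read as a percent deviation from parity.  The prices are made-up placeholders, not Economist data, and big_mac_index is a hypothetical helper name.

```python
def big_mac_index(local_price_usd: float, us_price_usd: float) -> float:
    """Percent over- (+) or under- (-) valuation against the US dollar.

    Dollar cost of a Big Mac abroad divided by the dollar cost in the
    U.S., expressed as a percent deviation from parity.
    """
    if us_price_usd <= 0:
        raise ValueError("US price must be positive")
    return (local_price_usd / us_price_usd - 1.0) * 100.0


# Illustrative prices only (assumed, not read off the chart).
us_price = 4.00
examples = [("free burger", 0.00), ("half price", 2.00),
            ("parity", 4.00), ("triple price", 12.00)]
for label, price in examples:
    print(f"{label:>12}: {big_mac_index(price, us_price):+.0f}%")
```

A burger cannot cost less than nothing, so the ratio cannot fall below zero and the index cannot fall below -100%; on the other side there is no ceiling.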

Food for thought: could it be that under this economic "theory", the US$ can never be over- or under-valued, that it is always correctly valued?


Dealing with skew

Bernard L pointed us to this income distribution chart printed in the Economist.

Economist_incomedist

The accompanying paragraph points to the range of the bars, that is, the gap between the top decile average and the bottom decile average, as evidence of income disparity, concluding that the US and Britain are among the worst.

Bernard likes the use of vertical sections to represent the average incomes by decile and dislikes the USA-Today style background image.  Agreed.  But why plot the middle deciles at all when the only worthy data involve the endpoints of the bars?

A close examination of the spacing of the middle deciles leads to more befuddlement.  There does not appear to be much difference between the countries.

The answer to this is that decile statistics are not appropriate for data as skewed as incomes.  At the high end, the 10% intervals are too coarse.

One clue to this is that the top 10% in the US earns only $90,000 on average, but we have all heard of the billion-dollar hedge fund managers, the Wall Street bankers and the $30-million-a-movie celebrities.  The problem is that within the top decile, the income distribution is also tremendously skewed.

The neat idea of plotting the vertical sections indicates an awareness that the red dots (average income) are insufficient because of the skew.  Alas, there remains a lot of skew within the top decile, and the designer inadvertently falls back into the same trap by considering the average income within the top 10%.  Thus, the amount of disparity on the right side of the chart is grossly underestimated.  Roughly speaking, we are looking at 10 samples of the distribution, nine of which sit at the low end of the range and only one at the top end (the long tail).  Here is the idea:

Redo_disparity
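The same point can be sketched with a quick simulation.  This assumes, purely for illustration, that incomes follow a lognormal distribution with arbitrary parameters; it is not fitted to the Economist's data, and group_means is a hypothetical helper.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable

# ASSUMPTION: lognormal incomes; mu=10.5 and sigma=1.0 are arbitrary choices.
N = 100_000
incomes = sorted(random.lognormvariate(10.5, 1.0) for _ in range(N))


def group_means(values, groups):
    """Split sorted values into equal-sized groups; return each group's mean."""
    size = len(values) // groups
    return [statistics.fmean(values[i * size:(i + 1) * size])
            for i in range(groups)]


# Ten deciles of the whole population, as in the original chart.
decile_means = group_means(incomes, 10)

# Now slice the top decile itself into ten finer groups (1% of people each).
top_decile = incomes[-(N // 10):]
top_slices = group_means(top_decile, 10)

print("decile means:        ", [round(m) for m in decile_means])
print("top-decile 1% slices:", [round(m) for m in top_slices])
print("top 1% mean / top 10% mean:",
      round(top_slices[-1] / decile_means[-1], 2))
```

Under this assumed distribution, the single number for the top decile averages over slices that differ from one another by a wide margin, which is the skew the decile chart cannot show.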


Reference: "Spreading the wealth", Economist, Oct 21 2008.


Cram it like Koby

You have to gradually build up your gut by eating larger and larger amounts of food, and then be sure to work it all off so body fat doesn't put a squeeze on the expansion of your stomach in competition  -- Takeru Kobayashi, six-time champion of the Coney Island hot dog eating contest

Kobayashi is a phenom.  He can stuff 60 hot dogs or 100 burgers in ten or twelve minutes and show no consequences.  Ordinary people can't hope to emulate these feats.

Junk Charts sees Kobayashi as a hero; an anti-hero really.  We are ordinary people; we can't hope to cram it like Koby.  A message we keep repeating here is: too much data sinks a chart.

Econ_anglosaxon

Not long after this chart showed up in the Economist, several readers urged us to take a look.  It's a well-nourished chart indeed, one to challenge Kobayashi, but for all that it contains, the reader has to try very hard to find insights, what with the multiple colors, iron-fisted gridlines, above-and-below boxes, dotted and solid lines, and a legend with nine pieces split across two spots.  Besides, the U.S. boxes grab all the attention by virtue of being wider (the country being more partisan).

The key to unraveling this chart is to identify the relevant comparisons:

  • UK average vs US average
  • UK left vs US left
  • UK right vs US right
  • UK independent vs US independent

And then for the gluttonous:

  • UK right vs US left
  • UK left vs independent vs right
  • US left vs independent vs right

In the junkchart version, we address these comparisons sequentially.

Redo_anglosaxon1a
(Apologies for the tiny font.)

We are again using a small multiples approach that places four comparisons next to each other: average, left, independent, right.  Consistently, the British are to the left of the Americans.  The only places where the two cultures meet are where liberals agree on "ideology" and "military action".

Also note that we use a symmetric horizontal scale centered at 0.  There are too many charts out there where the center is not at the center!

A similar presentation addresses the other three comparisons.  Democrats in the U.S. are miles to the right of Tories in terms of "religion".  In the UK, Labour and Tories are not much different except on "ideology".  In the US, Independents lean closer to Democrats.

Redo_anglosaxon2a

Joining the lines (I hear the grumbles) helps bring out the gap between the groups being compared.  Without lines, the chart would look like this.

Redo_anglosaxon3a

It is often hard to keep track of which dot is which as they trade order from issue to issue.

PS. Does anyone know what is being measured on the horizontal axis?  The original graph mysteriously stated "respondents' views".


References: 

Eric Talmadge: "Pigout champion Kobayashi limbers up for hot dog gold" June 25, 2004

"Anglo-Saxon Attitudes", Economist, Mar 27 2008.


An embarrassment

I find it embarrassing for the Economist to print an article like this one.  (Do they have a statistics editor?)

Econ_smoking

The subtitle asserting "causality" is offensive.  It is alleged that smoking bans in bars have "caused" more road accidents because people are forced to drive longer distances to find those bars that still allow smoking.

To assert causality so starkly for an undesigned observational study is unprofessional.  I doubt that the authors of the study they cited even went so far.  At best, they probably found a correlation.

Another problem is the practical significance of the finding.  There is a 13% increase in fatal accident rate in a "typical county containing 680,000 people".  There are two problems with this statement:

  • When I check the Census data, there are only about 85 counties in the entire U.S. with at least 680,000 people.  What do they mean by "typical"?
  • 13% is said to be an increment of 2.5 fatal accidents, presumably per year.  The crane accident in Manhattan a few weeks ago killed at least five people.  I just don't believe that one can prove definitively that such a tiny difference is not due to chance, so even the correlation, let alone the causality, is suspect.
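A back-of-envelope check of that last point, under the assumption (mine, not the study's) that fatal accidents in a single county-year follow a Poisson distribution.  The study presumably pooled many counties and years, so this does not reproduce its test; it only sizes up the noise in one "typical" county.

```python
import math

increase_abs = 2.5    # extra fatal accidents, as quoted in the article
increase_rel = 0.13   # the 13% increase, as quoted in the article

# Implied number of fatal accidents before the ban.
baseline = increase_abs / increase_rel

# ASSUMPTION: Poisson counts, so the variance equals the mean.
sd = math.sqrt(baseline)


def poisson_upper_tail(k: int, mean: float) -> float:
    """P(X >= k) for X ~ Poisson(mean), summed directly from the pmf."""
    below = sum(math.exp(-mean) * mean ** i / math.factorial(i)
                for i in range(k))
    return 1.0 - below


# Smallest whole count at or above "baseline plus the increment".
threshold = math.ceil(baseline + increase_abs)
p_by_chance = poisson_upper_tail(threshold, baseline)

print(f"implied baseline:     {baseline:.1f} per year")
print(f"Poisson std dev:      {sd:.1f}")
print(f"increment in std dev: {increase_abs / sd:.2f}")
print(f"P(X >= {threshold}) by chance: {p_by_chance:.2f}")
```

Under this assumption the quoted increment is well under one standard deviation of the yearly count, so a single county-year could easily show it by chance alone.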

It appears that the paper is locked up in pre-publication.  If you have seen it, let us know if the authors actually asserted causality.

Reference: "Unlucky Strikes", The Economist, April 3 2008.


Color scale

This map from the Economist illustrates pretty well the movement of population from middle America outwards between 2000 and 2006.  The message reaches us despite the large volume of data painted.  (The gray shadow, though, was more than a little distracting.)

Econ_depop

The map piqued my curiosity in two areas:

  • How did they determine the color scale?  The average change over all counties (6.4%) was obviously used.  Standard deviation was not since the ranges of change were unequal in size.
  • Was within-county percent change the best criterion?  As is, an 80% drop in a 2,000-people county looks the same as an 80% drop in a 200,000-strong county.
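The arithmetic behind that second question, as a small sketch.  The two populations and the 80% drop are the ones used in the text; people_lost is a hypothetical helper.

```python
def people_lost(population: int, pct_drop: int) -> int:
    """Absolute head-count change for a given whole-number percent drop."""
    return population * pct_drop // 100


pct_drop = 80  # the same percent change, hence the same color on the map
counties = {"2,000-people county": 2_000, "200,000-strong county": 200_000}

for name, population in counties.items():
    lost = people_lost(population, pct_drop)
    print(f"{name:>22}: -{pct_drop}% means {lost:,} people")
```

The two counties would get the same shade, yet one accounts for a hundred times as many people as the other.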

Reference: "The Great Plains drain", Economist, Jan 17 2008.

PS. I am traveling and so posting will be limited.


Read fast, pay the price

At first, this looks like a decent chart despite the donut construct, which I cannot stand (but the Economist loves).

Rockstars

The accompanying text proclaimed: "Rock stars are famous for excess, and some pay the price".  The rest of the paragraph points out drug- and alcohol-related deaths, plus deaths due to "unhealthy lifestyles", which apparently include cancer and cardiovascular disease.

There is a gaping hole between what's on the chart and what's in the text.  They just talk past each other.

  • The chart invites us to compare the European experience to the American experience. Each donut presents the proportion of total deaths by causes of death. The top donut presents American rock-star deaths, the bottom European ones. But this comparison has zilch to do with the key point, which is how rock stars are different from the rest of us.  The chart tells us nothing about the rest of us.  The 20% death by cancer would be entirely unremarkable if 20% of non-rock-star deaths also were attributed to cancer!
  • We must also bear in mind that the base populations are rock stars who died young. This is a very specific demographic segment, and so the only valid point of reference is people who died young.  If we think along those lines, then among unmusical people who died young, what might have been the causes of death?  Drugs? Alcohol?  Accidents?  Suicide?  You bet.  I am not sure who is the authoritative source of such data but the CDC reported that among Americans aged 15-34 who died, the leading causes were "unintentional injury", suicides, homicides, cancer and heart disease.  Not much different from the above list...
  • The deaths depicted in the two donuts totaled fewer than 100, and yet percentages are given to one decimal place.  This creates a false sense of precision not justified by the sample size.
  • The deaths occurred over about 50 years.  It is very likely that the causes of premature death have shifted during this time span, making an aggregate analysis questionable.
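To put a number on the false-precision point, here is a rough sketch using the usual normal-approximation margin of error for a proportion.  The sample sizes are assumptions (the text says only that the two donuts together hold fewer than 100 deaths), and the approximation itself is crude at sizes this small.

```python
import math


def proportion_margin(p: float, n: int, z: float = 1.96) -> float:
    """Approximate 95% margin of error for a sample proportion."""
    return z * math.sqrt(p * (1.0 - p) / n)


share = 0.20  # the 20% cancer share mentioned above

# ASSUMED sample sizes: 100 is the stated upper bound for both donuts
# combined; 50 and 25 are hypothetical sizes for a single donut.
for n in (25, 50, 100):
    margin = proportion_margin(share, n)
    print(f"n = {n:3d}: 20% give or take {margin * 100:.1f} points")
```

Even at the upper bound of 100 deaths the uncertainty runs to several percentage points, so a decimal place on the labels conveys nothing.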

Charting is much more than just aesthetics.  Some basic statistical common sense goes a long way.  This was observed long ago by Huff.

Source: "Rock stars: live fast, die young", Economist, Sept 4 2007.


Exception to the rule

It's pretty hard to decree hard-and-fast rules for graphical design; every rule seems to admit its exception.  This reinforces Tufte's contribution as he has successfully organized the rules in his collection of books.

Dustin J sent in this chart from the Economist.  Its first impression is ugly and overly complex.

Econ_petrol

Dustin commented:

Steven Few says not to use stacked bar charts because you cannot compare individual values very easily and as a rule I avoid stacked bars with more than six or seven divisions. What do you think of this stacked bar--I think it is quite effective in telling the story.

On this blog, I have also re-done some stacked bar charts but this one is truly an exception to the rule.  The reason why this one works is that it's not about the individual components; it's showing that the US consumes more than all those countries combined.

If only it had a proper caption!  The Economist is uncharacteristically detached here: "Petrol consumption per day", "Litres bn, 2003".  How about "Goliath v. Davids"?  "US v. the World"? "Dream Team USA"?

It'd help if they toned down the colors; also, by simply annotating the total litres for the US and the total for the other countries, they would have made a clearer point without using gridlines.  But these are minor glitches in an otherwise effective chart.

Source: Economist, July 2007.


Mirror, mirror

Ec_sarko

Mirror, mirror on the wall...

I don't see what the second line adds to this plot, given there were only two candidates in this election. 

Political graphs do not get much better than those at the Political Arithmetik blog.

For instance, in the chart below, he wisely chose to draw trend-lines rather than connecting the individual dots.  Also, typically, he plots dots for all the different polls, which allows us to assess the variability (reliability) of the observed trend.

Topdems

Reference: "Sarko embraces the Anglo-Saxons", Economist, Feb 3 2007.


Horrid stuff

Ec_smoke

Small multiples can work wonders when data are replicated, as in this case.  The chart accompanied an Economist article on pollution levels in several European cities, as indicated by the concentration of nitrogen dioxide and particulates.

In the junkart version, I plotted the data series side by side, rather than one over the other.  Further, the cities are ordered by decreasing levels of NO2, which seemed to be the worse pollutant.  All gridlines are removed except the line at 30, which works pretty well to separate out the highly polluted cities.

Redopollutant

An odd pattern has now surfaced.  Namely, there is some degree of negative correlation between the concentration of the two pollutants.  Environmental scientists may be able to tell us why.


Reference: "The Big Smoke", Economist, Feb 3 2007.


Jamming

Econ_muslims

Readers may have noticed that I'm not a fan of the graphics aesthetics of the Economist.  (I love their subtle sarcasm, a way of saying something without saying it.  For example, the title of this chart is "where they are".  They let us read any meaning into the word "they".  As for their charts, I have taken issue on several occasions.)

This particular example uses one of their standard formats, stacked bars with an extra data series tagged on the right, its boxed annotation calling attention to itself.  It's a case of too much apparatus for a simple task.

The chart's purpose is to show that the US and France have the largest Muslim populations by numbers while France is by far the top country by percentage.

Redo_muslims

Our junkart version is very much cleaner.  Line segments indicating the low, mid and high estimates replaced the stacked bars (which falsely imply significance in adding the low and high estimates).  As usual, the minimum of gridlines and axes is used.  Instead of jamming two ideas onto one chart, if percentages are more important, then a separate chart should be produced, now ordered by decreasing percentages (see below).

The most crucial improvement is the fine print.  Perhaps extending their subtle sarcasm too far, the chart maker omitted context for interpreting the data: namely, that the low-mid-high range represents estimates by up to 5 different sources, each using potentially different methodologies for estimation.  This partially explains the huge variance in estimates for the US (or does it?).

Redo2_muslims

Also missing is a comment on why these particular 6 countries were selected.  It may give a misleading picture of "where they are" in the context of world population.

Reference: "Where They Are", Economist, June 2006.