Apr 14, 2008

Progress and retrogress

Joran E. pointed to this "icky" chart he found on Clive Crook's blog at The Atlantic.

He ordered a "junkchart treatment", so here it comes.

First, we had to decode the triangles, dots and squares to make sense of the data.  We noted that the data came from a single year (2005), so the chart did not trace the development of the education sector over time.  But wait: it took a different route to the same idea.  The author compared generations within each country to see whether each younger generation was more likely to hold a university degree.  So each vertical "arrow" was, in effect, a historical record of successive generations within a country.  By this criterion, Korea and Japan had come a long way while the US and China stagnated.

The chart is quite impossible to read as designed.  There is little reason to sort by the proportion of 25-34-year-olds with degrees when the message concerns improvement over generations.  Besides, what about countries that apparently retrogressed, like Russia and Germany?
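To make the sorting point concrete, here is a minimal sketch of ordering countries by generational gain instead of by the 25-34 level, while flagging retrogression. It assumes each country's data is a list of four attainment percentages for the age groups 55-64, 45-54, 35-44 and 25-34, oldest first; the function names are hypothetical, and "retrogressed" is my own working definition (any younger cohort trailing the cohort just older than it).

```python
from typing import Dict, List, Tuple

def improvement(shares: List[float]) -> float:
    """Gain in attainment, in percentage points, from the oldest cohort to the youngest."""
    return shares[-1] - shares[0]

def retrogressed(shares: List[float]) -> bool:
    """True if any younger cohort trails the cohort just older than it."""
    return any(younger < older for older, younger in zip(shares, shares[1:]))

def rank_by_improvement(data: Dict[str, List[float]]) -> List[Tuple[str, float, bool]]:
    """Countries sorted by generational gain, largest first.

    `data` maps a country to its attainment percentages for the age groups
    55-64, 45-54, 35-44 and 25-34, in that order (oldest first).
    Each result row is (country, gain, retrogressed).
    """
    rows = [(country, improvement(s), retrogressed(s)) for country, s in data.items()]
    return sorted(rows, key=lambda row: row[1], reverse=True)
```

A stricter definition of retrogression (youngest below oldest) is also defensible; the step-by-step version catches countries whose youngest generation slipped even though they gained overall.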

For this data, I returned to my favored bumps chart.  Here is version one.  There are two ways to read this chart.  Across countries, most of the European states (blue) had similar profiles, showing a roughly constant rate of growth.  The Asian duo of Japan and Korea (brown) had the most marked growth.  In North America (black), Canada diverged from the US from the 35-44 generation onward.

Alternatively, we can focus on the change from one generation to the next.  From 55-64 to 45-54, almost all countries in this sample (except Japan) grew at the same rate.  Then, between 45-54 and 35-44, the two Asian countries clearly set the pace.  The step from 35-44 to 25-34 is the most interesting: Korea has not slowed, while Japan has slowed a little but still grew as fast as Canada.  A trio of European countries (Spain, Ireland, France) outpaced their neighbors.
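Reading the chart generation over generation amounts to differencing adjacent age groups and asking who grew fastest at each step. A minimal sketch, assuming each country's data is a list of four percentages, oldest cohort (55-64) first; the names `STEPS`, `step_changes` and `leaders_by_step` are hypothetical.

```python
from typing import Dict, List

# Labels for the three generation-to-generation steps, oldest first.
STEPS = ("55-64 -> 45-54", "45-54 -> 35-44", "35-44 -> 25-34")

def step_changes(shares: List[float]) -> List[float]:
    """Percentage-point change between each pair of adjacent cohorts, oldest first."""
    return [younger - older for older, younger in zip(shares, shares[1:])]

def leaders_by_step(data: Dict[str, List[float]]) -> Dict[str, str]:
    """For each step, the country with the largest gain (first one wins a tie)."""
    changes = {country: step_changes(shares) for country, shares in data.items()}
    return {
        step: max(changes, key=lambda country: changes[country][i])
        for i, step in enumerate(STEPS)
    }
```

The slopes of the segments in a bumps chart encode exactly these differences, which is why the chart makes the step-by-step story easy to see.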

Below I show version two.  This one combines the bumps chart with small multiples: North America, Europe and Asia/Australia are now in separate charts, which removes clutter.


 

Mar 22, 2008

Trying too hard

In the course of business and governing, a lot of charts are generated.  An anonymous tipster pointed us to a set created by the "Communities and Local Government" division in the UK government.  Judging from the content, this division has responsibility for economic development in local neighborhoods.

Below is a pair of exhibits.  Truly, they are trying too hard!  What we see is a hybrid scatter-bubble chart.  Between the jargon, the acronyms (LAD, LSOA), the boxed text, the multi-color circles, the colored axis labels and the lack of a title, the reader is plunged into a state of confusion.


The chart can be unraveled.  Each district was evaluated on two measures of "gaps in worklessness".  The vertical axis compares each district to the national average; positive numbers indicate a district that is above average relative to the nation.  The horizontal axis compares the most deprived 10% of neighborhoods within each district to the local average; positive numbers indicate that the worst neighborhoods are improving.

Thus, the policy goal would be to move all districts into the upper right quadrant.  The multi-color bubbles were designed to show us the state of the nation.  In the left chart, 41% of the districts (or is it the population?) fall in improving areas while 19% fall in deteriorating areas.
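The quadrant logic behind those bubbles fits in a few lines. A minimal sketch, assuming each district carries two signed measures: `vs_nation` (vertical axis, positive means above the national average) and `worst_vs_local` (horizontal axis, positive means the most deprived neighborhoods are improving). The field names and the optional `population` weight are hypothetical; the weight lets you report either a share of districts or a share of people, which is exactly the ambiguity in the 41% figure.

```python
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class District:
    name: str
    vs_nation: float         # vertical axis: positive = above the national average
    worst_vs_local: float    # horizontal axis: positive = worst neighborhoods improving
    population: float = 1.0  # leave at 1.0 to count districts, or set a real population

def quadrant(d: District) -> str:
    """Quadrant label; a value of exactly zero is treated as non-positive."""
    vertical = "upper" if d.vs_nation > 0 else "lower"
    horizontal = "right" if d.worst_vs_local > 0 else "left"
    return f"{vertical} {horizontal}"

def quadrant_shares(districts: List[District]) -> Dict[str, float]:
    """Share of the total weight (districts or population) falling in each quadrant."""
    total = sum(d.population for d in districts)
    shares: Dict[str, float] = {}
    for d in districts:
        q = quadrant(d)
        shares[q] = shares.get(q, 0.0) + d.population / total
    return shares
```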

The following strategies can help improve readability:

  • use plain English on the axis
  • relegate technical definitions to the legend
  • add succinct title to tell the story
  • use color on the data rather than on axis or data labels
  • use color to draw attention to the upper right quadrant
  • remove bubbles
  • define acronyms

 

Mar 01, 2008

Don't believe what you see

Mankiw's blog linked to a press release by Congressman Jim Saxton, using CBO data to show "middle income tax burden at lowest level in decades".  The attached graph, as Junk Charts readers will immediately recognize, is classic chartjunk.  Every time the vertical axis does not start at zero, one suspects something is amiss.  And what's with the gridlines and data labels?
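Why does a non-zero baseline raise suspicion? Readers judge a change by how far the line moves relative to its height above the axis floor. One way to quantify the distortion is Tufte's "lie factor": the size of the effect shown in the graphic divided by the size of the effect in the data, where the size of an effect is the change relative to the first value. A minimal sketch for a chart whose vertical axis starts at a hypothetical `baseline`:

```python
def effect_size(first: float, second: float) -> float:
    """Tufte's 'size of effect': the change relative to the first value."""
    return abs(second - first) / first

def lie_factor(first: float, second: float, baseline: float = 0.0) -> float:
    """Effect shown on a chart whose vertical axis starts at `baseline`,
    divided by the effect in the data. 1.0 means no distortion."""
    if baseline >= min(first, second):
        raise ValueError("both values must lie above the axis baseline")
    if first == second:
        return 1.0  # no change in the data, so nothing to exaggerate
    # On screen, each value is drawn at a height proportional to (value - baseline).
    shown = effect_size(first - baseline, second - baseline)
    actual = effect_size(first, second)
    return shown / actual
```

For instance, a rate falling from 20 to 15 is a 25% drop, but on an axis starting at 10 the line loses half its height, so the drop looks twice as large. These numbers are illustrative, not the CBO figures.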

"Don't believe it? Check out the data source yourself."  I followed Mankiw's suggestion and was indeed surprised... but not by the great fortune of the "middle class".  The surprise was how the chart painted a dishonest picture of the CBO data.

The original chart plotted only the tax rate experienced by the middle 20% of the population.  The CBO provided data for all five quintiles; why not plot them all?  In this new chart (right), the "surprise" windfall to the middle 20% proved not to be anything special at all!  All five quintiles, especially the middle three, followed pretty much the same trend over time.  Singling out the middle 20% strips away the context in which the data should be interpreted.

Further, what might be the result of the declining middle income tax burden?  The CBO data painted an unexpected picture.  Paradoxically, as the middle 20% saw their tax rate decrease, they also earned a smaller share of the nation's after-tax income (black line at right).  At the same time, the top 1% saw their share of after-tax income double from about 8% to almost 16% (blue line).  The top 20% line also slopes upward, although less steeply.  So the implication that the middle class has had it good is plainly wrong.

What is going on?  Two factors were at play, and the Congressman presented only one side of the story (the tax rate).  What he omitted was that during this period, the nation's wealthy took home larger and larger shares of the pre-tax income.  This shift in pre-tax income more than offset any relative reduction in the tax rate for the middle 20%.
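The two factors combine in a simple way: each group's after-tax income is its pre-tax share times one minus its tax rate, and its after-tax share is that amount divided by the total across groups. A minimal sketch, assuming you have pre-tax shares and average effective tax rates per group; the group names and the function name are hypothetical, and no CBO numbers are reproduced here.

```python
from typing import Dict

def after_tax_shares(pre_tax_share: Dict[str, float],
                     tax_rate: Dict[str, float]) -> Dict[str, float]:
    """Each group's share of total after-tax income.

    pre_tax_share: group -> share of total pre-tax income (should sum to 1).
    tax_rate: group -> average effective tax rate for that group (0 to 1).
    """
    after_tax = {g: pre_tax_share[g] * (1 - tax_rate[g]) for g in pre_tax_share}
    total = sum(after_tax.values())
    return {g: income / total for g, income in after_tax.items()}
```

Feed it two years in which the middle group's tax rate falls while the top group's pre-tax share rises, and the middle group's after-tax share can still shrink; the tax rate alone does not settle the question.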

This distortion can be traced back to the use of quintiles (or, more generally, ranks).  We use them to cope with data having extreme distributions, but a by-product is that we lose information about how extreme the extreme values are.  As demonstrated here, the quintiles of the past are really different from the quintiles of today because the underlying distribution has become much more extreme.
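The information loss is easy to demonstrate: quintile labels depend only on rank, so stretching the top of the distribution leaves every label unchanged while the top group's share of income balloons. A minimal sketch with hypothetical helper names; any toy list of incomes will show the effect.

```python
from typing import List

def quintile_of(incomes: List[float]) -> List[int]:
    """Assign each person a quintile (1 = bottom 20%, 5 = top 20%) by rank alone."""
    n = len(incomes)
    order = sorted(range(n), key=lambda i: incomes[i])
    labels = [0] * n
    for rank, i in enumerate(order):
        labels[i] = rank * 5 // n + 1
    return labels

def top_share(incomes: List[float], fraction: float) -> float:
    """Share of total income held by the top `fraction` of people."""
    ranked = sorted(incomes, reverse=True)
    k = max(1, round(len(ranked) * fraction))
    return sum(ranked[:k]) / sum(ranked)
```

Multiply the two largest incomes in a list of ten and `quintile_of` returns the same labels as before, while `top_share(..., 0.2)` jumps: the ranks survive, the extremity does not.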

Finally, another bit of mystery (to me) is how the middle 20% came to be considered "middle class".  Is there a widely accepted definition?

Reference: "CBO Data Show Middle Income Tax Burden At Lowest Level in Decades", Feb 21 2008.

Feb 25, 2008

Playful and exploratory

I share reader Bernard L.'s enthusiasm for this very imaginative chart, courtesy of the graphics people at NYT.  The chart captures the ebb and flow of weekly movie receipts over the last two decades.
The details that particularly interest me include:

  • The addition of area colors (on top of lines) serves to highlight box office successes; this really helps readers sort out the massive amount of data
  • Nicely spaced text (and dots) does not interfere with our reading of the chart
  • Hiding the text for less important films, while using interactivity to reveal their titles when the reader mouses over the respective areas

All of the above indicate a keen sense of foreground versus background.  Besides, the authors had the good sense to speak of inflation-adjusted box office sales; I'm tired of the movie industry proclaiming higher sales each year when ticket prices are rising and the population is growing.
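Inflation adjustment is a one-line rescaling by a price index. A minimal sketch, assuming you have a consumer price index value for each year; the post does not say which index the NYT used, so that choice is an assumption here. Dividing by population, which addresses the second complaint about a growing audience, is a separate optional step. Both function names are hypothetical.

```python
def to_real_dollars(nominal: float, cpi_year: float, cpi_base: float) -> float:
    """Convert a nominal amount from one year into base-year dollars.

    cpi_year: price index for the year the money was earned.
    cpi_base: price index for the base year (e.g. the most recent year).
    """
    return nominal * cpi_base / cpi_year

def per_capita(amount: float, population: float) -> float:
    """Scale an amount by population, since a growing audience also inflates totals."""
    return amount / population
```

Applying both turns "record sales" into a like-for-like comparison across decades.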

This is another chart where more data do not easily translate into better communication (see my guest post at Flowing Data).  While I like the playful nature of the interactive chart, it is left to the reader to discover the information buried in the data, such as the assertion in the header that Oscar-winning films typically take time to attain box-office success while many blockbusters do not Oscars make.

In this presentation, it is challenging to compare the total receipts of one film with another (this requires comparing oddly shaped, partially obscured areas).  It is also hard to compare across years since the data are spread out over a lot of space.

There may really be two types of graphics: one, like the example here, that acts as a dictionary and is designed for exploration; and another, where the designer has selected a subset of the data to make a specific point.

Reference: "The ebb and flow of movies", New York Times, Feb 23 2008.

Feb 03, 2008

Redundancy

Nick B., who occasionally writes about statistical graphics, found some classic chart junk from a Canadian report on the Afghan army.  Here's one example, together with the junkchart version.

Redundancy is an enemy of good graphics, and incongruous redundancy is worse.  Here, troop level is variously described as "total force size", "strength" and "army growth"; the chart on the right sticks to a single concept, army size.  The data labels ("47000 Strength"), the axis labels ("50000 Total Force Size"), and the gridlines all germinate from the five grand data points underlying the entire chart!

Another distorting feature is the use of unequal time intervals, which we space out proportionally on the right chart.

Ultimately, the key message should be growth in the army size, not the absolute number of troops.  The slopes of the line segments encode this information.  Alternatively, a data table can be rather powerful for simple data like this:

By what is called the "end state", there would be 70% more troops than in December 2007.
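Both fixes boil down to simple arithmetic. Spacing points by real elapsed time means a slope is troops gained per unit of time, and the 70% figure is a percent change between the December 2007 level and the end-state target. A minimal sketch, assuming observations come as (date, troop count) pairs; the function names are hypothetical and whole calendar months are a simplification.

```python
from datetime import date
from typing import List, Tuple

def months_between(start: date, end: date) -> int:
    """Whole calendar months from start to end (day of month ignored)."""
    return (end.year - start.year) * 12 + (end.month - start.month)

def monthly_growth(points: List[Tuple[date, float]]) -> List[float]:
    """Change in troop count per month between consecutive dated observations.

    Dividing by the actual elapsed time keeps unequal intervals from
    distorting the slopes, as evenly spaced categories on an axis would.
    """
    rates = []
    for (d0, v0), (d1, v1) in zip(points, points[1:]):
        months = months_between(d0, d1)
        if months <= 0:
            raise ValueError("observations must be in order and at least a month apart")
        rates.append((v1 - v0) / months)
    return rates

def percent_change(old: float, new: float) -> float:
    """Percentage change from old to new; 70.0 means 70% more."""
    return (new - old) / old * 100
```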

 


Dec 09, 2007

Lacking buzz

Nielsen, they of the ratings, is roughing it in the information age.  When they announced on-line tracking tools, Wired quipped: "It's looking like online video policing companies will have to make room for another deputy."  Last year, cable companies revolted over a service measuring the effectiveness of commercials.

Via the Data Mining blog, I learnt about yet another new on-line offering, called "Hey! Nielsen" for obscure reasons.  (Perhaps Hey! Nielsen is the new Yahoo! !)

The site is an enigma wrapped in a mystery.  The official description says:

Hey! Nielsen is the place to make a name for yourself while trading opinions on TV, movies, music, personalities, web sites and more.

How does one "trade" opinions?

According to the FAQ, the "Hey! Nielsen" score, the cornerstone of the site, is:

a real-time indicator of a topic's impact and value and you play a major role. As the site evolves and users submit their opinions and commentary, the score will rise or fall based on a number of factors including, but not limited to, user opinions, news coverage, and raw data from our sister sites Billboard.com, HollywoodReporter.com, and BlogPulse.com.

Sounds like a product aimed at marketers to help them track public opinion but offering little control over sampling. 

The "Hey! Nielsen" buzz chart (below) captures the change in "Hey! Nielsen" score over time.


This chart is an unfortunate case of flipping background into foreground.  What grabs our attention are those hideous white circles with numbers in them.  The legend explains that these are the daily numbers of opinions on the subject, in other words, the daily sample sizes.  As they stand now (with the site still in beta), they mainly expose the low level of participation, which means small sample sizes and, in turn, irrelevance.  But what happens when the site becomes super-popular: would the circles say 56234, 19245, 90257, etc.?  Why would visitors care about daily sample sizes anyway?  Mousing over these circles reveals text, but in most cases it is blocked by neighboring white circles.

In the meantime, the circles obscure the line which shows the trend in the "Hey! Nielsen" score over time.  This chart reminds me of that Google toy known as Google Trends.  The Googlers provide no vertical scale so the graphs are unreadable.  "Hey! Nielsen"ers provide a vertical scale -- kind of -- but the graphs are still meaningless: what does a score of 881 mean?  how about 724?  what is the maximum score?  what is the minimum?  Beware numbers without context.
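One way to give a bare score like 881 some context is to report where it sits among all observed scores. A minimal sketch, assuming you had the full list of scores across topics (the site does not publish this, so it is an assumption); percentile rank and min-max scaling are two common choices among several, and the function names are hypothetical.

```python
from bisect import bisect_left, bisect_right
from typing import List

def percentile_rank(score: float, all_scores: List[float]) -> float:
    """Percentage of observed scores below `score`, counting ties as half."""
    ranked = sorted(all_scores)
    below = bisect_left(ranked, score)
    ties = bisect_right(ranked, score) - below
    return 100.0 * (below + 0.5 * ties) / len(ranked)

def min_max_scale(score: float, all_scores: List[float]) -> float:
    """Position of `score` between the observed minimum (0) and maximum (1).

    Undefined when all scores are equal (division by zero).
    """
    lo, hi = min(all_scores), max(all_scores)
    return (score - lo) / (hi - lo)
```

Either transformation answers the questions a raw score cannot: what is the maximum, what is the minimum, and is this score high or low.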

The vertical axis does start from zero but has an odd spacing of tick labels. The gridlines are distracting and serve no purpose.  The orange area under the curve also makes little sense.

We look forward to seeing version 2.0.

 

Dec 02, 2007

Live dynamic graphics

In the second interesting item of the week, I return to the fabulous Google Finance chart, which shows the distribution of stock market returns by sector.  I wrote about it twice (here and here).  In the original post, I saluted the engineers for solving the formidable technical problems of turning a live dynamic data stream into a live dynamic graphic, but didn't go into details.  (Trust me.)

The other night, this chart popped up in my browser.

Oops.

If someone kept a tally of such mishaps, I suspect they would show up perhaps 1-5% of the time.

The triple challenge in generating this graphic is the volume of data to be processed, the velocity at which it changes, and the flicker of time from input to output, probably no more than a few minutes.  The analysis and charting must be maintained continuously during market hours.  For any such project, the thing to manage is the error rate, and one should be totally thrilled if it's in the range Google engineers have achieved.
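Counting mishaps turns the error rate into a binomial proportion, and a confidence interval says how precisely a tally pins it down. A minimal sketch using the Wilson score interval, which is my choice of method, not anything Google has described; `errors` is the number of broken charts seen and `trials` the number of page loads checked.

```python
import math
from typing import Tuple

def wilson_interval(errors: int, trials: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for an error rate; z=1.96 gives roughly 95% coverage."""
    if trials <= 0:
        raise ValueError("need at least one observation")
    p = errors / trials
    denom = 1 + z**2 / trials
    centre = (p + z**2 / (2 * trials)) / denom
    half = z * math.sqrt(p * (1 - p) / trials + z**2 / (4 * trials**2)) / denom
    return centre - half, centre + half
```

With only a hundred observations, even a handful of broken charts leaves the plausible range wide, so confirming a rate in the 1-5% band takes a fairly long tally.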

Nov 16, 2007

Large tables

Richard J. asked how we might make sense of this table.  Large tables present lots of challenges.  The trick is to enhance the table with colors and shapes and, as usual, to remove any data that doesn't help make your argument.

This table compares countries across different measures of privacy.  Each measure is rated on a scale of 1 to 5, with some blanks.  These ratings are averaged to obtain an overall rating, listed on the right.

In the junkchart version, the ratings are presented as slots inside a box.  The overall rating is placed right below the name of the country, since it is the most important measure and the one by which the countries were ordered.  The rows and columns are transposed so as to show how the overall rating can be decomposed into individual metrics for each country.  I have only shown the top five countries, but obviously the chart can be extended to cover all the data.


If desired, the top 5 countries in each measure can be given a different color: this would increase the data-ink ratio on the chart.  One weakness of this type of chart is that the rows and columns do not have equal status: comparing across rows is more difficult than comparing up and down columns.

Richard also wonders about their treatment of the blanks.  It appears that they omit blanks, so each country's overall rating is the average of its non-blank measures.  Omitting blanks may seem innocuous, but in fact it is equivalent to assigning each blank measure a rating equal to the country's average non-blank rating.  He questions whether this is the best way to treat these blanks.
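The equivalence is easy to check: if each blank is filled with the average of the country's non-blank ratings, the new average equals the old one, because the filled-in values add nothing that pulls the mean up or down. A minimal sketch, with `None` standing in for a blank; the function names are hypothetical.

```python
from typing import List, Optional

def mean_ignoring_blanks(ratings: List[Optional[float]]) -> float:
    """Average of the non-blank ratings (None marks a blank)."""
    present = [r for r in ratings if r is not None]
    return sum(present) / len(present)

def mean_after_imputing(ratings: List[Optional[float]]) -> float:
    """Fill each blank with the country's own non-blank average, then average everything."""
    fill = mean_ignoring_blanks(ratings)
    completed = [fill if r is None else r for r in ratings]
    return sum(completed) / len(completed)
```

The two functions always agree. Imputing anything else, such as the scale midpoint of 3, would generally change the overall rating, which is why the choice deserves scrutiny.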

 

Source: "Leading surveillance societies", Privacy International.

(Thanks to Richard for sending me the data.)

Nov 11, 2007

Red-lining by marriage

Tom W., a reader, noticed this map featured on a BBC News page about the UK family.

One can roughly make out the shape of Great Britain, so this is some kind of cartogram.  The title announces that this cartogram concerns the "distribution of population".

In a typical map like this, the redder reds would indicate higher densities of people.  Yet, the article tells us that the population is divided evenly into 85 squares, each containing "roughly half a million people over 18 years old".

Instead, we seem to have 500K widowed people next to 500K re-married people (most of whom prefer the coasts, by the way), etc.  Apparently, the Brits practise a form of red-lining based on marital status!

The S/M/W/D/R labels are also redundant and very distracting, and the white gridlines interfere with our ability to read the grey boundaries.

Source: "The UK family", BBC News.

Oct 28, 2007

Clocks and pies

Keith A submitted this graphical idea from the folks at Ikea (via Boing Boing). 
Based on the comments, it seems like some people really like this presentation!

Consider these for amusement:

  • Does the "9" on Sunday mean 9 am or 9 pm?  (This chart mixes A.M. and P.M. hours in a totally nonchalant way.)
  • If the above is too easy, try the "9" for Saturday!
  • Why was "9" displayed on Sunday anyway?  Meanwhile, why wasn't "7" displayed for Saturday?  (How were the hour labels chosen?)
  • Why was "Closed" written on the chart while "High", "Mid", and "Low" were put into the legend?
  • Since pie charts show proportions, is it possible to describe what proportions were plotted?

Reminds me of this pie chart.
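The A.M./P.M. confusion comes from squeezing 24 hours onto a 12-hour face: two different hours land on the same spot. A minimal sketch of the mapping, assuming the chart works like a wall clock (12 o'clock at the top, angles running clockwise); the function names are hypothetical.

```python
def angle_on_12_hour_dial(hour: int) -> float:
    """Clockwise angle in degrees from the top for an hour 0-23 on a 12-hour face."""
    return (hour % 12) * 30.0

def angle_on_24_hour_dial(hour: int) -> float:
    """Clockwise angle in degrees from midnight for an hour 0-23 on a 24-hour face."""
    return (hour % 24) * 15.0
```

On the 12-hour face, 9 am and 9 pm both map to 270 degrees, so a lone "9" cannot tell them apart; a 24-hour face separates them at 135 and 315 degrees.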


