Nov 06, 2007

The eyeball test

This set of graphs was used by the NYT to discuss changes in U.S.  spending patterns over time.  For this post, I am focusing on the bottom left and bottom right graphs.  One shows spending on energy as a percent of GDP; the other, on "nonresidential structures" (aka, commercial buildings).

Nyt_spending

At first glance, spending on energy and that on commercial buildings look very similar in shape (see above or below left).  Alas, this "eyeball test" doesn't work very well with time series data.  Let's investigate further.

Redospend1_2

"Standardizing" the data (above right) tells us whether the swings are unusual in the history of the data.  So in the 1980s, commercial building spending spiked to more than three standard deviations above the historical average.  Generally speaking, a standardized value of 3 is taken to mean highly unusual.

Notice that the peaks on the left graph had equal heights, but on the right graph energy spending peaked only a little above two while commercial building spending rose above three.  This is because energy spending has been more volatile historically, so it takes larger jumps (or plunges) to count as "unusual" movements.  This information is hidden in the unstandardized version.
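To make the standardization concrete, here is a minimal Python sketch.  The two series are made-up numbers, not the NYT data, and the names `standardize`, `steady` and `volatile` are only illustrative.  Both series peak at the same raw height, yet the steadier one registers the more "unusual" peak once standardized.

```python
from statistics import mean, pstdev

def standardize(series):
    """Express each value as standard deviations above or below the series mean."""
    mu = mean(series)
    sigma = pstdev(series)  # standard deviation over the whole history
    return [(x - mu) / sigma for x in series]

# Hypothetical shares of GDP (percent); both series peak at 5.0.
steady = [3.0, 3.1, 2.9, 3.0, 3.1, 2.9, 3.0, 5.0]
volatile = [1.0, 5.0, 2.0, 4.5, 1.5, 4.0, 2.5, 5.0]

z_steady = standardize(steady)
z_volatile = standardize(volatile)

# Same raw peak, different standardized peaks.
print(round(max(z_steady), 2), round(max(z_volatile), 2))
```

The choice of the population standard deviation (`pstdev`) over the sample version is an assumption; the post does not say which was used.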

Further, since we are concerned with long-term trends, let's take a look at five-year moving averages (below right): in other words, each time point is the average of the preceding five years' worth of data.
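A trailing moving average can be sketched in a few lines of Python.  This version assumes the five-year window ends at (and includes) the current year, which is one reading of "preceding five years"; the function name and the sample values are hypothetical.

```python
def trailing_moving_average(series, window=5):
    """Average each point with the (window - 1) points before it.

    Points that lack a full window of history are returned as None.
    """
    out = []
    for i in range(len(series)):
        if i + 1 < window:
            out.append(None)  # not enough history yet
        else:
            chunk = series[i + 1 - window : i + 1]
            out.append(sum(chunk) / window)
    return out

values = [1, 2, 3, 4, 5, 6, 7]  # hypothetical yearly values
smoothed = trailing_moving_average(values)
print(smoothed)
```

Note that the first four years have no smoothed value, which is why a moving-average line starts later than the raw series.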

Redospend2

The fluctuations have been smoothed out and the peaks are no longer as high.  Glancing at this chart, we may still conclude that the spending patterns are quite similar -- especially in the period prior to 1995.

But is that really the case?  Zooming in on the 1980s, we may mistakenly think the two lines are "close together" if our eyes read the horizontal distance and/or area between the curves, rather than focusing on the vertical distance.  The arrows on the bottom left chart depict this difference.  To make things clearer, the bottom right chart plots the vertical distances between the two lines.

Redospend3

Observe that the difference expanded to above 1 unit in the late 1980s.  A difference of one unit is very large in the standardized scale (of "unusualness") since 0 is business as usual and 3 is "highly unusual".
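The vertical distance is simply the year-by-year difference between the two standardized series.  A minimal sketch, using invented values rather than the actual data (the names `vertical_gaps`, `energy` and `buildings` are illustrative):

```python
def vertical_gaps(series_a, series_b):
    """Point-by-point vertical distance between two aligned series."""
    return [a - b for a, b in zip(series_a, series_b)]

# Hypothetical standardized, smoothed values by year.
years = [1985, 1986, 1987, 1988, 1989]
energy = [1.2, 0.6, 0.1, -0.3, -0.5]
buildings = [1.5, 1.4, 1.0, 0.8, 0.6]

gaps = vertical_gaps(buildings, energy)

# Flag the years in which the gap exceeds one standardized unit.
large = [year for year, gap in zip(years, gaps) if abs(gap) > 1.0]
print(large)
```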

Eyeballing the two time series would lead us to believe that the two series are similar but we run the risk of underestimating the differences as illustrated here.


Source: "Auto Sector's Role Dwindles, and Spending Suffers", New York Times, Nov 3 2007.

Oct 17, 2007

Points of comparison

Econ_mortgage

In light of the current housing crisis, arising from mortgage defaults, I pulled this graphic from a Jan 2007 opinion piece that plotted historical default rates of mortgages.  Notice the high degree of stretching on the vertical axis that exaggerates the volatility: essentially, the annual delinquency rate ranged from 1.75% to 2.65% during the last six years or so.  One might be forgiven for thinking that a 2% default rate is quite acceptable.

Nyt_mortgage_2

Compare the above chart to the pair that showed up in the NYT in Oct 2007 (see right).  The default rates here are in the 10-20% range, very alarming indeed.

The two graphics illustrate a key issue of "aggregation" in statistical analysis.  The first graphic is super-aggregated: all types of mortgages of all ages are put together to calculate each year's default rate.  The second graphic homes in on subprime mortgages only.

More importantly, the second graphic presents data in "vintages".  Each line represents loans originated during a particular year (a "vintage").  This establishes comparability.  On the first chart, each point in time represents the default rate of mortgages averaged over all ages (some loans may be only a few months old; others may be 15 years old).  Since the default rate is much higher for very young mortgages than for older mortgages, such averaging hides crucial information.
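A small Python sketch shows how averaging across vintages can hide a troubled cohort.  The loan counts below are invented purely for illustration; they are not taken from either graphic.

```python
# Hypothetical portfolio: vintage -> (loans outstanding, loans in default).
portfolio = {
    2000: (5000, 50),
    2003: (3000, 60),
    2006: (1000, 150),
}

total_loans = sum(n for n, _ in portfolio.values())
total_defaults = sum(d for _, d in portfolio.values())

# One number for the whole book, as in a super-aggregated chart.
aggregate_rate = total_defaults / total_loans

# One number per vintage, as in a chart with one line per origination year.
rate_by_vintage = {v: d / n for v, (n, d) in portfolio.items()}

print(f"aggregate: {aggregate_rate:.1%}")
for vintage, rate in sorted(rate_by_vintage.items()):
    print(f"{vintage}: {rate:.1%}")
```

In this made-up book the aggregate rate sits below 3% even though the newest vintage defaults at 15%, because the older, larger cohorts dominate the average.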

Overall, the NYT graphic very effectively conveys the alarming trend of new mortgages performing much worse, especially those originated in 2007.

Redo_mortgage

It could benefit from two slight edits: adding a few more years, and using vertical lines (the most critical comparisons are default rates for loans of a given age!).  Something like this...


Sources: "As Defaults Rise, Washington Worries", New York Times, Oct 16 2007; "Mounting Mortgage Credit Problems", economy.com, Jan 23 2007.

Oct 15, 2007

Sense of proportion

[I'm back from vacation.  Will provide my reaction to the responses to the Gelman challenge, and for those who have sent me email, I will work through them soon.]

The NYT commented on a trend among marketers to shift their advertising spending from so-called "measured" media like print and TV to so-called "unmeasured" media like product placements, contests, etc.  The following chart accompanied the article:

Nyt_ads_2


This construct is akin to a population pyramid; it's great for comparing two groups along one metric, say age groups between males and females.  Here, the two halves aren't comparable groups but two different metrics.  The main metric, that is, the proportion of unmeasured, is not directly depicted: the reader must figure out mentally how much of each bar the black part covers.  Also, the companies are sorted by unmeasured media spending but this leaves the measured spending with a jagged profile, confusing matters.

As for the little white slits on the gray bars, they are admittedly cute but it is difficult to compare the detailed breakdown between print, TV and other media among companies.

The following dot plot gives the two halves equal weight.  (Pink dots are measured, blue unmeasured.)  It's not a very interesting graphic though.  The sense of proportion is still missing.

Redoads1

I settled on a scatter plot which relates the proportion spent on unmeasured media to the total amount of spending.  It appears that the largest advertisers had the lowest proportional unmeasured spend while the smallest (among the majors) had the highest.  (It's only a weak correlation: a linear fit yields an R-squared of only 16%.)
Redoads2
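For readers who want to check a figure like that R-squared, here is a minimal Python sketch for a straight-line fit with an intercept.  The spending numbers are invented for illustration and will not reproduce the 16% quoted above.

```python
def r_squared(xs, ys):
    """R-squared of a least-squares line y = a + b*x (the squared correlation)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy ** 2 / (sxx * syy)

# Hypothetical advertisers: total spend ($ billions) and unmeasured share.
total_spend = [1.0, 1.5, 2.0, 2.5, 3.0, 4.0]
unmeasured_share = [0.55, 0.35, 0.50, 0.30, 0.45, 0.25]

fit_quality = r_squared(total_spend, unmeasured_share)
print(round(fit_quality, 2))
```

An R-squared near 0 means total spending explains little of the variation in the unmeasured share; a value near 1 means the points lie almost on a line.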

Source: "The New Advertising Outlet: Your Life", New York Times, Oct 14, 2007.

Sep 17, 2007

Structuring a chart

Nytmpg

This chart from the NYT was intended to show how the EPA has moved the bar on vehicle mileage ratings: 2008 estimates were lower than 2007 estimates across the board, regardless of manufacturer, model and city/highway.

The chart was built from one basic component, repeated for each model.

Nytmpgsm_2

I like the discreet gridlines (the white ticks) which enable readers to count off the mileage ratings.

The data is rich: ratings were given along three dimensions (model, year of estimate and city/highway).  Readers would benefit from stronger guidance on where to look for the most pertinent information.  As the chart stands, it is merely a container for the data.  It fails our self-sufficiency test: all the data were printed on the chart, and the bars add little.

In the junkart version, I use knowledge of the data to structure the chart.  First, noting that sedans, hybrids and trucks/SUVs/minivans have different levels of mileage ratings, I clustered the models into three groups.  Secondly, the city and highway ratings were separated into two columns as I consider the between-model comparisons more important than city-highway comparisons.

Redompg

The chart is a dot plot, with a vertical tick for 2007 estimates and a dot for 2008 estimates.  It's easy to see that all dots sit to the left of the vertical ticks.

More subtly, we can also see that the hybrids appeared to have been penalized more.  Or perhaps, the higher the rating, the larger the downward adjustment...

Source: "Mileage Ratings Are Still Estimates, Though Closer to Reality", New York Times, Sept 16 2007.

Aug 28, 2007

Cheers

Nyt_mets07


This is an exemplary chart from the NYT Sports page.  It provides a clear, informative and exciting way to visualize how the baseball season has gone for the Mets this and last year.  It's been mostly up and not much down. 

We can observe the more subtle differences: last season was a steady rise with only two prolonged down periods; this season's curve is driven by two up periods (including right now), outside of which the record has hovered around two levels (0, +3).

Especially commendable is the judicious use of axis labels.  However, I'm not clear on how some of the labels were chosen.  For example, 14 games ahead seems to me a rather arbitrary one.

All in all, a job well done.

Source: "Not Only Yankee Fans Cheering for Week 22", New York Times, Aug 27, 2007

Aug 08, 2007

On the bubble

Nyt_candmins

A couple of you noticed this table of bubbles in the Times, and asked what I think of it.  Dustin J suggested that this could be considered a decent application of bubble charts.  I agree, with some reservations.

The data set is the best thing about this chart.  The riches that lie beneath!  Many questions can be addressed, including:

  • Which Presidential candidates are getting the most face time?
  • Are candidates seen equally often across the stations?
  • Are there differences between network and cable stations in terms of total face time?  In terms of individual face time?
  • Are there Democratic/Republican leanings by station?  by type of station?

The intrepid can even build a regression out of it.

The bubble chart contains answers to all those questions but nothing jumps out. Okay, it's easy to see the station that gives each candidate the most face time.  Anything else requires moderate to a lot of effort.  Here's the junkart version.


Redocandmins_2

The list of things done to the data is long:

  • Candidates are grouped together by party
  • Candidates within each party are arranged in order of decreasing maximum face time
  • Stations are arranged by increasing total face time; this order happens to retain the network vs cable divide
  • A heat map construct is used instead of bubbles: the legend is missing but there are four shades for each color: darkest = top 10%; medium = 50th to 90th percentile; light = below the 50th percentile, excluding zeroes; white = no face time.  In raw numbers, 90th percentile = 81 minutes, 50th percentile = 19 minutes.
  • The only data shown are the totals by candidate and totals by station.
  • On the right margin are little bar charts that show the distribution of network/cable for each candidate.
  • On the bottom margin are little column charts showing the distribution of party affiliation by station.
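The shading rule in the heat map can be sketched as a simple binning function in Python.  The cut-offs of 81 and 19 minutes come from the list above; the function name `shade` and the decision to count a value exactly on a cut-off in the darker bin are assumptions.

```python
P90_MINUTES = 81  # 90th percentile of face time, as stated in the post
P50_MINUTES = 19  # 50th percentile of face time, as stated in the post

def shade(minutes):
    """Map a candidate-station face time (in minutes) to a heat map shade."""
    if minutes == 0:
        return "white"    # no face time at all
    if minutes >= P90_MINUTES:
        return "darkest"  # top 10%
    if minutes >= P50_MINUTES:
        return "medium"   # between the median and the 90th percentile
    return "light"        # below the median, excluding zeroes

# Hypothetical face times for one candidate across four stations.
for minutes in [0, 5, 40, 120]:
    print(minutes, shade(minutes))
```

Binning into a handful of shades throws away detail on purpose: the reader compares categories rather than trying to judge bubble areas.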

A few observations follow:

  • Cable stations gave much more face time to the candidates in general.  Fox, no surprise, gives Republicans 85% of its time while all the others were roughly equal.
  • The more mainstream the candidate, the more balanced the time spent on networks versus cable.  John McCain (R), Hillary Clinton (D) and John Edwards (D) had the highest proportion of network time.
  • More face time is not necessarily good: McCain was the clear winner, yet his campaign is struggling.

Source: "Tracking Face Time", New York Times, August 1, 2007.

Jul 26, 2007

Noisy subways

This NYC subway report is impossible to read.
Nyt_subwayreport

However, it is very difficult to find a good way to show the information.  In fact, the data contained very little of that.  Curiously, the ratings are very dispersed so that each line is graded high on some category and low on others.  Here's one view of it:

Redo_subwayreport

I have grouped the subway lines together (A/C/E, 4/5/6, etc.).  The metrics are plotted left to right in the same order as in the original.  Is it all noise and no signal?

(I just realized the vertical axis is reversed: best ratings are at the bottom, worst ratings at the top.  Doesn't matter anyway since I can't see any patterns.)

Source: "No. 1 Train is Rated Highest by Commuter Advocates", New York Times, July 24 2007.

PS. Two contributions from readers.  Still looking for insight from this data...

Trains789fg5_2 Trainspotmatrix_2


Jul 16, 2007

Gauging the water level

Nyt_water

This set of charts covered the back page of one of the New York Times' sections this weekend.

Regular readers will share my enthusiasm for the top chart.  It makes a clear, cogent case to support the article's thesis concerning the rise of bottled water.  Various renditions of this type of chart have appeared here, for example.

Specifically, the smart use of color to cluster the line objects helps interpret the trends.  Blue sets out the two primary interests.  (It's a mystery to me why the gray lines were separated into darker and lighter hues.)

The twenty-year horizon used is another nice touch. I'd remove the gridlines although they aren't too distracting here.

Sadly, the second graphic does not meet the high standard of the first.  The biggest problem concerns the red rectangle, purportedly showing how much of the bottled water was imported.  The choice of differently-sized bottles as objects makes it impossible to gauge what proportion of the total was imported.  If the rectangle were placed over 1-litre bottles instead, it would look smaller.

Source: "A Battle Between the Bottle and the Faucet", New York Times, July 15, 2007.

Jun 10, 2007

A disconnect

Nyt_kuo

The Times ran a slate of graphics by a blogger "analyzing" seven nights of concerts.  On the left is one of these charts.

I am not sure what to make of it.  All I can say is the chart designer had fun.  More on his blog.

Source: "7 Nights of Bright Eyes (in as Many Colors)", New York Times, June 10, 2007

Jun 04, 2007

Airline bumps and bump charts

The Harvard Social Science Statistics blog pointed to an NYT article about revenue optimization in the airline industry.  Huge props to the Times for explaining the science (and art and politics) of one of the most successful applications of operations research.

In short, valuable business travellers want refundable tickets.  Because of this and other reasons, about 10% of booked tickets become no-shows.  Airlines recoup the loss by over-booking.  Implicitly, they trade off the risk of dissatisfying a few unlucky passengers (who would be bumped from their flights) against the risk of flying with 10% empty seats (in addition to unsold seats).  Optimization algorithms (constantly tuned by entry-level staff) try to strike a balance.

Recently, because the average percentage of seats sold has been going up, the room for such manoeuvring has been squeezed, leading to higher bump rates, and more travellers being stranded.  There is some variation across airlines due to the level of sophistication of their revenue optimization algorithms, corporate strategy, etc.

The following charts present data by airline of the bump rates in 2005 and 2006.  One would be interested in answering questions such as:

  • Which airlines have the best (or worst) bump rate?
  • Are some airlines consistently better (or worse) at controlling the bump rate?
  • Which airlines have improved (or worsened) from year to year?
  • Are the differences of practical significance?

Redo_airlinebumps

The original chart shown on the left does not reveal the answers readily.  My favourite bumps chart offers them up clearly (well, except on the question of significance).

The biggest problem, though, is the header: number of passengers per 10,000 bumped.  The data plotted appeared to be the reverse: the number of bumps per 10,000 passengers.  Otherwise, there would have been more bumped passengers than passengers!
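The arithmetic behind the two readings of that header is easy to check.  The passenger counts in this Python sketch are invented; only the "per 10,000" scaling comes from the chart.

```python
def bumps_per_10k(bumped, boarded):
    """Number of bumped passengers for every 10,000 passengers boarded."""
    return bumped / boarded * 10_000

# Hypothetical airline: 1,200 bumped out of 10 million passengers.
bumped = 1_200
boarded = 10_000_000

rate = bumps_per_10k(bumped, boarded)

# The header read literally: passengers for every 10,000 bumped passengers.
literal_reading = boarded / bumped * 10_000

print(rate, literal_reading)
```

With these made-up counts the bump rate is a small number, while the literal reading of the header would be in the tens of millions, so small plotted values only make sense as bumps per 10,000 passengers.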

Source: "Bumped Fliers and No Plan B", New York Times, May 30, 2007.
