Calming the rip tide
Nov 10, 2006
Xan Gregg at Forth Go helpfully scraped the auto market share data off the NYT chart discussed here before. He even created an improved chart based on histograms.
I have created another view of the data, using boxplots. Tukey's boxplot is one of the most spectacular graphical inventions, as I have said before (see here, for example). Its power is evident again for this data set.
This chart is in fact two boxplots superimposed on the same surface. I forgot to put on the legend: the green boxes represent U.S. market shares, and the blue boxes Europe shares.
The automakers are ordered by decreasing U.S. market shares (with apologies to European readers).
Lots of information can be immediately read off this chart:
- The European market is much more fragmented than the U.S. market.
- The Big 2 (GM, Ford) have had mixed fortunes over this period (as indicated by the large variance).
- The Big 2 are competitive in Europe, although they are definitely not dominant there.
- Several key players in Europe (Peugeot, Renault, Fiat, BMW) have negligible shares in the U.S.
Most importantly, there is little evidence that the U.S. market is "looking more like Europe".
One weakness of the above chart is the suppression of temporal information: there is no indication whether the recent shares are moving to the left or the right of the medians (center of each box).
In the next chart, with the Europe data removed, I highlighted the data for the most recent 5 years in red. I can make the general statement that there is a small movement towards less concentration and more parity in the U.S. market, but one has to conclude that the U.S. market shares in 2000-2006 look more similar to the U.S. market shares in 1990-1999 than to Europe market shares.
P.S. I added legends to the charts.
The box plot is unsuited to the audience (New York Times readers), and doesn't show time trends. Hard to see this as an improved chart.
Posted by: ZBicyclist | Nov 10, 2006 at 01:33 PM
"One weakness of the above chart is the suppression of temporal information."
This makes the chart nearly useless. In most cases, the time sequence goes from the end of one whisker, through the boxes in order, and to the opposite end of the other whisker. Not at all what Tukey must have thought about quartiles and outliers.
Posted by: Jon Peltier | Nov 10, 2006 at 04:37 PM
Maybe I'm just a bit slower than most but I'm having a tough time reading those charts. Different colored boxes, dashed lines, thick solid lines ... a lot of visual cues that aren't intuitive. How am I supposed to read those charts? Is there a key I'm missing?
Posted by: Wrand | Nov 10, 2006 at 05:23 PM
Well, since Jon Peltier is commenting here, perhaps you could check out the last entry in his site: panel charts. Panel charts are just right for this type of data. I uploaded a chart with the auto market share data here. Take a look!
Posted by: Jorge Camoes | Nov 10, 2006 at 08:13 PM
Jorge, the panel chart works nicely. Thanks for sharing.
Posted by: zbicyclist | Nov 11, 2006 at 01:04 PM
Jorge's Panel Chart version has my vote for the best approach to tackle this dataset. (Though both it and most of the alternatives proposed would likely be too intimidating to most of the NYT's readers).
A couple of nits to pick with Jorge's attempt.
* First, the chart really needs a clearly labeled vertical axis. All we have is the 20% mark and that doesn't tell us what the other values are.
* Second, the horizontal axis seems to have more ticks than there are data points... The axis label (from 90-95) implies 16 points, the data set actually contains 17, and I count 18 ticks... puzzling.
* Finally, my personal taste would be to not have a separate label on the top of the chart, but rather to label one of the data sets directly, say G.M.
For a similar take on this challenge, see http://www.processtrends.com/images/chart_small_multiple_hor_01.gif
Posted by: Zuil | Nov 12, 2006 at 12:30 PM
This has been a great discussion, and I agree that the panel chart is an attractive option to display the data.
However, it does not directly address the question posed by the article: is the U.S. market becoming "like the European market"?
The reason is that data from older periods are a distraction. To answer that question, we must compare the recent U.S. market shares with recent Europe market shares and, in addition, show that the U.S. market shares have shifted recently.
There is a general lesson here, which is that sometimes, it is okay to suppress the time dimension. Time is not any different from other variables; if we are willing to collapse other variables, we should be willing to collapse time as well.
Posted by: Kaiser | Nov 12, 2006 at 07:42 PM
Kaiser, I am not sure if we can suppress the time dimension if we include the word "becoming" in our question. We can, however, create an indicator that shows us if the markets are becoming more similar. For example, some years ago, the three largest players in the U.S. accounted for almost 75% of the total market. Now, they have less than 50%. Meanwhile, the European market was stable around 40%. You can add a panel to show this trend and, based on this indicator, conclude that the U.S. market is becoming like the European market. The "total market share of the largest players in each market" becomes your indicator of similarity (this is a simple measure that can be understood even by the NYT readers...).
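A minimal sketch of such an indicator, assuming the shares are available as one dictionary per year (the function name `top_k_share`, the data layout, and the sample numbers are hypothetical placeholders, not the NYT data):

```python
def top_k_share(shares_by_year, k=3):
    """For each year, sum the shares of the k largest players."""
    result = {}
    for year, shares in shares_by_year.items():
        # Rank players by share within the year, largest first.
        largest = sorted(shares.values(), reverse=True)[:k]
        result[year] = sum(largest)
    return result


# Made-up placeholder values for illustration only (NOT the real data).
placeholder = {
    "early": {"A": 30, "B": 20, "C": 10, "D": 5},
    "late": {"A": 20, "B": 15, "C": 12, "D": 14},
}

indicator = top_k_share(placeholder, k=3)
for year, total in indicator.items():
    print(year, total)
```

Note that the top three are re-ranked within each year, so the indicator tracks concentration rather than the fortunes of three fixed companies. Plotting one such line per market would give the extra panel suggested above.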
Posted by: Jorge Camoes | Nov 13, 2006 at 08:56 AM
Another solution for a more sophisticated audience: display the cumulative shares in each market (using a Pareto chart or something like that) and use animation to show the trend. I don't have the time to do it myself, but I am sure the visual effect can be very impressive.
(Take a look at the talk by Hans Rosling at TED (http://www.ted.com/tedtalks/tedtalksplayer.cfm?key=hans_rosling) to see how effective animation can be to show change over the years.)
Posted by: Jorge Camoes | Nov 13, 2006 at 09:37 AM
I checked out this website after seeing a mention of it in a major science journal. I am a graduate student and I have an entry level position preparing samples and data for a government scientific agency. I display data all the time, but have never had specific instruction about how to construct figures. I really appreciate the commentary on this website because it will help me to make better figures and to work on them more efficiently. Thanks!
Posted by: Allison Sayer | Nov 14, 2006 at 01:56 PM
Jorge, you can keep the word "becoming" as long as you have two time periods visible, but the two need no longer be treated as a continuum. They can be treated as a comparison pair.
In my work, I once turned a confusing mess of ten years of tick marks into a pair of distributions, by turning all the tick marks for the five most recent years into identical blue ticks, and the five least recent years into identical grey ticks. The forest of ticks collapsed into two overlapping distributions, to which the observer's eye could easily answer the question "are the blue ticks generally distributed higher or lower than the grey ticks?"
I could have destroyed more time information by turning the clusters of blue and grey ticks into a pair of box-and-whisker shapes, one blue and one grey, but that turned out not to be necessary: the trends were clear.
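A rough sketch of the comparison-pair idea, assuming one value per year (the helper names `split_periods` and `median_shift` and the sample values are hypothetical, made up for illustration):

```python
from statistics import median


def split_periods(values_by_year, n_recent=5):
    """Split yearly values into (older, recent) groups, ignoring order within each."""
    years = sorted(values_by_year)
    older = [values_by_year[y] for y in years[:-n_recent]]
    recent = [values_by_year[y] for y in years[-n_recent:]]
    return older, recent


def median_shift(values_by_year, n_recent=5):
    """Positive if the recent group sits higher than the older group."""
    older, recent = split_periods(values_by_year, n_recent)
    return median(recent) - median(older)


# Made-up placeholder values for ten consecutive years (NOT real data).
placeholder = dict(zip(range(1, 11), [10, 12, 11, 13, 12, 14, 15, 13, 16, 15]))

older, recent = split_periods(placeholder)
shift = median_shift(placeholder)
print(older, recent, shift)
```

Within each group the years are interchangeable, which is exactly the time information being given up; only the older-versus-recent contrast survives.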
Posted by: derek | Nov 27, 2006 at 07:00 AM
P.S. As it happens, I would not have been destroying any more information by making boxes and whiskers: literally every year would have been present as a minimum, a maximum, a median, or a quartile :-)
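This can be checked with a small sketch, assuming quartiles are computed with linear interpolation over the inclusive range (Python's `statistics.quantiles` with `method="inclusive"`); other quartile conventions may give different values. The function name and sample values are hypothetical:

```python
from statistics import quantiles


def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) using the 'inclusive' quartile method."""
    data = sorted(values)
    q1, q2, q3 = quantiles(data, n=4, method="inclusive")
    return (data[0], q1, q2, q3, data[-1])


# Five made-up yearly values (NOT real data).
five_years = [3, 9, 1, 7, 5]
summary = five_number_summary(five_years)
print(summary)
```

With exactly five points, each element of the summary coincides with one of the sorted data values, so the box and whiskers lose the ordering of the years but none of the values.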
Posted by: derek | Nov 27, 2006 at 07:01 AM