
Does this need fixing?

Reader John B. tried to improve Minard's famous chart of Napoleon's Russian campaign (link). Edward Tufte famously declared it possibly the best statistical graphic ever drawn.


Here is John's alternative. The biggest change he made was to discard the geography and emphasize the chronology.


John also points us to this page, which collects a number of different remakes of the famous chart. (link)


The meaning of most

Megan McArdle started the war on infographics (link). And reader Danielle A. contributes this example, from KissMetrics.


This is one part of a big infographics poster. Needless to say, a bar chart renders this data much better:


The categories are sensibly sorted, and the useless tinges of color are removed.


But I want to draw attention to their conclusion:

Most participants in the survey would wait 6-10 seconds before they abandon pages.

Now, we know that writers of opinion pieces in the major newspapers long ago lost control over the titles of their pieces. Is it true that graphic designers have also ceded control over their conclusion statements?

It would appear so. The category labeled "most participants in the survey" accounted for only 30% of the respondents. Since when is 30% considered "most"?

Also, surveys are typically tools for generalization, so we expect conclusions about the general population of mobile users. Here, whoever wrote this conclusion timidly restricted the remark to "participants in the survey". This is probably an oversight, because in other panels they talk about x% of consumers or y% of mobile internet users. If the survey was properly designed and executed, they should be confident about the whole population, not just the sample.

Finally, nowhere on this poster can you discover which survey this data came from. We have no idea what the sample size is, nor the margin of error.

Necessity is the mother of invention

When there's a need to wow audiences with smart data analysis, there's invention.

Let's start with the U.S. home ownership data. The total of occupied homes is subdivided into owner-occupied and renter-occupied homes. Thus, in any given year, we can compute the proportion of homes that are owner- or renter-occupied. We use blue for owner and red for renter, as follows:


Just to confirm, if we superimpose these two charts, we see that the proportions add up to 100%. One chart is the mirror image of the other:
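The mirror-image check can be done directly on the data. Here is a minimal sketch; the counts below are hypothetical stand-ins (the real series comes from U.S. Census Bureau housing data, which I'm not reproducing here):

```python
# Hypothetical counts (millions of occupied units); the real series
# comes from U.S. Census Bureau housing data, not reproduced here.
owner = {2001: 69.8, 2004: 72.2, 2008: 75.0}
renter = {2001: 33.3, 2004: 33.0, 2008: 35.8}

for year in owner:
    total = owner[year] + renter[year]
    owner_pct = 100 * owner[year] / total
    renter_pct = 100 * renter[year] / total
    # By construction, the two proportions are mirror images around 100%.
    assert abs(owner_pct + renter_pct - 100) < 1e-9
    print(year, round(owner_pct, 1), round(renter_pct, 1))
```

Since the two proportions must sum to 100% in every year, either series fully determines the other; plotting both adds no information.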


Now that we have confirmed the data is okay, we pull the charts apart. We change the scale of the renter chart so that the change over time is more clearly displayed. Since the home ownership bubble burst, it's the rental market that has grown.



It's time for some magic! We superimpose the charts again to obtain this:


[Ed: The remainder of the post below is modified from the original version based on reader comments]

The chart designer managed to make the two data series look different even though one series is the mirror image of the other.


The inspiration of this post came from reader Leanne C. who submitted this MSNBC chart:


Initially, I mistakenly assumed that what is plotted are proportions. It just so happens that the total number of occupied units in the U.S. is in the 100M range, and the owner v. renter split is roughly 70M / 30M. I looked at the left end of the chart and saw, in 2001, about 33 for rental and about 69 for owner, which happens to add up to about 100 (with rounding error). But if I had looked at the right end of the chart, where rental is 39 and owner is 75, it would have been clear that the numbers don't add up.

In any case, this chart looks different if we make the scales the same. In the following, each gridline interval on both axes represents 2M units. There really is no justifiable reason why the scales should differ, given that both axes measure the same objects.


But using different ranges on each axis also presents a challenge: it is tempting to read meaning into the gaps between the two lines but these gaps merely reflect the choice of axis ranges.

Instead, we should convert the counts into growth indices, setting the 2001 value of each series to 100. The following chart then shows what's really going on in housing:

Between 2001 and 2008, rental- and owner-occupied units experienced the same total growth (about 4%), although the trajectories differed... owner-occupied units went up steadily during this period, while renter-occupied units declined until 2004 and then grew at a faster rate between 2004 and 2008. Since 2008, renter-occupied units have continued at about the same growth rate, while owner-occupied units have flattened out and may be slightly declining.
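The index conversion is simple arithmetic: divide each year's count by the 2001 count and multiply by 100. A sketch with illustrative values (not the actual Census figures):

```python
def to_index(series, base_year=2001):
    """Convert a {year: value} series to a growth index (base year = 100)."""
    base = series[base_year]
    return {year: 100 * value / base for year, value in series.items()}

# Illustrative values in millions of units, not the actual Census figures.
owner = {2001: 72.0, 2008: 74.9, 2011: 74.5}
renter = {2001: 33.3, 2008: 34.6, 2011: 38.0}

owner_idx = to_index(owner)    # owner-occupied, 2001 = 100
renter_idx = to_index(renter)  # renter-occupied, 2001 = 100
```

With both series rebased to 100, the vertical gaps between the lines now measure relative growth rather than an artifact of axis ranges.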





Rachel features some poor bubble charts on Stats Chat, the blog of the University of Auckland's Statistics department. (here and here)


Martin discusses color scales on maps. (link)


The JMP blog takes on the challenge of visualizing European debt data (link). We discussed this some time ago (link). The difficulty of this data set is that for each pair of countries, there are two values, and each country can simultaneously be lending to and borrowing from the other.





Restoring symmetry, and another survey debunked

Reader John G. submitted a chart plus its improvement. Thank you!

The problem chart is used to present a "net promoter score" analysis by ABB (link). The net promoter score is the difference between the number of people who will recommend a product or company and the number who won't. The chart presents the components: the number of people who gave "red cards" and the number who gave "green cards".


Unfortunately, the symmetry in the definition of the net promoter score is destroyed by this stacked bar chart. The red bars are all aligned against the vertical axis but the green bars aren't, so it's difficult to compare their lengths.

John fixed this problem by aligning both sets of bars against a vertical axis. Sensibly, he places the red bars along the negative direction. He also orders the categories by "margin of victory", which in effect is the net promoter score, with the category needing the most attention at the top.
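The score and the ordering John used can be sketched as follows; the card counts here are made up for illustration (the actual ABB counts are in the linked report):

```python
# Hypothetical (green, red) card counts per survey category; the actual
# ABB counts are in the report linked above.
cards = {
    "technical support": (120, 60),
    "product/system quality": (150, 70),
    "health and safety": (60, 20),
    "delivery time": (40, 55),
}

# Net promoter score = green cards minus red cards.
nps = {cat: green - red for cat, (green, red) in cards.items()}

# Order categories by "margin of victory"; the category needing the
# most attention (lowest net score) comes first.
for cat in sorted(nps, key=nps.get):
    print(f"{cat:25s} net = {nps[cat]:+d}")
```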


The improved chart points out some of the complicating factors in understanding a metric that is composed of two components, each of which varies and can be missing. For example, a category like "technical support" is rated among the highest overall, but this conceals the fact that it also receives many red cards. Also, consider "product/system quality" versus "health and safety": both categories end up with about the same net promoter score, but the former has many more respondents than the latter.


John also tried a scatter plot. This one requires some careful reading. The best categories are going to end up in the top left corner; it may be better to flip the red card axis to a descending order so that the top right corner is the best corner.


The diagonal rays (axes) are great visual aids to help figure out which categories are better and which are worse.

I am a little turned off by the crowdedness in the bottom left corner of the chart. Those are categories with relatively low levels of red or green cards, and also relatively balanced between reds and greens... that is to say, those are categories people don't care too much about, and among those people who care, there isn't a consensus about good or bad. In other words, the survey revealed very little of use about those categories. It bothers me that much of the survey ended up collecting data on such items.


The "data" corner of the Trifecta

In the JunkCharts Trifecta checkup, we reserve a corner for "data". The data used in a chart must be in harmony with the question being addressed, as well as with the chart type being selected. When people think about data, they often think of cleaning the data and processing the data, but what comes before that is collecting the data -- specifically, collecting data that directly address the question at hand.

Our previous post on smartphone app crashes focused on why the data was not trustworthy. The same problem plagues this "spider chart", submitted by Marcus R. (link to chart here)


Despite the title, it is impossible to tell how QlikView is "first" among these brands. In fact, with several shades of blue, I find it hard to even figure out which part refers to QlikView.

The (radial) axis is also a great mystery: its labels are 0, 0.5, 1, and 1.5. I have never seen a survey with such a scale.

The symmetry of this chart is its downfall. These "business intelligence" software products are ranked along 10 dimensions. There may not be a single decision-maker who would assign equal weight to each of these criteria. It's hard to imagine that "project length" is as important as "product quality", for example.
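The equal-weighting assumption baked into the spider chart is easy to expose: the same ratings produce different rankings under different weights. A toy sketch, with invented ratings for two hypothetical products on three of the ten dimensions:

```python
# Hypothetical ratings of two products on three of the ten dimensions,
# using the 0-1.5 scale shown on the chart's radial axis.
ratings = {
    "Product A": {"product quality": 1.4, "project length": 0.6, "support": 1.0},
    "Product B": {"product quality": 0.8, "project length": 1.5, "support": 1.1},
}

def score(product, weights):
    """Weighted sum of a product's ratings."""
    r = ratings[product]
    return sum(weights[dim] * r[dim] for dim in weights)

equal = {"product quality": 1, "project length": 1, "support": 1}
quality_first = {"product quality": 3, "project length": 0.5, "support": 1}

# The spider chart implicitly uses equal weights; a buyer who values
# product quality most would rank the products differently.
for w in (equal, quality_first):
    print(sorted(ratings, key=lambda p: score(p, w), reverse=True))
```

Under equal weights Product B comes out ahead; weight product quality more heavily and Product A wins. The chart's symmetry hides this sensitivity.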

Take one step back. This data came from respondents to a survey (link). There is very little information about the composition of the respondents. Were they asked to rate all 10 products along 10 dimensions? Did they rate only the products they are familiar with? Or only the products they actively use? If the latter, how were responses for different products calibrated so that a 1 rating from QlikView users equals a 1 rating from MicroStrategy users? Given that each of these products has broad but not completely overlapping coverage, and users typically deploy only part of the solution, how does the analysis account for the selection bias?


The "spider chart" is, unfortunately, most often associated with Florence Nightingale, who created the following chart:


This chart isn't my cup of tea either.


Also note that the spider chart has so much over-plotting that it is impossible to retrieve the underlying data.



Where do we look?

Reader Gail Z. didn't like this chart by Business Insider (link).


It's probably not the worst we have seen. The biggest problem with this chart is that there is so much going on it's hard to know where to look! This is a huge no-no for a site like Business Insider, which is in the business of sensationalizing the most trivial things.

The first fix should be to shine a light on Mark Zuckerberg. The message of the chart is how the Facebook IPO would instantly make Zuckerberg one of the top 10 richest in America. Poor Mark is lost in the crowd here.

The second fix is to get rid of the data labels, and the gridlines. It won't matter to readers whether Mark's wealth is $25 billion or $24.7 billion. The vertical axis already provides the needed information.

Some may be offended by the color of money although I can handle it.


PS. Dear readers, I have received many submissions recently. Please be patient as I work through the backlog.

A data mess outduels the pie-chart disaster for our attention

Reader Daniel L. sends us to a truly horrifying pie chart. This:


Link to the original here.

The background: Crittercism, a smartphone app monitoring company, compiled data on the frequency of app crashes by version of mobile operating system (Android or Apple iOS). The data was converted into proportions adding up to 100%.

If we spend our time trying to figure out the logic behind the ordering and placement of the data (e.g., why is iOS split on both sides? why are the pieces not sorted by size?), we will miss the graver problem with this chart: the underlying data.


Here is a long list of potential issues:

  • Crittercism sells app monitoring tools for app developers. Presumably this is how it is able to count app crashes. But who are their customers? Are they a representative set of the universe of apps? Do we even know the proportion of Android/iOS apps being monitored?
  • There is reason to believe that the customer set is not representative. One would guess that more crash-prone apps are more likely to need monitoring. Also, is Apple a customer? Given that Apple itself makes many highly popular iOS apps, omitting these would make the data useless.
  • The data wasn't adjusted for the popularity of apps. It's very misleading to count app crashes without knowing how many times each app has been opened. This is the same fallacy as drawing conclusions about flight safety from the list of fatal plane accidents; the millions of flights that complete without incident provide lots of information! (See Chapter 5 of my book for a discussion of this.)
  • The data has severe survivorship bias. The blog poster even mentions this problem but adopts the attitude that such disclosure somehow suffices to render useless data acceptable. More recent releases are more prone to crashes just because they are newer. If a particular OS release is particularly prone to app crashes, then we expect a higher proportion of users to have upgraded to newer releases. Thus, older releases will always look less crash-prone, partly because more bugs have been fixed, and partly because of decisions by users to switch out. iOS is the older operating system, and so there are more versions of it being used.
  • How is a "crash" defined?  I don't know anything about Android crashes. But my experience with PC operating systems is that each one has different crash characteristics. I suspect that an Android crash may not be the same as an iOS crash.
  • How many apps and how many users were included in these statistics? Specifying the sample size is fundamental to any such presentation.
  • Given the many problems related to timing as described above, one has to be careful when generalizing with data that only span two weeks in December.
  • There are other smartphone OSes in use out there. If those are omitted, then we can't have proportions that add up to 100% unless those other operating systems never have app crashes.


How to fix this mess? One should start with the right metric, which is the crash rate: the number of crashes divided by the number of app starts. Then, make sure the set of apps being tracked is representative of the universe of apps out there (in terms of popularity).
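The proposed metric is just crashes divided by app starts; a minimal sketch, with invented counts since Crittercism's post reports crashes but not app starts:

```python
# Invented counts: Crittercism's post reports crashes but not app
# starts, so these figures are purely illustrative.
data = {
    "iOS 5.0.1":   {"crashes": 2800, "starts": 1_200_000},
    "iOS 4.3.3":   {"crashes": 900,  "starts": 800_000},
    "Android 2.3": {"crashes": 1500, "starts": 950_000},
}

for os_version, d in data.items():
    rate = d["crashes"] / d["starts"]
    print(f"{os_version:12s} crash rate = {rate:.3%}")
```

Note how normalizing by app starts can reorder the OS versions relative to a ranking by raw crash counts, which is exactly why the raw proportions in the pie chart are misleading.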

Some sort of time matching is needed. Perhaps trace the change in crash rate over time for each version of each OS. Superimpose these curves, with the time axis measuring time since first release. Most likely, this is the kind of problem that requires building a statistical model because multiple factors are at play.

Finally, I'd argue that the question being posed is better answered using good old-fashioned customer surveys collecting subjective opinion ("how many crashes occurred this past week?" or "rate crash performance"). Yes, this is a shocker: a properly-designed small-scale survey will beat a massive-scale observational data set with known and unknown biases. You may agree with me if you agree that we should care about the perception of crash severity by users, not the "true" number of crashes. (That's covered in Chapter 1 of my book.)