This one takes time to make, takes even more time to read

Reader Matt F. contributed this confusing chart from Wired, accompanying an article about Netflix viewing behavior. 

Wired_netflix_chart-1

Matt doesn't like this chart. He thinks the main insight - most viewers drop out after the first episode - is too obvious. And there are more reasons why the chart doesn't work.

This is an example of a high-effort, low-reward chart. See my return-on-effort matrix for more on this subject.

The high effort is due to several design choices.

The most attention-grabbing elements of the chart are the blue, yellow and green bars. The blue and yellow together form a unit, while the green color refers to something else entirely. The shows in blue are classified as "savored," meaning that "viewers" on average took in less than two hours per day "to complete the season." The shows in yellow are just the opposite and labeled "devoured." The distinction between savored and devoured shows appears to be a central thesis of the article.

The green cell measures something unrelated to the average viewer's speed of consumption. It denotes a single episode, the "watershed" after which "at least 70 percent of viewers will finish the season." Every show has a watershed episode; the only variability is which episode it is. That variability is small because all shows experience a big drop-off in audience after episode 1, the audience curve flattens with further episodes, and these shows have few episodes (6 to 13). In the shows depicted, with the single exception of BoJack Horseman, the watershed occurs in episode 2, 3, or 4.
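To make the watershed idea concrete, here is a minimal Python sketch that finds it on an audience curve. It assumes the definition is "the first episode whose viewers go on to finish the season at a rate of at least 70 percent", and that viewers drop out but never skip episodes. The function name watershed_episode and the audience numbers are hypothetical, not Wired's data.

```python
def watershed_episode(audience, threshold=0.70):
    """Return the 1-based index of the first episode whose viewers finish
    the season at a rate of at least `threshold`.

    audience[i] is the (hypothetical) number of viewers who watched
    episode i + 1; audience[-1] is the number who finished the season.
    Assumes viewers drop out but never skip episodes.
    """
    finishers = audience[-1]
    for i, viewers in enumerate(audience):
        if viewers > 0 and finishers / viewers >= threshold:
            return i + 1
    return None  # only reached if the final episode has no viewers


# Hypothetical curve for a 10-episode season: a big drop after episode 1,
# then a decline that slows with each further episode.
audience = [1000, 600, 480, 420, 390, 375, 365, 358, 352, 348]
print(watershed_episode(audience))  # prints 3
```

With a steep drop after episode 1 and a flattening curve, the ratio crosses 70 percent within the first few episodes, which is why the watershed varies so little across shows.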

Wired_netflix_inset1

Beyond the colors, readers will consider the lengths of the bars. Labels for bar lengths are typically found on the horizontal axis, but here they are printed facing the wrong way on pink columns at the right edge of the chart. These labels are oriented in a way that makes readers think they represent column heights.

The columns look like they are all roughly the same height but on close inspection, they are not! Their heights are not given on top of the columns but on the side of the vertical axis.

The bar lengths show the total number of minutes of season 1 of each of these shows. This measure is a peripheral piece of information that adds little to the chart.

The vertical axis indicates the proportion of viewers who watched all episodes within one week of viewing. This segmentation of viewers is related to the segmentation of the shows (blue/yellow) as they are both driven by the speed of consumption. 

Not surprisingly, the higher the bar, the more likely it is yellow. A higher bar means more people are binge-watching, which should imply that the show is more likely classified as "devoured". Despite the correlation, these two ways of measuring the speed of consumption are not consistent. The average show on the chart has about 7 hours of content. Consumed within one week, it requires only one hour of viewing per day, so the average show would be classified as "savored" even though the average viewer can be labeled a binge-watcher who finishes in one week.
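The arithmetic can be checked with a few lines of Python. This sketch assumes the classification rule is exactly "average hours per day to complete the season, with two hours as the cutoff"; counting exactly two hours as "devoured" is my assumption, and the names are made up.

```python
SAVORED_CUTOFF_HOURS_PER_DAY = 2.0  # cutoff as described in the post


def hours_per_day(season_hours, days_to_finish):
    """Average viewing pace needed to finish a season in the given days."""
    return season_hours / days_to_finish


def classify(season_hours, days_to_finish):
    """Label a pace 'devoured' or 'savored' (assumed: 2 h/day counts as devoured)."""
    rate = hours_per_day(season_hours, days_to_finish)
    return "devoured" if rate >= SAVORED_CUTOFF_HOURS_PER_DAY else "savored"


# The average show on the chart: about 7 hours of content, finished in one week.
rate = hours_per_day(7, 7)
print(rate, classify(7, 7))  # prints 1.0 savored

# To count as devoured, a 7-hour season must be finished within this many days.
print(7 / SAVORED_CUTOFF_HOURS_PER_DAY)  # prints 3.5
```

A viewer who finishes such a show in a week counts toward the "binge" axis, yet the same pace falls in the "savored" bucket.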

***

[After taking a breath of air] We may have found the interesting part of this chart - the show Orange is the New Black is considered a "devoured" show and yet only half the viewers finish all episodes within one week, a much lower proportion than most of the other shows. Given about 12 hours of total viewing, a viewer who watches two hours per day should take 6 days to finish the season, within the one-week cutoff. This suggests that viewers may be watching more than one episode at a time, but taking breaks between viewing sessions.
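Here is one hypothetical way for the two measures to disagree. Assume, since the post does not say how Wired averaged, that the "devoured" label uses the mean of each viewer's hours per day. Then an audience in which some viewers binge and others take long breaks can average well above two hours per day while only half finish within a week. The four viewers below are invented.

```python
SEASON_HOURS = 12  # approximate length of the Orange is the New Black season

# Hypothetical days-to-finish for four viewers: two binge, two take breaks.
days_to_finish = [2, 2, 14, 14]

# Each viewer's own pace in hours per day, then the plain average of those paces.
rates = [SEASON_HOURS / d for d in days_to_finish]
avg_rate = sum(rates) / len(rates)

# Share of viewers who finish within the one-week cutoff.
share_within_week = sum(d <= 7 for d in days_to_finish) / len(days_to_finish)

print(round(avg_rate, 2), share_within_week)  # prints 3.43 0.5
```

Averaging completion days first (8 days, or 1.5 hours per day) would instead give "savored", so the choice of average matters as much as the data.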

The following chart brings out the exceptional status of this show:

Redo_wirednetflixchill_v2

PS. The image above was replaced on 7/19/2017 based on feedback from the commenters. Labels and legend were added.


Light entertainment: Making art by making data

Chris P. sent in this link to a Wired feature on "infographics."

The first entry is by Giorgia Lupi and Stefanie Posavec.

Wired_Stefanie-Data-Final

These are fun images and I enjoy looking at them as hand-drawn art. But it's a stretch to call them "data visualization," "data," or "data analysis," which are all tags used by the Wired editing staff.

(PS. Wired chose a particular example of their work. There are many examples of Lupi's work that strike a balance between handicraft and data communications.)

 


Making charts beautiful without adding unneeded bits

Reader Dave S. sent me to some very pretty pictures, published in Wired.

Wired_311_1

This chart, which shows the distribution of types of 311 calls in New York City by hour of day, is tops in aesthetics. Rarely have I seen a prettier chart.

***

The problem: no insights.

When you look at this chart, what message are you catching? Furthermore, what message are you catching that is informative, that is, not obvious?

The fact that there are few complaints in the wee hours is obvious.

The fact that "noise" complaints dominate in the night-time hours is obvious.

The fact that complaints about "street lights" happen during the day is obvious.

There are a few not-so-obvious features: that few people call about rodents is surprising; that "chlorofluorocarbon recovery" is a relatively frequent source of complaint is surprising (what is it anyway?); that people call to complain about "property taxes" is surprising; that few moan about taxi drivers is surprising.

But in all these cases, there are no interesting intraday patterns, so there is no need to show the time-of-day dimension. The message can be made more striking by doing away with the time-of-day dimension.
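Doing away with the time-of-day dimension amounts to summing over hours. A minimal sketch, assuming the call log is available as (complaint type, hour) pairs; the records are invented and far fewer than the real 311 data.

```python
from collections import Counter

# Hypothetical 311 records as (complaint_type, hour_of_day) pairs.
calls = [
    ("noise", 23), ("noise", 1), ("noise", 22), ("street light", 14),
    ("rodent", 10), ("noise", 0), ("street light", 11), ("property tax", 15),
]

# Counting by type alone sums over every hour, dropping that dimension.
totals = Counter(kind for kind, _hour in calls)

# Sorted from most to least frequent, ready for a plain bar chart.
for kind, n in totals.most_common():
    print(f"{kind:<14}{n}")
```

The sorted totals are what a simple bar chart would show, and the surprising categories stand out without 24 columns competing for attention.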

The challenge to the "artistic school" of charting is whether they can make clear charts look appetizing without adding extraneous details.

 


Of placebos and straw men

Note: This post is purely on statistics, and it is long because I try to discuss somewhat technical issues.

(Via Social Sciences Statistics blog.)

This article in Wired (Aug 24, 2009) is a must-read.  It presents current research on the "placebo effect", that is, the observation that some patients show improvement if they believe they are being treated (say, with pills) even though they have received "straw men" (say, sugar pills) that have no therapeutic value.

The article is a great piece, and a terrible piece.  It fascinated and frustrated me in equal measure.  Steve Silberman did a good job bringing up an important topic in a very accessible way.  However, I find the core arguments confused.

Let's first review the setting: in order to prove that a drug can treat a disease, pharmas are required by law to conduct "double-blind placebo-controlled randomized clinical trials".  Steve did a great job defining these: "Volunteers would be assigned randomly to receive either medicine or a sugar pill, and neither doctor nor patient would know the difference until the trial was over."  Those receiving real medicine are known as the treatment group, and those receiving sugar pills are the placebo control group.  Comparing the two groups at the end of the trial allows us to establish the effect of the drug (net of the effect of believing that one is being treated).

(I have run a lot of randomized controlled tests in a business setting and so have experience interpreting such data.  I have not, however, worked in the pharma setting so if you see something awry, please comment.)

Two key themes run through the article:

1) An increasing number of promising drugs are failing to prove their effectiveness.  Pharmas suspect that this is because too many patients in the placebo control group are improving without getting the "real thing".  They have secretly combined forces to investigate this phenomenon.  The purpose of such research is "to determine which variables are responsible for the apparent rise in the placebo effect."

2) The placebo effect meant that patients could get better without getting expensive medicine.   Therefore, studying this may help improve health care while lowering cost.

Theme #1 is misguided and silly, and of little value to patients.  Theme #2 is worthwhile, even overdue, and of great value to patients.  What frustrated me was that by putting these two together without sufficiently delineating them, Steve allowed Theme #1 to borrow legitimacy from Theme #2.

To understand the folly of Theme #1, consider the following stylized example:

Effect on treatment group = Effect of the drug + effect of belief in being treated

Effect on placebo group = Effect of belief in being treated

Thus, the difference between the two groups = effect of the drug, since the effect of belief in being treated affects both groups of patients.

Say, if the treatment group came in at 15, and the placebo at 13, we say the effect of the drug = 15 - 13 = 2.
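The stylized example translates directly into code. This is a sketch of the additive model above, not a model of any real trial; the function name observed_outcome is hypothetical.

```python
def observed_outcome(drug_effect, belief_effect, treated):
    """Stylized additive model: both groups share the effect of believing
    they are treated; only the treatment group also gets the drug effect."""
    return belief_effect + (drug_effect if treated else 0)


drug, belief = 2, 13
treatment = observed_outcome(drug, belief, treated=True)   # 15
placebo = observed_outcome(drug, belief, treated=False)    # 13
print(treatment - placebo)  # prints 2, the effect of the drug
```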


A drug fails because the effect of the drug is not high enough above the placebo effect.  If you are one of the pharmas cited in this article, you describe this result as the placebo effect being "too high".  Every time we see "placebo effect is too high", substitute "the effect of the drug is too low".

Consider a test of whether a fertilizer makes your plant grow taller.  If the fertilized plant is the same height as the unfertilized plant, you would say the fertilizer didn't work.  Who would conclude that the unfertilized plant is "unexpectedly tall"?  That is what the pharmas are saying, and that is what they are supposedly studying as Theme #1.  They want to know why the plant that grew on unfertilized soil was "so tall", as opposed to why the fertilizer was impotent.  (One should of course check that the soil was indeed unfertilized as advertised.)

Take the above example where the effect on the placebo group was 13.  Say it "unexpectedly" increased by 10 units.  Since the effect on the treatment group = effect of drug + effect of believing that one is treated, the effect on the treatment group would also go up by 10.  Because both the treatment group and the control group believe they are being treated, any increase in the placebo effect would affect both groups equally, and leave the difference the same.  This is why in randomized controlled tests, we focus on the difference in the metrics and don't worry about the individual levels.  This is elementary stuff.
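A small simulation shows the same cancellation when individual patients vary. This sketch assumes the additive model above plus Gaussian noise (standard deviation 3, an arbitrary choice), and it reuses the random seed, so raising the placebo effect by 10 shifts every patient in both groups by exactly 10. The function name is hypothetical.

```python
import random


def estimated_drug_effect(drug_effect, placebo_effect, n=1000, seed=0, noise_sd=3.0):
    """Difference in mean outcomes between simulated treatment and placebo groups.

    Every patient gets placebo_effect (the effect of believing one is treated)
    plus Gaussian noise; treated patients also get drug_effect.
    """
    rng = random.Random(seed)  # same seed -> same noise draws on every call
    treated = [placebo_effect + drug_effect + rng.gauss(0, noise_sd) for _ in range(n)]
    control = [placebo_effect + rng.gauss(0, noise_sd) for _ in range(n)]
    return sum(treated) / n - sum(control) / n


base = estimated_drug_effect(drug_effect=2, placebo_effect=13)
raised = estimated_drug_effect(drug_effect=2, placebo_effect=23)  # placebo effect up by 10
print(round(base, 3), round(raised, 3))  # equal up to floating-point rounding
```

Both estimates land near 2, the drug effect; the level of the placebo effect never enters the difference.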

One of their signature findings is that some cultures may produce people who tend to show high placebo effects.  The unspoken conclusion that we are supposed to draw is that if these trials were conducted closer to home, the drug would have passed rather than failed.  I have already explained why this is wrong: the higher placebo effect lifts the metrics on both the treatment and the control groups, leaving the difference the same.

There is one way in which cultural difference can affect trial results.  This is if the effect of the drug is not common to all cultures; in other words, the drug is effective for Americans (say) but not so for Koreans (say).  Technically, we say there is a significant interaction effect between the treatment and the cultural upbringing.  Then, it would be wrong to run the trial in Korea and then generalize the finding to the U.S.  Note that I am talking about the effect of the drug, not the effect of believing one is being treated (which is always netted out).  To investigate this, one just needs to repeat the same trial in America; one does not need to examine why the placebo effect is "too high".
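The interaction effect can be sketched the same way. The effects below are hypothetical; the point is that each trial's difference tracks the drug effect in that population, whatever the placebo level there.

```python
# Hypothetical effects: the drug works in one population but not the other
# (an interaction between treatment and cultural upbringing).
drug_effect_by_country = {"US": 2.0, "Korea": 0.0}
placebo_effect_by_country = {"US": 13.0, "Korea": 18.0}  # belief effect may differ too


def trial_difference(country):
    """Treatment minus placebo outcome for a trial run in one country."""
    treated = placebo_effect_by_country[country] + drug_effect_by_country[country]
    control = placebo_effect_by_country[country]
    return treated - control  # the placebo effect cancels; only the drug effect remains


for country in ("US", "Korea"):
    print(country, trial_difference(country))
```

Generalizing the Korean result to the U.S. would be wrong here, and the remedy is to rerun the trial in the U.S., not to study why the Korean placebo level is high.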

I have sympathy for a different explanation, advanced for psychiatric drugs.  "Many experts are starting to wonder if what drug companies now call depression is even the same disease that the HAM-D [traditional criterion] was designed to diagnose".  The idea is that as more and more people are diagnosed as needing treatment, the average effect of the drug relative to the placebo group gets smaller and smaller.  This is absolutely possible: the marginal people who are getting diagnosed are those with lighter problems, and thus those who derive less value from the drug; in other words, they could more easily get better via placebo.  This is also elementary: in the business world, it is well known that if you throw discounts at loyal customers who don't need the extra incentive, all you are doing is increasing your cost without changing your sales.
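The dilution argument is a weighted average. This sketch assumes just two kinds of patients, with made-up drug-minus-placebo effects of 4 for severe cases and 0.5 for mild ones; the function name is hypothetical.

```python
def average_drug_effect(share_mild, effect_severe=4.0, effect_mild=0.5):
    """Average drug-minus-placebo effect in a mixed patient pool.

    Hypothetical effects: severe patients benefit more from the drug than
    mild patients, who would more easily improve on placebo alone.
    """
    return (1 - share_mild) * effect_severe + share_mild * effect_mild


# As diagnosis widens to include more mild cases, the measured effect shrinks.
for share in (0.0, 0.25, 0.5, 0.75):
    print(share, average_drug_effect(share))
```

The drug works exactly as well on each kind of patient in every row; only the mix changes.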

No matter how the pharmas try, the placebo effect affects both groups and will always cancel out.  Steve even recognizes this: "Beecher [who discovered the placebo effect] demonstrated that trial volunteers who got real medication were *also subject to placebo effects*."  It is too bad he didn't emphasize this point.

On the other hand, Theme #2 is great science.  We need to understand if we can harness the placebo effect.  This has the potential of improving health care while at the same time reducing its cost.  Of course, this is not so useful for pharmas, who need to sell more drugs.

I think it is not an accident that Theme #2 research, as cited by Steve, is done in academia while Theme #1 research is done by an impressive roster of pharmas, with the help of NIH.

The article also tells us some quite startling facts:

- if they tell us, they have to kill us: "in typically secretive industry fashion, the existence of the project [Theme #1] itself is being kept under wraps."  Why?
- "NIH staffers are willing to talk about it [Theme #1] only anonymously, concerned about offending the companies paying for it."
- Eli Lilly has a database of published and unpublished trials, "including those that the company had kept secret because of high placebo response".  Substitute: low effect of the drug.  This is the publication bias problem.
- Italian doctor Benedetti studies "the potential of using Pavlovian conditioning to give athletes a competitive edge undetectable by anti-doping authorities".  This means "a player would receive doses of a performance-enhancing drug for weeks and then a jolt of placebo just before competition."  I hope he is on the side of the catchers not the cheaters.
- Learnt the term "nocebo" effect, which is when patients develop negative side effects because they were anticipating them
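Publication bias, mentioned in the Eli Lilly item above, can be illustrated with a quick simulation. Everything here is hypothetical: a true drug effect of 0.5, sampling noise with standard deviation 1, and a rule that only trials with an observed effect above 1.0 are published.

```python
import random

rng = random.Random(1)
TRUE_EFFECT = 0.5  # hypothetical true drug-minus-placebo effect

# Each simulated trial reports the true effect plus sampling noise.
results = [TRUE_EFFECT + rng.gauss(0, 1) for _ in range(200)]

# Hypothetical selection rule: only trials with a strong observed effect get published.
published = [r for r in results if r > 1.0]

print(round(sum(results) / len(results), 2))      # close to the true effect
print(round(sum(published) / len(published), 2))  # inflated by the selection
```

Reading only the published trials overstates the drug; keeping the rest secret hides the low drug effect, not a "high placebo response".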

Again, highly recommended reading even though I don't agree with some of the material.  The article should have focused on Theme #2 and talked to people outside pharma about Theme #1.




Food art

Adam, who is the designer behind the Wired graphics special on "The Future of Food", asked about the rest of the series.  We previously made some comments on a set of mini donut charts.

The first thought that came to mind after browsing through all the charts was: what a great job they have done to generate interest in food data, which has no right to be entertaining.  Specifically, this is a list of things I appreciated:

  • An obvious effort was undertaken to extract the most thought-provoking data out of a massive amount of statistics collected by various international agencies.  None of the charts is overstuffed, which is a common problem.
  • It would be somewhat inappropriate to use our standard tools to critique these charts.  Clearly, the purpose of the designer was to draw readers into statistics that they might otherwise not care for.   Moreover, the Wired culture has long traded off efficiency for aesthetics, and this showed in a graph such as this, which is basically a line chart with two lines, and a lot of mysterious meaningless ornaments:
  • Wired_feedtheworld
  • A nice use of a dual line chart, though.  It works because both data series share the same scale and only one vertical axis is necessary, which is very subtly annotated here.
  • The maintenance of the same motifs across several charts is well done.  (See the pages on corn, beef, catfish)


Further suggestions:

  • Wired_bar
  • It would be nice if Wired were brave enough to adopt the self-sufficiency principle, i.e. graphs should not contain a copy of the entire data set being depicted.  Otherwise, a data table would suffice.  The graphical construct should be self-sufficient.  This rule is not often followed because of "loss aversion"; there is the fear that a graph without all the data is like an orphan separated from the parents.  Since, as I noted, these graphs are mostly made for awe, there is really no need to print all the underlying data.  For instance, these "column"-type charts can stand on their own without the data (adding a scale would help).
  • Not sure if sorting the categories alphabetically in the column chart is preferred to sorting by size of the category.  The side effect of sorting alphabetically is that it spreads out the long and the short chunks, which simplifies labelling and thus reading.
  • Not a fan of area charts (see below).  Although it is labelled properly, it is easy at first glance to focus on the orange line rather than the orange area.  That would be a grave mistake.  The orange line actually plots the total of the two types of fish rearing, not the aquaculture component.  The chart is somewhat misleading because it is difficult to assess the growth rate of aquaculture.  Much better to plot the size of both markets as two lines (either indexed or not).
  • Wired_aquaculture 
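Indexing each series to its first value is one way to put the two markets on a comparable footing. A minimal sketch with invented numbers, assuming the chart's top line is the total and aquaculture is one component, so the other component is the difference; the function name index_series is hypothetical.

```python
def index_series(values, base=100.0):
    """Rescale a series so that its first value equals `base` (an indexed line)."""
    first = values[0]
    return [base * v / first for v in values]


# Hypothetical production at five time points, in the same units.
total_fish = [60, 70, 80, 95, 110]   # what the top line of the area chart shows
aquaculture = [5, 10, 20, 35, 50]     # the component that is hard to read off
capture = [t - a for t, a in zip(total_fish, aquaculture)]  # the other component

print(index_series(aquaculture))  # prints [100.0, 200.0, 400.0, 700.0, 1000.0]
print(index_series(capture))
```

Plotted as two lines, the tenfold growth of aquaculture and the flat capture series are obvious, whereas the stacked total hides both.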


Reference: "The Future of Food", Wired, Oct 20, 2008.


Mini donuts

Wired_diet

As a reader noted, this chart is essentially unreadable.  It contains data for the composition of diets in four countries during two time periods.

What might we want to learn from this data?

Are there major differences in diet between countries?

Within each country, are there changes in diet composition over the thirty years?

If there were changes in diet inside a country over time, did those reflect a worldwide trend or a trend specific to that country?

Unfortunately, the use of donut charts, albeit in small multiples, does not help the cause.  The added dimension of the size of the pies, used to display the total calories per person per day, serves little purpose.  Seriously, who out there is comparing the pie sizes rather than reading off the numbers in the donut holes if she wants to compare total calories?

This data set has much potential, and allows me to show, yet again, why I love "bumps charts".

Here is one take on it.  (Note that the closest data I found was for six different countries - China, Egypt, Mexico, South Africa, Philippines, India - and for different periods.)

Redo_diet1

The set of small multiples recognizes that the comparison between 1970 and 2000 is paramount to the exercise.  There is a wealth of trends that can be pulled out of these charts.  For example, the Chinese and Egyptians take in much more vegetables than the people of the other countries; in particular, the Chinese increased the consumption of vegetables drastically in those 30 years. (top row, second from left)

Or perhaps, for sugars and sweeteners, consumption has increased everywhere except for South Africa.  In addition, the Chinese eat a lot less sugar than the other peoples. (top row, right)

Egg consumption also shows an interesting pattern.  In 1970, the countries had similar levels but by 2000, Mexicans and the Chinese had outpaced the other countries. (bottom row, right)

These charts are very versatile.  The example shown above is not yet ready for publication.  The designer must now decide what the key messages are, and then use color judiciously to draw the reader's attention to the relevant parts.
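As one example of using color judiciously, a designer could highlight the countries that move against the general direction, which speaks to the earlier question of worldwide versus country-specific trends. The consumption values and the labels A to F are placeholders, not the FAO figures, and taking the majority direction as "the trend" is my simplifying assumption.

```python
from collections import Counter

# Hypothetical per-capita consumption of one food group in 1970 and 2000.
# Countries are labeled A-F on purpose; these are not real figures.
consumption = {
    "A": (40, 55), "B": (35, 48), "C": (50, 44),
    "D": (30, 41), "E": (45, 60), "F": (38, 52),
}


def change_direction(before, after):
    """Classify the 1970-to-2000 change for one country."""
    if after > before:
        return "up"
    if after < before:
        return "down"
    return "flat"


directions = {c: change_direction(b, a) for c, (b, a) in consumption.items()}

# Majority direction stands in for the "worldwide trend"; the rest get the accent color.
majority = Counter(directions.values()).most_common(1)[0][0]
highlight = [c for c, d in directions.items() if d != majority]

print(majority, highlight)  # prints up ['C']
```

In the bumps chart, every line but C's would stay gray, and C's line would carry the color.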

Also, some may not like the default scaling of the vertical axes. That can be easily fixed.


Finally, here is another take which focuses on countries rather than food groups.  We note that too many categories of foods make it hard to separate them.

Redo_diet2


References: "Who's Eating What?", Wired, Oct 2008; "The Double burden of malnutrition", FAO, 2006.


Maps and dots

Happy New Year

The cosmos of university ranking got more interesting recently with the advent of the "brain map" by Wired magazine.  This new league table counts the total number of winners of five prestigious international prizes (Nobel, Fields, Lasker, Turing, Gairdner) in the past 20 years (up to 2007), and the researcher found that almost all winners were affiliated with American institutions.
Wired_brainmap
As discussed before, the map is a difficult graphical object; it acts like a controlling boss.  In this brain map, the concentration of institutions in the North American land mass causes over-crowding, forcing the designer to insert guiding lines drawing our attention in myriad directions.  These lines scatter the data asunder, interfering with the primary activity of comparing universities.

Wired_dots

The chain of dots object cannot stand by itself without an implicit structure (e.g. rows of 10).  This limitation was apparent in the hits and misses chart as well.  Sticking fat fingers on paper to count dots is frustrating.  Simple bars allow readers to compare relative strength with less effort.
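Replacing dots with bars starts from a simple tally per institution. The institutions and counts below are hypothetical, and the text bars merely stand in for a plotted bar chart.

```python
# Hypothetical prize counts per institution (not Wired's data).
prizes = {
    "Univ X": {"Nobel": 6, "Turing": 2, "Fields": 1},
    "Univ Y": {"Nobel": 3, "Lasker": 2},
    "Univ Z": {"Turing": 1},
}

# Total prizes per institution across all five awards.
totals = {name: sum(counts.values()) for name, counts in prizes.items()}

# One bar per institution, longest first: no counting dots with fingers.
for name, total in sorted(totals.items(), key=lambda item: item[1], reverse=True):
    print(f"{name:<8}{'#' * total} {total}")
```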

Redo_brainmap_2

In the junkart version, we ditched the map construct completely, retaining only the east-west axis.  [For lack of space (and time), I omitted the US East Coast and Washington-St. Louis.]  With this small multiples presentation, one can better contrast institutions.

To help readers follow the row structure, I inserted thin strokes to indicate zero awards. A limitation of the ranking method is also exposed: UC-SF has a strong medical school and, not surprisingly, has received a fair share of Nobel (medicine), Lasker and Gairdner prizes; but for other institutions, zero Lasker and Gairdner prizes could be due to a less competitive medical school or none at all!


Reference: "Mapping Who's Winning the Most Prestigious Prizes in Science and Technology", Wired magazine, Nov 2007.


Flight of fancy

Wiredh5n1sm

The venerable Wired magazine has surely gone too far with this flight of fancy!  Consider:

  • The zig-zagging lines streaming across the map
  • The redundant white dots, each of equal size, contradicting the black dots, with size proportional to prevalence
  • The inexplicable use of 00, 01, 02, ...
  • The use of a taller column for human cases, which, when tallied, amount to about 1/20 the number of bird cases
  • The inclusion of Australia (with zero cases) while excluding the Americas (also zero cases)
  • Ordering the countries neither by bird nor human cases but by convenience of placement on the map

Redoh5n1

As with a previous example, the map adds nothing to the data except for providing a lesson in geography.  We prefer a parallel bar chart, shown on the right.  Here, the continents are given different colors.  In an unusual move, I chose different scales for each side as I am more interested in the distribution among countries, rather than the relative prevalence of bird/human cases.
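Giving each side its own scale has roughly the same effect as comparing each country's share of its own side's total. A minimal sketch with invented counts and placeholder country names; the function name shares is hypothetical.

```python
# Hypothetical H5N1 case counts by country (invented for illustration).
bird_cases = {"Country A": 120, "Country B": 60, "Country C": 20}
human_cases = {"Country A": 4, "Country B": 1, "Country C": 5}


def shares(counts):
    """Each country's share of its own side's total, so each side sums to 1."""
    total = sum(counts.values())
    return {country: n / total for country, n in counts.items()}


bird_share, human_share = shares(bird_cases), shares(human_cases)
for country in bird_cases:
    print(f"{country}: birds {bird_share[country]:.0%}, humans {human_share[country]:.0%}")
```

On a common scale the human bars would be nearly invisible; on separate scales the distribution across countries is readable on both sides.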

Reference: "Flight H5N1: Delayed", Wired Magazine, October 2006.


Racetrack entertainment

A warm welcome to readers of Science.  (Junk Charts is selected as "Best of the Web" this week.  Also thanks to Mitchell for the nice write-up.)

Wiredgreen

Racetrack graphs were a novelty item here some time ago.  They made an appearance in the October issue of Wired Magazine, known for its design.  We have already discussed information distortion in such charts.
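The distortion in racetrack charts comes from geometry: if a value is drawn as an angle, its arc length grows with the radius of its lane. A sketch, assuming a full lap represents the maximum value; the function name and numbers are arbitrary.

```python
import math


def arc_length(value, max_value, radius):
    """Length of a racetrack lane drawn for `value`, if a full lap (2*pi) = max_value."""
    angle = 2 * math.pi * value / max_value
    return radius * angle


# The same hypothetical value drawn on an inner lane and an outer lane.
inner = arc_length(30, 100, radius=1.0)
outer = arc_length(30, 100, radius=2.0)
print(round(outer / inner, 2))  # prints 2.0: the outer lane looks twice as long
```

Readers who judge by length rather than angle will overrate every outer lane, which is why they end up reading the data labels instead.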

This chart fails the self-sufficiency test, forcing readers to read and interpret the data labels, and to ignore the racetrack construct.

Graphical elements applied as cosmetics?  Charts sacrificing data integrity for entertainment?  This takes us back to our previous discussion: can good charts be entertaining?  Now flipped over: can entertaining charts be good?

Reference: "Good, Green Livin'", Wired Magazine, 10/2006.