Deaths as percent neither of cases nor of population. Deaths as percent of normal.

Yesterday, I posted a note about excess deaths on the book blog (link). The post was inspired by a nice data visualization by the New York Times (link). This is a great example of data journalism.

Nyt_excessdeaths_south

Excess deaths is a superior metric for measuring the effect of Covid-19 on public health: better than deaths as a percent of cases, and better than deaths as a percent of the population. What excess deaths measure is deaths as a percent of normal, where normal is usually defined as the average deaths in the respective week in years past.

The red areas indicate how far the deaths in the Southern states are above normal. The highest peak, registered in Texas in late July, is 60 percent above the normal level.

***

The best way to appreciate the effort that went into this graphic is to imagine receiving the outputs from the model that computes excess deaths: a three-column spreadsheet with columns "state", "week number" and "estimated excess deaths".

The first issue is unequal population sizes. More populous states of course have higher death tolls. Transforming death tolls to an index pegged to the normal level solves this problem. To produce this index, we divide actual deaths by the normal level of deaths. So the spreadsheet must be augmented by two additional columns, showing the historical average deaths and actual deaths for each state for each week. Then, the excess death index can be computed.
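
For concreteness, here is a minimal pandas sketch of that augmented spreadsheet. The numbers and column names are made up for illustration:

    import pandas as pd

    # Hypothetical model output, one row per state per week,
    # augmented with actual and historical-average deaths
    df = pd.DataFrame({
        "state": ["TX", "TX", "GA", "GA"],
        "week": [28, 29, 28, 29],
        "actual_deaths": [5200, 5400, 2100, 2050],
        "normal_deaths": [3900, 3950, 1900, 1880],
    })

    # Excess death index: actual deaths as a percent of the normal level.
    # A value of 160 means deaths are 60 percent above normal.
    df["excess_index"] = 100 * df["actual_deaths"] / df["normal_deaths"]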

The journalist builds a story around the migration of the coronavirus between regions as it rages across different states during different weeks. To this end, the designer first divides the dataset into four regions (South, West, Midwest and Northeast). Within each region, the states must be ordered. For each state, the week of peak excess deaths is identified, and the peak index is used to sort the states.
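
Continuing the sketch above, the ordering step might look like this (the state-to-region lookup is an assumption, abbreviated for illustration):

    # Assign regions
    df["region"] = df["state"].map({"TX": "South", "GA": "South"})

    # Find each state's peak excess death index ...
    peaks = (df.groupby(["region", "state"])["excess_index"]
               .max()
               .rename("peak_index")
               .reset_index())

    # ... and sort states within each region by that peak
    order = peaks.sort_values(["region", "peak_index"],
                              ascending=[True, False])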

The graphic utilizes a small-multiples framework. Time occupies the horizontal axis, by convention. The vertical axis is compressed so that the states are not too distant. For the same reason, the component graphs are allowed to overlap vertically. The benefit of the tight arrangement is clearer for the Northeast as those peaks are particularly tall. The space-saving appearance reminds me of sparklines, championed by Ed Tufte.

There is one small tricky problem. In most of June, Texas suffered at least 50 percent more deaths than normal. The severity of this excess death toll is shortchanged by the low vertical height of each component graph. What forced such congestion is probably the data from the Northeast. For example, New York City:

Nyt_excessdeaths_northeast3

New York City's death toll was almost 8 times the normal level at the start of the epidemic in the U.S. If the same vertical scale is maintained across the four regions, then the Northeastern states dwarf all else.

***

One key takeaway from the graphic for the Southern states is the persistence of the red areas. In each state, for almost every week of the entire pandemic period, actual deaths have exceeded the normal level. This is a strong indication that the coronavirus is not under control.

In fact, I'd like to see a second set of plots showing the cumulative excess deaths since March. The weekly graphic is better for identifying the ebb and flow while the cumulative graphic takes measure of the total impact of Covid-19.

***

The above description leaves out a huge chunk of work related to computing excess deaths. I assumed the designer receives these estimates from a data scientist. See the related post in which I explain how excess deaths are estimated from statistical models.

Cornell must remove the logs before it reopens the campus in the fall

Against all logic, Cornell announced last week it would re-open in the fall because a mathematical model under development by several faculty members and grad students predicts that a "full re-opening" would lead to 80 percent fewer infections than a scenario of full virtual instruction. That's what was reported by the media.

The model is complicated, with loads of assumptions, and the report is over 50 pages long. I will put up my notes on how they attained this counterintuitive result in the next few days. The bottom line is - and the research team would agree - that it is misleading to describe the analysis as "full re-open" versus "no re-open". The so-called full re-open scenario assumes that the entire community, including students, faculty and staff, submits to a full program of test-trace-isolate: mandatory PCR diagnostic testing once every five days throughout the 16-week semester, immediate quarantine and isolation of new positive cases as well as their contacts, and full compliance with the program. By contrast, it assumes students do not get tested at all in the online instruction scenario. In other words, the researchers expect Cornell to get done what U.S. governments at all levels have failed to do until now.

[7/8/2020: The post on the Cornell model is now up on the book blog. Here.]

The report takes us back to the good old days of best-base-worst-case analysis. There are no data for validating such predictions, so they performed sensitivity analyses, defined as changing one factor at a time while all other factors are fixed at "nominal" (i.e. base case) values. In a large section of the report, they publish a series of charts of the following style:

Cornell_reopen_sensitivity

Each line here represents one of the best-base-worst cases (respectively, orange-blue-green). Every parameter except one is given the "nominal" value (which represents the base case). The parameter being manipulated is shown on the horizontal axis; in the chart above, it is the assumed average number of daily contacts per person. The vertical axis shows the main outcome variable, which is the percentage of the community infected by the end of term.

The flatness of the lines in the above chart appears to say that the outcome is quite insensitive to changes in the average daily contact rate under all three scenarios - until the rate rises above 10 contacts per person per day. The chart also appears to show that the blue line sits roughly midway between the orange and the green, so that relative to the base case, the percent infected is slightly less than halved under the optimistic scenario and a bit more than doubled under the pessimistic one.

Look again.

The vertical axis is presented in log scale, and labeled only at the values 1% and 10%. About midway between 1 and 10 on the horizontal axis, the outcome value has already risen above 10%. Because of the log transformation, each unlabeled tick above 10% represents an increase of 10 percentage points. So the top of the vertical axis indicates 80% of the community being infected! Nothing in the description or labeling of the vertical axis prepares the reader for this.

The report assumes a fixed value for average daily contacts of 8 (I rounded the number for discussion), which is invariable across all three scenarios. Drawing a vertical line about eight-tenths of the way towards 10 appears to signal that this baseline daily contact rate places the outcome in the relatively flat part of the curve.

Look again.

The horizontal axis too is presented in log scale. To birth one log scale may be regarded as a misfortune; to birth two log scales looks like carelessness.

Since there exists exactly one tick beyond 10 on the horizontal axis, the right-most value is 20. The model has been run for values of average daily contacts from 1 to 20, with unit increases. I can think of no defensible reason why such a set of numbers should be expressed in a log scale.

For the vertical axis, the outcome is a proportion, which is confined to within 0 percent and 100 percent. It's not a number that can explode.

***

Every log scale on a chart is birthed by its designer. I know of no software that automatically performs log transforms on data without the user's direction. (I write this line with trepidation, hoping that I haven't planted a bad idea in some software developer's head.)

Here is what the shape of the original data looks like - without any transformation. All software (I'm using JMP here) produces something of this type:

Redo-cornellreopen-nolog
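
Something like the following matplotlib sketch reproduces the shape. The curves are synthetic stand-ins anchored loosely to the values quoted in this post - not the Cornell model's actual output:

    import numpy as np
    import matplotlib.pyplot as plt

    contacts = np.arange(1, 21)  # the model was run for 1 to 20 contacts/day

    # Synthetic stand-ins for the three scenarios (NOT the Cornell estimates):
    # exponential-style curves capped at 100%, since the outcome is a proportion
    def scenario(pct_at_8, growth):
        return np.minimum(100, pct_at_8 * growth ** (contacts - 8.0))

    best, base, worst = scenario(1.5, 1.15), scenario(3.5, 1.17), scenario(14, 1.18)

    fig, ax = plt.subplots()
    for y, label in [(best, "best case"), (base, "base case"),
                     (worst, "worst case")]:
        ax.plot(contacts, y, label=label)
    ax.set_xlabel("Average daily contacts per person")
    ax.set_ylabel("Percent of community infected by end of term")
    ax.legend()
    # The report's two log-scale variants are each one toggle away:
    # ax.set_yscale("log")   # second chart: log vertical axis
    # ax.set_xscale("log")   # final chart: log on both axes
    plt.show()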

At the baseline daily contact rate of 8, the model predicts that 3.5% of the Cornell community will get infected by the end of the semester (again, assuming a strict test-trace-isolate program, fully implemented and complied with). Under the pessimistic scenario, the proportion jumps to 14%, about four times the base case. In this worst-case scenario, if the daily contact rate were about twice the assumed value (just over 16), half of the community would be infected in 16 weeks!

I actually do not understand how there could be only 8 contacts per person per day when the entire student body has returned to 100% in-person instruction. (In the report, they even say the 8 contacts could include multiple contacts with the same person.) I imagine an undergrad student in a single classroom with 50 students. This assumption says the average student in this class comes into contact with at most 8 of those classmates. That's one class. What about other classes? Small tutorials? Dining halls? Dorms? Extracurricular activities? Sports? Parties? Bars?

Back to graphics. Something about the canonical chart irked the report writers so they decided to try a log scale. Here is the same chart with the vertical axis in log scale:

Redo-cornellreopen-logy

The log transform produces a visual distortion. On the right side, where the three lines are diverging rapidly, the log transform pulls them together. On the left side, where the three lines are close together, the log transform pulls them apart.

Recall that on the log scale, a straight line is exponential growth. Look at the green line (worst case). That line is approximately linear so in the pessimistic scenario, despite assuming full compliance to a strict test-trace-isolate regimen, the cases are projected to grow exponentially.

Something about that last chart still irked the report writers so they decided to birth a second log scale. Here is the chart they ultimately settled on:

Redo-cornellreopen-logylogx

As with the other axis, the effect of the log transform is to squeeze the larger values (on the right side) and spread out the smaller values (on the left side). After this cosmetic surgery, the left side looks relatively flat while the right side looks steep.

In the next version of the Cornell report, they should replace all these charts with ones using linear scales.

***

Upon discovering this graphical mischief, I wonder if the research team received a mandate that includes a desired outcome.

[P.S. 7/8/2020. For more on the Cornell model, see this post.]


What is the price of objectivity

I knew I had to remake this chart.

TMC_hospitalizations

The simple message of this chart is hidden behind layers of visual complexity. What the analyst wants readers to focus on (as discerned from the text on the right) is the red line, the seven-day moving average of new hospital admissions due to Covid-19 in Texas.

My eyes kept wandering away from the line. It's the sideways data labels on the columns. It's the columns that take up vastly more space than the red line. It's the sideways date labels on the horizontal axis. It's the redundant axis labels for hospitalizations when the entire data set has already been printed. It's the two hanging diamonds, for which the clues are filed away in the legend above.

Here's a version that brings out the message: after Phase 2 re-opening, the number of hospital admissions has been rising steadily.

Redo_junkcharts_texas_covidhospitaladmissions_1

Dots are used in place of columns, which pushes these details to the background. The line as well as the periods of re-opening are directly labeled, removing the need for a legend.

Here's another visualization:

Redo_junkcharts_texas_covidhospitaladmissions_2

This chart plots the weekly average new hospital admissions, instead of the seven-day moving average. In the previous chart, the raggedness of the moving average isn't transmitting any useful information to the average reader. I believe this weekly average metric is easier for many readers to grasp while retaining the general story.
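
For readers who want to see the difference between the two metrics, here is a small pandas sketch with made-up admission counts:

    import pandas as pd

    # Hypothetical daily hospital admissions indexed by date
    admissions = pd.Series(
        [40, 42, 38, 45, 50, 47, 52, 55, 53, 60, 58, 62, 65, 63],
        index=pd.date_range("2020-06-01", periods=14),
    )

    # Seven-day moving average: one (ragged) value per day
    moving_avg = admissions.rolling(window=7).mean()

    # Weekly average: one value per week - fewer, smoother points
    weekly_avg = admissions.resample("W").mean()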

***

On the original chart by TMC, the author said "the daily hospitalization trend shows an objective view of how COVID-19 impacts hospital systems." Objectivity is an impossible standard for any kind of data analysis or visualization. As seen above, the two metrics for measuring the trend in hospitalizations have pros and cons. Even if one insists on using a moving average, there are choices of averaging methods and window sizes.

Scientists are trained to believe in objectivity. It frequently disappoints when we discover that the rest of the world harbors no such notion. If you observe debates between politicians or businesspeople or social scientists, you rarely hear anyone claim that one analysis is more objective - or less subjective - than another. The economist who predicts the Dow will reach a new record, the business manager who argues for placing discounted products at the front of the store rather than the back, the sportscaster who maintains that Messi is a better player than Ronaldo: do you ever hear these people describe their methods as objective?

Pursuing objectivity leads to the glorification of data dumps. The scientist proclaims disinterest in holding an opinion about the data. This is self-deception though. We clearly have opinions because when someone else "misinterprets" the data, we express dismay. What is the point of pretending to hold no opinions when most of the world trades in opinions? By being "objective," we never shape the conversation, and forever play defense.


Hope and reality in one Georgia chart

Over the weekend, Georgia's State Health Department agitated a lot of people when it published the following chart:

Georgia_top5counties_covid19

(This might have appeared a week ago as the last date on the chart is May 9 and the title refers to "past 15 days".)

They could have avoided the embarrassment if they had read my article at DataJournalism.com (link). In that article, I lay out a set of the "unspoken conventions," things that visual designers are, or should be, doing more or less in their sleep. Under the section titled "Order", I explain the following two "rules":

  • Place values in the natural order when it is available
  • Retain the same order across all plots in a panel of charts

In the chart above, the natural order for the horizontal (time) axis is left to right. The order chosen by the designer is roughly, but not precisely, decreasing height of the tallest column in each daily group. Many observers suggested that the columns were arranged to give the appearance of cases dropping over time.

Within each day, the counties are ordered in decreasing number of new cases. The title of the chart reads "number of cases over time," which sounds like cumulative cases, but it's not. The "lead" changed hands so many times over the 15 days - meaning the data sequence was extremely noisy - which would be unlikely for cumulative cases: there were thousands of cases in each of these counties by May. Switching the order of the columns within each daily group defeats the purpose of placing these groups side by side.

Responding to the bad press, the department changed the chart design for this week's version:

Georgia_top5counties_covid19_revised

This chart now conforms to the two unspoken rules described above. The time axis runs left to right, and within each group of columns, the order of the counties is maintained.

The chart is still very noisy, with no apparent message.

***

Next, I'd like to draw your attention to a Data issue. Notice that the 15-day window has shifted. This revised chart runs from May 2 to May 16, which is this past Saturday. The previous chart ran from Apr 26 to May 9. 

Here's the data for May 8 and 9 placed side by side.

Junkcharts_georgia_covid19_cases

There is a clear lag in the reporting of cases in the State of Georgia, so this chart should always exclude the last few days. The case counts keep rising for days until they stabilize. The same mistake occurs in the revised chart - the last two days appear as if new cases have dwindled toward zero when in fact the drop reflects the reporting lag.

The disconnect between the Question being posed and the quality of the Data available dooms this visualization. It is not possible to provide a reliable assessment of the "past 15 days" when, for perhaps half of that period, the cases are under-counted.

***

Nyt_tryingtobefashionable

This graphical distortion due to "immature" data has become very commonplace in Covid-19 graphics. It's similar to placing partial-year data next to full-year results, without calling out the partial data.

The following post from the ancient past (2005!) about a New York Times graphic shows that calling out this data problem does not actually solve it. It's a less-bad kind of thing.

The coronavirus data present more headaches for graphic designers than financial statistics do. Because of accounting regulations, we know that only the current quarter's data are immature. For Covid-19 reporting, the numbers keep being adjusted for days, even weeks.

Practically all immature counts are under-estimates. Over time, more cases are reported. Thus, any plots over time - if unadjusted - paint a misleading picture of declining counts. The effect of the reporting lag is predictable, growing larger as we move from left to right in time. Thus, even if the most recent data show a downward trend, the eventual truth can be anything: down, flat or up. This is not random noise though - we know for certain that the bias is downward; we just don't know the magnitude of the distortion for a while.

Another issue that concerns coronavirus reporting but not financial reporting is inconsistent standards across counties. Within a business, if one were to break out statistics by county, the analysts would naturally apply the same counting rules. For Covid-19 data, each county follows its own set of rules, not just in how to count things but also in how to conduct testing, and so on.

Finally, with the politics of re-opening, I find it hard to trust the data. Reported cases are human-driven data - by changing the number of tests, by testing different mixes of people, by delaying reporting, by timing the revision of older data, by explicit manipulation, and so on - the numbers can be tortured into any shape. That's why it is extremely important that the bean-counters are civil servants, and that politicians are kept away. In the current political environment, that separation between politics and statistics has been breached.

***

Why do we have low-quality data? Human decisions, frequently political decisions, adulterate the data. Epidemiologists are then forced to use the bad data, because that's what they have. Bad data lead to bad predictions and bad decisions, or if the scientists account for the low quality, predictions with high levels of uncertainty. Then, the politicians complain that predictions are wrong, or too wide-ranging to be useful. If they really cared about those predictions, they could start by being more transparent about reporting and more proactive at discovering and removing bad accounting practices. The fact that they aren't focused on improving the data gives the game away. Here's a recent post on the politics of data.


This exercise plan for your lock-down work-out is inspired by Venn

A Twitter follower did not appreciate this chart from Nature showing the collection of flu-like symptoms that people reported to a UK tracking app.

Nature tracking app venn diagram

It's a super-complicated Venn diagram. I have written about this type of chart before (see here); it appears to be somewhat popular in the medicine/biology field.

A Venn diagram is not a data visualization because it doesn't plot the data.

Notice that the different compartments of the Venn diagram do not have data encoded in the areas. 

The chart also fails the self-sufficiency test because if you remove the data from it, you end up with a data container - like a world map showing country boundaries and no data.

If you're new here: if a graphic requires the entire dataset to be printed on it for comprehension, then the visual elements of the graphic are not doing any work. The graphic cannot stand on its own.

When the Venn diagram gets complicated, teeming with many compartments, there will be quite a few empty compartments. If I had to make this chart, I'd be nervous about leaving out a number or two by accident. An empty cell can be truly empty or an oversight.

Another trap is that the total doesn't add up. The numbers on this graphic add to 1,764 whereas the study population in the preprint was 1,702. Interestingly, this diagram doesn't show up in the research paper. Given how they winnowed down the study population from all the app downloads, I'm sure there is an innocent explanation as to why those two numbers don't match.

***

The chart also strains the reader. Take the number 18, right in the middle. What combination of symptoms did these 18 people experience? You have to figure out the layers sitting beneath the number. You see dark blue, light blue, orange. If you blink, you might miss the gray at the bottom. Then you have to flip your eyes up to the legend to map these colors to diarrhoea, shortness of breath, anosmia, and fatigue. Oops, I missed the yellow, which is the cough. To be sure, you look at the remaining categories to see where they stand - I've named all of them except fever. The number 18 lies outside fever so this compartment represents everything except fever. 

What's even sadder is there is not much gain from having done it once. Try to interpret the number 50 now. Maybe I'm just slow but it doesn't get better the second or third time around. This graphic not only requires work but painstaking work!

Perhaps a more likely question is how many people who had a loss of smell also had fever. Now it's pretty easy to locate the part of the dark gray oval that overlaps with the orange oval. But now, I have to add all those numbers, 69+17+23+50+17+46 = 222. That's not enough. Next, I must find the total of all the numbers inside the orange oval, which is 222 plus what is inside the orange and outside the dark gray. That turns out to be 829. So among those who had lost smell, the proportion who also had fever is 222/(222+829) = 21 percent. 
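
Spelled out in code, with the numbers as read off the diagram:

    # Compartments inside both the anosmia (loss of smell) and fever ovals
    anosmia_and_fever = [69, 17, 23, 50, 17, 46]
    with_fever = sum(anosmia_and_fever)              # 222

    # Everything inside the anosmia oval but outside fever
    anosmia_without_fever = 829

    total_anosmia = with_fever + anosmia_without_fever   # 1,051
    print(with_fever / total_anosmia)                # 0.21, i.e. 21 percent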

How many people had three or more symptoms? I'll let you figure this one out!


Reviewing the charts in the Oxford Covid-19 study

On my sister (book) blog, I published a mega-post that examines the Oxford study cited two weeks ago as a counterpoint to the "doomsday" Imperial College model. These studies bring attention to the art of statistical modeling; the six posts together are designed to give you a primer, and you don't need math to get a feel for it.

One aspect that didn't make it to the mega-post is the data visualization. Sad to say, the charts in the Oxford study (link) are uniformly terrible. Figure 3 is typical:

Oxford_covidmodel_fig3

There are numerous design decisions that frustrate readers.

a) The graphic contains two charts, one on top of the other. The left axis extends floor-to-ceiling, giving the false impression that it is relevant to both charts. In fact, the graphic uses dual axes. The bottom chart references the axis shown in the bottom right corner; the left axis is meaningless. The two charts should be drawn separately.

For those who have not read the mega-post about the Oxford models, let me give a brief description of what these charts are saying. The four colors refer to four different models - these models have the same structure but different settings. The top chart shows the proportion of the population that is still susceptible to infection by a certain date. In these models, no one can get re-infected, and so you see downward curves. The bottom chart displays the growth in deaths due to Covid-19. The first death in the UK was reported on March 5. The black dots are the official fatalities.

b) The designer allocates two-thirds of the space to the top chart, which has a much simpler message. This causes the bottom chart to be compressed beyond cognition.

c) The top chart contains just five lines, smooth curves of the same shape but different slopes. The designer chose to use thick colored lines with black outlines. As a result, nothing precise can be read from the chart. When does the yellow line start dipping? When do the two orange lines start to separate?

d) The top chart should have included margins of error. These models are very imprecise due to the sparsity of data.

e) The bottom chart should be rejected by peer reviewers. We are supposed to judge how well each of the five models fits the cumulative death counts. But three design decisions conspire to prevent us from getting the answer: (i) the vertical axis is severely compressed by tucking this chart underneath the top chart (ii) the vertical axis uses a log scale which compresses large values and (iii) the larger-than-life dots.

As I demonstrated in this post, also from the sister blog, many models - especially those assuming an exponential growth rate - have poor fits after the first few days. Charting in log scale hides the degree of error.

f) There is a third chart squeezed into the same canvas. Notice the four little overlapping hills located around Feb 1. These hills are probability distributions, which are presented without an appropriate vertical axis. Each hill represents a particular model's estimate of the date on which the novel coronavirus entered the UK. But that date is unknowable. So the model expresses this uncertainty using a probability distribution. The "peak" of the distribution is the most likely date. The spread of the hill gives the range of plausible dates, and the height at a given date indicates the chance that that is the date of introduction. The missing axis is a probability scale, which is neither the left nor the right axis.

***

The bottom chart shows up in a slightly different form as Figure 1(A).

Oxford_covidmodels_Fig1A

Here, the green, gray (blocked) and red thick lines correspond to the yellow/orange/red diamonds in Figure 3. The thin green and red lines show the margins of error I referred to above (these lines are not explicitly explained in the chart annotation). The actual counts are shown as white rather than black diamonds.

Again, the thick lines and big diamonds conspire to swamp the gaps between model fit and actual data. Again, notice the use of a log scale. This means that the same amount of gap signifies much bigger errors as time moves to the right.

When using the log scale, we should label it using the original units. With a base 10 logarithm, the axis should have labels 1, 10, 100, 1000 instead of 0, 1, 2, 3. (This explains my previous point - why small gaps between a model line and a diamond can mean a big error as the counts go up.)
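
In matplotlib terms, the fix is to plot the raw counts on a log-scaled axis, rather than plotting logged values on a linear axis. A sketch with made-up counts:

    import matplotlib.pyplot as plt
    from matplotlib.ticker import ScalarFormatter

    deaths = [1, 2, 4, 9, 20, 45, 100, 210, 450]   # made-up cumulative counts

    fig, ax = plt.subplots()
    ax.plot(range(len(deaths)), deaths, marker="o")
    ax.set_yscale("log")           # log spacing, but raw counts are plotted
    ax.yaxis.set_major_formatter(ScalarFormatter())  # labels read 1, 10, 100
    plt.show()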

Also notice how the line of white diamonds makes it impossible to see what the models are doing prior to March 5, the date of the first reported death. The models apparently start showing fatalities before March 5. This is a key part of their conclusion - the Oxford team concluded that the coronavirus had been circulating in the U.K. even before the first infection was reported. The data visualization should therefore bring out the difference in timing.

I hope by the time the preprint is revised, the authors will have improved the data visualization.


Comparing chance of death from coronavirus and flu

The COVID-19 charts are proving one thing. When the topic of a dataviz is timely and impactful, readers will study the graphics and ask questions. I've been sent some of these charts lately, and will be featuring them here.

A former student saw this chart from Business Insider (link) and didn't like it.

Businesinsider_coronavirus_flu_compare

My initial reaction was generally positive. It's clear the chart addresses a comparison between the death rates of the flu and COVID-19, an important current question. The side-by-side panel is effective at allowing such a comparison. The column charts look decent, and there aren't excessive gridlines.

Sure, one sees a few simple design fixes, like removing the vertical axis altogether (since the entire dataset has already been printed). I'd also un-slant the age labels.

***

I'd like to discuss some subtler improvements.

A primary challenge is dealing with the different definitions of age groups across the two datasets. While the side-by-side column charts prompt readers to go left-right, right-left in comparing death rates, it's not easy to identify which column to compare to which. This is not fixable in the datasets because the organizations that compile them define their own age groups.

Also, I prefer to superimpose the death rates on the same chart, using something like a dot plot rather than a column chart. This makes the comparison even easier.
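
Here is a minimal sketch of that superimposition idea, with made-up rates. Each series sits at the midpoints of its own age brackets, so the differing age-group definitions need no forced alignment:

    import matplotlib.pyplot as plt

    # Hypothetical death rates (%) by age-bracket midpoint
    covid_ages, covid_rates = [25, 45, 65, 85], [0.2, 0.5, 3.6, 14.8]
    flu_ages, flu_rates = [30, 55, 75], [0.02, 0.06, 0.8]

    fig, ax = plt.subplots()
    ax.scatter(covid_ages, covid_rates, label="COVID-19")
    ax.scatter(flu_ages, flu_rates, label="flu")
    ax.set_xlabel("Age (midpoint of bracket)")
    ax.set_ylabel("Death rate (%)")
    ax.legend()
    plt.show()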

Here is a revised visualization:

Redo_businessinsider_covid19fatalitybyage

The contents of this chart raise several challenges to public health officials. Clearly, hospital resources should be preferentially offered to older patients. But young people could be spreading the virus among the community.

Caution is advised, as the data for COVID-19 suffer from many types of inaccuracies, as outlined here.


Gazing at petals

Reader Murphy pointed me to the following infographic developed by Altmetric to explain their analytics of citations of journal papers. These metrics are "alternative" in that they arise from non-academic media sources, such as news outlets, blogs, Twitter, and Reddit.

The key graphic is the petal diagram with a number in the middle.

Altmetric_tetanus

I have a hard time thinking of this object as “data visualization”. Data visualization should visualize the data. Here, the connection between the data and the visual design is tenuous.

There are eight petals arranged around the circle. The legend below the diagram maps the color of each petal to a source of data. Red, for example, represents mentions in news outlets, and green represents mentions in videos.

Each petal is the same size, even though the counts given below differ. So, the petals are like a duplicative legend.

The order of the colors around the circle does not align with the order in the table below, for no apparent reason.

Then comes another puzzle. The bluish-gray petal appears three times in the diagram. This color is mapped to tweets. Does the number of petals represent the much higher counts of tweets compared to other mentions?

To confirm, I pulled up the graphic for a different paper.

Altmetric_worldwidedeclineofentomofauna

Here, each petal has a different color. Eight petals, eight colors. The count of tweets is still much larger than the frequencies of the other sources. So, the rule of construction appears to be one petal for each relevant data source, and if the total number of data sources falls below eight, then let Twitter claim all the unclaimed petals.

A third sample paper confirms this rule:

Altmetric_dnananodevices

None of the places we were hoping to find data – size of petals, color of petals, number of petals – actually contain any data. Anything the reader wants to learn has to be read directly from the printed numbers. The “score” that reflects the aggregate “importance” of the corresponding paper is found at the center of the circle. The legend provides the raw data.

***

Some years ago, one of my NYU students worked on a project relating to paper citations. He eventually presented the work at a conference. I featured it previously.

Michaelbales_citationimpact

Notice how the visual design provides context for interpretation – by placing each paper/researcher among its peers, and by using a relative scale (percentiles).

***

I’m ignoring the D corner of the Trifecta Checkup in this post. For any visualization to be meaningful, the data must be meaningful. The type of counting used by Altmetric treats every tweet, every mention, etc. as a tally, making everything worth the same. A mention on CNN counts as much as a mention by a pseudonymous redditor. A pan is the same as a rave. Let’s not forget the fake data menace (link), which affects all performance metrics.


How to read this cost-benefit chart, and why it is so confusing

Long-time reader Antonio R. found today's chart hard to follow, and he isn't alone. It took two of us multiple emails and some Web searching before we thought we "got it".

Ar_submit_Fig-3-2-The-policy-cost-curve-525

Antonio first encountered the chart in a book review (link) of Hal Harvey et al., Designing Climate Solutions. It addresses the general topic of costs and benefits of various programs to abate CO2 emissions. The reviewer praised the "wealth of graphics [in the book] which present complex information in visually effective formats." He presented the above chart as evidence, and described its function as:

policy-makers can focus on the areas which make the most difference in emissions, while also being mindful of the cost issues that can be so important in getting political buy-in.

(This description is much more informative than the original chart title, which states "The policy cost curve shows the cost-effectiveness and emission reduction potential of different policies.")

Spend a little time with the chart now before you read the discussion below.

Warning: this is a long read but well worth it.

***

If your experience is anything like ours, scraps of information flew at you from different parts of the chart, and you had a hard time piecing together a story.

What are the reasons why this data graphic is so confusing?

Everyone recognizes that this is a column chart. For a column chart, we interpret the heights of the columns so we look first at the vertical axis. The axis title informs us that the height represents "cost effectiveness" measured in dollars per million metric tons of CO2. In a cost-benefit sense, that appears to mean the cost to society of obtaining the benefit of reducing CO2 by a given amount.

That's how far I went before hitting the first roadblock.

For environmental policies, opponents frequently object to the high price of implementation. For example, we can't have higher fuel efficiency in cars because it would raise the price of gasoline too much. Asking about cost-effectiveness makes sense: a cost-benefit trade-off analysis encapsulates the something-for-something principle. What doesn't follow is that the vertical scale sinks far into the negative. The chart depicts the majority of the emissions abatement programs as having negative cost effectiveness.

What does it mean to be negatively cost-effective? Does it mean society saves money (makes a profit) while also reducing CO2 emissions? Wouldn't those policies - more than half of the programs shown - be slam dunks? Who can object to programs that improve the environment at no cost?

I tabled that thought, and proceeded to the horizontal axis.

I noticed that this isn't a standard column chart, in which the width of the columns is fixed and uneventful. Here, the widths of the columns vary.

***

In the meantime, my eyes were distracted by the constellation of text labels. The viewing area of this column chart is occupied - at least 50% of it - by text. These labels tell me that each column represents a program to reduce CO2 emissions.

The dominance of text labels is a feature of this design. For a conventional column chart, the labels are situated below each column. Since the width does not usually carry any data, we tend to keep the columns narrow - Tufte, ever the minimalist, has even advocated reducing columns to vertical lines. That leaves insufficient room for long labels. Have you noticed that government programs carry long titles? It's tough to capture even the outline of a program in fewer than three big words, e.g. "Renewable Portfolio Standard" (what?).

The design solution here is to let the column labels run horizontally. So the graphical element for each program is a vertical column coupled with a horizontal label that invades the territories of the next few programs. Like this:

Redo_fueleconomystandardscars

The horror of this design constraint is fully realized in the following chart, a similar design produced for the state of Oregon (lifted from the Plan Washington webpage listed as a resource below):

Figure 2 oregon greenhouse

In a re-design, horizontal labeling should be a priority.

***

Realizing that I've been distracted by the text labels, back to the horizontal axis I went.

This is where I encountered the next roadblock.

The axis title says "Average Annual Emissions Abatement" measured in millions metric tons. The unit matches the second part of the vertical scale, which is comforting. But how does one reconcile the widths of columns with a continuous scale? I was expecting each program to have a projected annual abatement benefit, and those would fall as dots on a line, like this:

Redo_abatement_benefit_dotplot

Instead, we have line segments sitting on a line, like this:

Redo_abatement_benefit_bars_end2end_annuallabel

Think of these bars as the bottom edges of the columns. These line segments can be better compared to each other if structured as a bar chart:

Redo_abatement_benefit_bars

Instead, the design arranges these lines end-to-end.

To unravel this mystery, we go back to the objective of the chart, as announced by the book reviewer. Here it is again:

policy-makers can focus on the areas which make the most difference in emissions, while also being mindful of the cost issues that can be so important in getting political buy-in.

The primary goal of the chart is a decision-making tool for policy-makers who are evaluating programs. Each program has a cost and also a benefit. The cost is shown on the vertical axis and the benefit is shown on the horizontal. The decision-maker will select some subset of these programs based on the cost-benefit analysis. That subset of programs will have a projected total expected benefit (CO2 abatement) and a projected total cost.

By stacking the line segments end to end on top of the horizontal axis, the chart designer elevates the task of computing the total benefits of a subset of programs, relative to the task of learning the benefits of any individual program. Thus, the horizontal axis is better labeled "Cumulative annual emissions abatement".

Look at that axis again. Imagine you are required to learn the specific benefit of the program titled "Fuel Economy Standards: Cars & SUVs".

Redo_abatement_benefit_bars_end2end_cumlabel

This is impossible to do without pulling out a ruler and a calculator. What the axis labels do tell us is that if all the programs to the left of Fuel Economy Standards: Cars & SUVs were adopted, the cumulative benefits would be 285 million metric tons of CO2 per year. And if Fuel Economy Standards: Cars & SUVs were also implemented, the cumulative benefits would rise to 375 million metric tons.
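
A small numpy sketch makes the construction of that axis explicit. The program list and numbers are hypothetical, chosen so the running total echoes the 285 and 375 quoted above:

    import numpy as np

    # Programs ordered from most to least cost-effective (hypothetical data);
    # benefit = annual abatement in million metric tons of CO2
    benefit = np.array([120, 90, 75, 90])   # 4th = Fuel Economy: Cars & SUVs
    cost_per_unit = np.array([-30, -12, 5, 18])   # column heights

    # The horizontal axis plots this running total, not individual benefits
    cumulative_abatement = np.cumsum(benefit)     # [120, 210, 285, 375]

    # Total cost of adopting all four = sum of column areas (height x width)
    total_cost = np.sum(cost_per_unit * benefit)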

***

At long last, we have arrived at a reasonable interpretation of the cost-benefit chart.

Policy-makers are considering throwing their support behind specific programs aimed at abating CO2 emissions. Different organizations have come up with different ways to achieve this goal. This goal may even have specific benchmarks; the government may have committed to an international agreement, for example, to reduce emissions by some set amount by 2030. Each candidate abatement program is evaluated on both cost and benefit dimensions. Benefit is given by the amount of CO2 abated. Cost is measured as a "marginal cost," the amount of dollars required to achieve each million metric ton of abatement.

This "marginal abatement cost curve" aids the decision-making. It lines up the programs from the most cost-effective to the least cost-effective. The decision-maker is presumed to prefer a more cost-effective program over a less cost-effective one. The chart answers the following question: for any given subset of programs (so long as we select them contiguously from the left), we can read off the cumulative amount of CO2 abated.

***

There are still more limitations of the chart design.

  • We can't directly read off the cumulative cost of the selected subset of programs because the vertical axis is not cumulative. The cumulative cost turns out to be the total area of all the columns that correspond to the selected programs. (Area is height x width, which is cost per benefit multiplied by benefit, which leaves us with the cost.) Unfortunately, it takes rulers and calculators to compute this total area.

  • We have presumed that policy-makers will make the Go-No-go decision based on cost effectiveness alone. This point of view has already been contradicted. Remember the mystery around negatively cost-effective programs - their existence shows that some programs are stalled even when they reduce emissions in addition to making money!

  • Since many, if not most, programs have negative cost-effectiveness (by the way they measured it), I'd flip the metric over and call it profitability (or return on investment). Doing so removes another barrier to our understanding. With the current cost-effectiveness metric, policy-makers are selecting the "negative" programs before the "positive" programs. It makes more sense to select the "positive" programs before the "negative" ones!

***

In a Trifecta Checkup (guide), I rate this chart Type V. The chart has a great purpose, and the design reveals a keen sense of the decision-making process. It's not a data dump for sure. In addition, an impressive amount of data gathering and analysis - and synthesis - went into preparing the two data series required to construct the chart. (Sure, for something so subjective and speculative, the analysis methodology will inevitably be challenged by wonks.) Those two data series are reasonable measures for the stated purpose of the chart.

The chart form, though, has various shortcomings, as shown here.  

***

In our email exchange, Antonio and I found the Plan Washington website useful. This is where we learned that this chart is called the marginal abatement cost curve.

Also, the consulting firm McKinsey is responsible for popularizing this chart form. They have published this long report that explains even more of the analysis behind constructing this chart, for those who want further details.


Choosing between individuals and aggregates

Friend/reader Thomas B. alerted me to this paper that describes some of the key chart forms used by cancer researchers.

It strikes me that many of the "new" charts plot granular data at the individual level. This heatmap showing gene expressions has one column per patient:

Jnci_genemap

This so-called swimmer plot shows one bar per patient:

Jnci_swimlanes

This spider plot shows the progression of individual patients over time. Key events are marked with symbols.

Jnci_spaghetti

These chart forms are distinguished from other ones that plot aggregated statistics: statistical averages, medians, subgroup averages, and so on.

One obvious limitation of such charts is their lack of scalability. The number of patients, the variability of the metric, and the timing of trends all drive up the amount of messiness.

I am left wondering what Question is being addressed by these plots. If we are concerned about treatment of an individual patient, then showing each line by itself would be clearer. If we are interested in the average trends of patients, then a chart that plots the overall average, or subgroup averages would be more accurate. If the interpretation of the individual's trend requires comparing with similar patients, then showing that individual's line against the subgroup average would be preferred.
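
Here is a sketch of that last option, with simulated trajectories (not data from the paper):

    import numpy as np
    import matplotlib.pyplot as plt

    rng = np.random.default_rng(0)
    weeks = np.arange(12)

    # Simulated metric trajectories for 30 patients in a subgroup
    patients = 100 + rng.normal(0, 8, size=(30, 12)).cumsum(axis=1)

    fig, ax = plt.subplots()
    ax.plot(weeks, patients.T, color="lightgray", linewidth=0.8)  # context
    ax.plot(weeks, patients.mean(axis=0), color="black",
            label="subgroup average")
    ax.plot(weeks, patients[0], color="firebrick",
            label="patient of interest")
    ax.set_xlabel("Week")
    ax.set_ylabel("Metric (simulated)")
    ax.legend()
    plt.show()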

When shown these charts of individual lines, readers are tempted to play the statistician - without using appropriate tools! Readers draw aggregate conclusions, performing the aggregation in their heads.

The authors of the paper note: "Spider plots only provide good visual qualitative assessment but do not allow for formal statistical inference." I agree with the second part. The first part is a fallacy - if the visual qualitative assessment is good enough, then no formal inference is necessary! The same argument is often made when people say they don't need advanced analysis because their simple analysis is "directionally accurate". When is something "directionally inaccurate"? How would one know?

Reference: Chia, Gedye, et al., "Current and Evolving Methods to Visualize Biological Data in Cancer Research," JNCI, 2016, 108(8). (link)

***

Meteorologists, whom I featured in the previous post, also have their own spider-like chart for hurricanes. They call it a spaghetti map:

Dorian_spaghetti

Compare this to the "cone of uncertainty" map that was featured in the prior post:

AL052019_5day_cone_with_line_and_wind

These two charts build upon the same dataset. The cone map, as we discussed, shows the range of probable paths of the storm center, based on all simulations of all acceptable models for projection. The spaghetti map shows selected individual simulations. Each line is the most likely trajectory of the storm center as predicted by a single simulation from a single model.

The problem is that each predictive model type has its own historical accuracy (known as "skill"), so the lines carry different levels of importance. Further, it's not immediately clear whether all possible lines are drawn, so any reader drawing conclusions about, say, the envelope containing x percent of these lines is likely to be fooled. Eyeballing the "cone" that contains x percent of the lines is not trivial either. We tend to drift naturally toward aggregate statistical conclusions without the benefit of appropriate tools.
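
Such aggregation calls for the underlying tracks and an explicit calculation, not eyeballing. A minimal numpy sketch, with random walks standing in for the simulated storm tracks:

    import numpy as np

    rng = np.random.default_rng(1)

    # 50 simulated storm-center positions over 5 days (made-up random walks)
    tracks = rng.normal(0, 0.5, size=(50, 5)).cumsum(axis=1)

    # A band containing roughly two-thirds of the tracks at each time step
    lower = np.percentile(tracks, 17, axis=0)
    upper = np.percentile(tracks, 83, axis=0)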

Plots of individuals should be used to address the specific problem of assessing individuals.