Where are the Democratic donors?

I like Alberto's discussion of the attractive maps about donors to Democratic presidential candidates, produced by the New York Times (direct link).

Here is the headline map:

Nyt_demdonormaps

The message is clear: Bernie Sanders is the only candidate with nation-wide appeal. The breadth of his coverage is breath-taking. (I agree with Alberto's critique about the lack of a color scale. It's impossible to know if the counts are trivial or not.)

Bernie's coverage is so broad that his numbers overwhelm those of all other candidates except in their home bases (e.g. O'Rourke in Texas).

A remedy to this is to look at the data after removing Bernie's numbers.

Nyt_demdonormap_2

 

This pair of maps reminds me of the Sri Lanka religions map that I revisualized in this post.

Redo_srilankareligiondistricts_v2

The first two maps divide the districts into those in which one religion dominates and those in which multiple religions share the limelight. The third map then shows the second-rank religion in the mixed-religions districts.

The second map in the NYT's donor map series plots the second-rank candidate in all the precincts that Bernie Sanders lead. It's like the designer pulled off the top layer (blue: Bernie) to reveal what's underneath.

Because all of Bernie's data are removed, O'Rourke is still dominating Texas, Buttigieg in Indiana, etc. An alternative is to pull off the top layer in those pockets as well. Then, it's likely to see Bernie showing up in those areas.

The other startling observation is how small Joe Biden's presence is on these maps. This is likely because Biden relies primarily on big donors.

See here for the entire series of donor maps. See here for past discussion of New York Times's graphics.


Powerful photos visualizing housing conditions in Hong Kong

I was going to react to Alberto's post about the New York Times's article about economic inequality in Hong Kong, which is proposed as one origin to explain the current protest movement. I agree that the best graphic in this set is the "photoviz" showing the "coffins" or "cages" that many residents live in, because of the population density. 

Nyt_hongkong_apartment_photoviz

Then I searched the archives, and found this old post from 2015 which is the perfect response to it. What's even better, that post was also inspired by Alberto.

The older post featured a wonderful campaign by human rights organization Society for Community Organization that uses photoviz to draw attention to the problem of housing conditions in Hong Kong. They organized a photography exhibit on this theme in 2014. They then updated the exhibit in 2016.

Here is one of the iconic photos by Benny Lam:

Soco_trapped_B1

I found more coverage of Benny's work here. There is also a book that we can flip on Vimeo.

In 2017, the South China Morning Post (SCMP) published drone footage showing the outside view of the apartment buildings.

***

What's missing is the visual comparison to the luxury condos where the top 1 percent live. For these, one can  visit the real estate sites, such as Sotheby's. Here is their "12 luxury homes for sales" page.

Another comparison: a 1000 sq feet apartment that sits between those extremes. The photo by John Butlin comes from SCMP's Post Magazine's feature on the apartment:

Butlin_scmp_home

***

Also check out my review of Alberto's fantastic, recent book, How Charts Lie.

Cairo_howchartslie_cover

 

 


Say it thrice: a nice example of layering and story-telling

I enjoyed the New York Times's data viz showing how actively the Democratic candidates were criss-crossing the nation in the month of March (link).

It is a great example of layering the presentation, starting with an eye-catching map at the most aggregate level. The designers looped through the same dataset three times.

Nyt_candidatemap_1

This compact display packs quite a lot. We can easily identify which were the most popular states; and which candidate visited which states the most.

I noticed how they handled the legend. There is no explicit legend. The candidate names are spread around the map. The size legend is also missing, replaced by a short sentence explaining that size encodes the number of cities visited within the state. For a chart like this, having a precise size legend isn't that useful.

The next section presents the same data in a small-multiples layout. The heads are replaced by dots.

Nyt_candidatemap_2

This allows more precise comparison of one candidate to another, and one location to another.

This display has one shortcoming. If you compare the left two maps above, those for Amy Klobuchar and Beto O'Rourke, it looks like they have visited roughly similar number of cities when in fact Beto went to 42 compared to 25. Reducing the size of the dots might work.

Then, in the third visualization of the same data, the time dimension is emphasized. Lines are used to animate the daily movements of the candidates, one by one.

Nyt_candidatemap_3

Click here to see the animation.

When repetition is done right, it doesn't feel like repetition.

 


Quick example of layering

The New York Times uses layering to place the Alabama tornadoes in context. (link)

Today's wide availability of detailed data allows designers to create dense data graphics like this:

Nyt_alabamatornado_3

The graphic shows the starting and ending locations and trajectory of each tornado, as well as the wind speeds (shown in color).

Too much data slows down our understanding of the visual message. The remedy is to subtract. Here is a second graphic that focuses only on the strongest tornadoes (graded 4 or 5 on a 5-point scale):

Nyt_alabamatornado_2

Another goal of the data visualization is to place in context the tornado that hit Beauregard:

Nyt_alabamatornado_1

The area around Beauregard is not typically visited by strong tornadoes. Also, the tornadoes were strong but there have been stronger ones.

***

The designer unfolds the story in three stages. There are no knobs and sliders and arrows, and that's a beauty. It's usually not a good idea to make readers find the story themselves.


NYT hits the trifecta with this market correction chart

Yesterday, in the front page of the Business section, the New York Times published a pair of charts that perfectly captures the story of the ongoing turbulence in the stock market.

Here is the first chart:

Nyt_marketcorrection_1

Most market observers are very concerned about the S&P entering "correction" territory, which the industry arbitrarily defines as a drop of 10% or more from a peak. This corresponds to the shortest line on the above chart.

The chart promotes a longer-term reflection on the recent turbulence, using two reference points: the index has returned to the level even with that at the start of 2018, and about 16 percent higher since the beginning of 2017.

This is all done tastefully in a clear, understandable graphic.

Then, in a bit of a rhetorical flourish, the bottom of the page makes another point:

Myt_marketcorrection2

When viewed back to a 10-year period, this chart shows that the S&P has exploded by 300% since 2009.

A connection is made between the two charts via the color of the lines, plus the simple, effective annotation "Chart above".

The second chart adds even more context, through vertical bands indicating previous corrections (drops of at least 10%). These moments are connected to the first graphic via the beige color. The extra material conveys the message that the market has survived multiple corrections during this long bull period.

Together, the pair of charts addresses a pressing current issue, and presents a direct, insightful answer in a simple, effective visual design, so it hits the Trifecta!

***

There are a couple of interesting challenges related to connecting plots within a multiple-plot framework.

While the beige color connects the concept of "market correction" in the top and bottom charts, it can also be a source of confusion. The orientation and the visual interpretation of those bands differ. The first chart uses one horizontal band while the chart below shows multiple vertical bands. In the first chart, the horizontal band refers to a definition of correction while in the second chart, the vertical bands indicate experienced corrections.

Is there a solution in which the bands have the same orientation and same meaning?

***

These graphs solve a visual problem concerning the visualization of growth over time. Growth rates are anchored to some starting time. A ten-percent reduction means nothing unless you are told ten-percent of what.

Using different starting times as reference points, one gets different values of growth rates. With highly variable series of data like stock prices, picking starting times even a day apart can lead to vastly different growth rates.

The designer here picked several obvious reference times, and superimposes multiple lines on the same plotting canvass. Instead of having four lines on one chart, we have three lines on one, and four lines on the other. This limits the number of messages per chart, which speeds up cognition.

The first chart depicts this visual challenge well. Look at the start of 2018. This second line appears as if you can just reset the start point to 0, and drag the remaining portion of the line down. The part of the top line (to the right of Jan 2018) looks just like the second line that starts at Jan 2018.

Jc_marketcorrection1

However, a closer look reveals that the shape may be the same but the magnitude isn't. There is a subtle re-scaling in addition to the re-set to zero.

The same thing happens at the starting moment of the third line. You can't just drag the portion of the first or second line down - there is also a needed re-scaling.


Crazy rich Asians inspire some rich graphics

On the occasion of the hit movie Crazy Rich Asians, the New York Times did a very nice report on Asian immigration in the U.S.

The first two graphics will be of great interest to those who have attended my free dataviz seminar (coming to Lyon, France in October, by the way. Register here.), as it deals with a related issue.

The first chart shows an income gap widening between 1970 and 2016.

Nyt_crazyrichasians_incomegap1

This uses a two-lines design in a small-multiples setting. The distance between the two lines is labeled the "income gap". The clear story here is that the income gap is widening over time across the board, but especially rapidly among Asians, and then followed by whites.

The second graphic is a bumps chart (slopegraph) that compares the endpoints of 1970 and 2016, but using an "income ratio" metric, that is to say, the ratio of the 90th-percentile income to the 10th-percentile income.

Nyt_crazyrichasians_incomeratio2

Asians are still a key story on this chart, as income inequality has ballooned from 6.1 to 10.7. That is where the similarity ends.

Notice how whites now appears at the bottom of the list while blacks shows up as the second "worse" in terms of income inequality. Even though the underlying data are the same, what can be seen in the Bumps chart is hidden in the two-lines design!

In short, the reason is that the scale of the two-lines design is such that the small numbers are squashed. The bottom 10 percent did see an increase in income over time but because those increases pale in comparison to the large incomes, they do not show up.

What else do not show up in the two-lines design? Notice that in 1970, the income ratio for blacks was 9.1, way above other racial groups.

Kudos to the NYT team to realize that the two-lines design provides an incomplete, potentially misleading picture.

***

The third chart in the series is a marvellous scatter plot (with one small snafu, which I'd get t0).

Nyt_crazyrichasians_byethnicity

What are all the things one can learn from this chart?

  • There is, as expected, a strong correlation between having college degrees and earning higher salaries.
  • The Asian immigrant population is diverse, from the perspectives of both education attainment and median household income.
  • The largest source countries are China, India and the Philippines, followed by Korea and Vietnam.
  • The Indian immigrants are on average professionals with college degrees and high salaries, and form an outlier group among the subgroups.

Through careful design decisions, those points are clearly conveyed.

Here's the snafu. The designer forgot to say which year is being depicted. I suspect it is 2016.

Dating the data is very important here because of the following excerpt from the article:

Asian immigrants make up a less monolithic group than they once did. In 1970, Asian immigrants came mostly from East Asia, but South Asian immigrants are fueling the growth that makes Asian-Americans the fastest-expanding group in the country.

This means that a key driver of the rapid increase in income inequality among Asian-Americans is the shift in composition of the ethnicities. More and more South Asian (most of whom are Indians) arrivals push up the education attainment and household income of the average Asian-American. Not only are Indians becoming more numerous, but they are also richer.

An alternative design is to show two bubbles per ethnicity (one for 1970, one for 2016). To reduce clutter, the smaller ethnicites can be aggregated into Other or South Asian Other. This chart may help explain the driver behind the jump in income inequality.

 

 

 

 

 


Visualizing the Thai cave rescue operation

The Thai cave rescue was a great story with a happy ending. It's also one that lends itself to visualization. A good visualization can explain the rescue operation more efficiently than mere words.

A good visual should bring out the most salient features of the story, such as:

  • Why the operation was so daunting?
  • What were the tactics used to overcome those challenges?
  • How long did it take?
  • What were the specific local challenges that must be overcome?
  • Were there any surprises?

In terms of what made the rescue challenging, some of the following are pertinent:

  • How far in they were?
  • How deep were they trapped?
  • How much of the caves were flooded? Why couldn't they come out by themselves?
  • How much headroom was there in different sections of the cave "tunnel"?

There were many attempts at visualizing the Thai cave rescue operation. The best ones I saw were: BBC (here, here), The New York Times (here), South China Morning Post (here) and Straits Times (here). It turns out each of these efforts focuses on some of the aspects above, and you have to look at all of them to get the full picture.

***

BBC's coverage began with a top-down view of the route of the rescue, which seems to be the most popular view adopted by news organizations. This is easily understood because of the standard map aesthetic.

Bbc_102494059_caves_976

The BBC map is missing a smaller map of Thailand to place this in a geographical context.

While this map provides basic information, it doesn't address many of the elements that make the Thai cave rescue story compelling. In particular, human beings are missing from this visualization. The focus is on the actions ("diving", "standing"). This perspective also does not address the water level, the key underlying environmental factor.

***

Another popular perspective is the sideway cross-section. The Straits Times has one:

Straittimes_thai rescue_part

The excerpt of the infographic presents a nice collection of data that show the effort of the rescue. The sideway cross-sectional section shows the distance and the up-and-down nature of the journey, the level of flooding along the route, plus a bit about the headroom available at different points. Most of these diagrams bring out the "horizontal" distance but somehow ignore the "vertical" distance. One possibility is that the real trajectory is curvy - but if we can straighten out the horizontal, we should be able to straighten out the vertical too.

The NYT article gives a more detailed view of the same perspective, with annotations that describe key moments along the rescue route.

Nyt_detailed_thairescueroute

If, like me, you like to place humans into this picture, then you have to go back to the Straits Times, where they have an expanded version of the sideway cross-section.

  Straitstimes_riskyroute_thairescue

This is probably my most favorite single visualization of the rescue operation.

There are better cartoons of the specific diving actions, though. For example, the BBC has this visual that shows the particularly narrow part of the route, corresponding to the circular inset in the Straits Times version above.

Bbc_thairescue_tightspace

The drama!

NYT also has a set of cartoons. Here's one:

Nyt_thairescue_divers

***

There is one perspective that curiously has been underserved in all of the visualizations - this is the first-person perspective. Imagine the rescuer (or the kids) navigating the rescue route. It's a cross-section from the front, not from the side.

Various publications try to address this by augmenting the top-down route view with sporadic cross-sectional diagrams. Recall the first map we showed from the BBC. On the right column are little annotations of this type (here):

Bbc_thaicaverescue_crosssection

I picked out this part of the map because it shows that the little human figure serves two potentially conflicting purposes. In the bottom diagram, the figurine shows that there is limited headroom in this part of the cave, plus the actual position of the figurine on the ledge conveys information about where the kids were. However, on the top cross-section, the location of the figure conveys no information; the only purpose of the human figure is to show how tall the cave is at that site.

The South China Morning Post (here - site appears to be down when I wrote this) has this wonderful animation of how the shape of the headroom changed as they navigated the route. Please visit their page to see the full animation. Here are two screenshots:

Scmp_caveshape_1

Scmp_caveshape_2

This little clip adds a lot to the story! It'd be even better if the horizontal timeline at the bottom is replaced by the top-down route map.

Thank you all the various dataviz teams for these great efforts.

 

 

 


Checking the scale on a chart

Dot maps, and by extension, bubble maps are popular options for spatial data; but the scale of these maps can be deceiving. Here is an example of a poorly-scaled dot map:

Farm-Dot Density

The U.S. was primarily an agrarian economy in 1997, if you believe your eyes.

Here is a poorly-scaled bubble map:

image from junkcharts.typepad.com

New Yorkers have all become Citibikers, if you believe what you see.

Last week, I saw a nice dot map embedded inside this New York Times Graphics feature on the destruction of the Syrian city of Raqqa.

Nyt_raqqa_dotmap

Before I conclude that the destruction was broadly felt, I'd like to check the scale on the map to make sure it doesn't have the problem seen above. What is helpful here is the scale provided on the map itself.

Nty_raqqa_scale

That line segment representing a quarter mile fits about 15 dots side by side. Then, I found out that a Manhattan avenue (longer) block is roughly a quarter mile. That means the map places about 15 buildings to an avenue block. In my experience, that sounds about right: I'd imagine 15-20 buildings per block.

So I'm convinced that the designer chose an appropriate scale to display the data. It is actually true that the entire city of Raqqa was pretty much annihilated by U.S. bombs.


Two nice examples of interactivity

Janie on Twitter pointed me to this South China Morning Post graphic showing off the mighty train line just launched between north China and London (!)

Scmp_chinalondonrail

Scrolling down the page simulates the train ride from origin to destination. Pictures of key regions are shown on the left column, as well as some statistics and other related information.

The interactivity has a clear purpose: facilitating cross-reference between two chart forms.

The graphic contains a little oversight ... The label for the key city of Xian, referenced on the map, is missing from the elevation chart on the left here:

Scmp_chinalondonrail_xian

 ***

I also like the way New York Times handled interactivity to this chart showing the rise in global surface temperature since the 1900s. The accompanying article is here.

Nyt_surfacetemp

When the graph is loaded, the dots get printed from left to right. That's an attention grabber.

Further, when the dots settle, some years sink into the background, leaving the orange dots that show the years without the El Nino effect. The reader can use the toggle under the chart title to view all of the years.

This configuration is unusual. It's more common to show all the data, and allow readers to toggle between subsets of the data. By inverting this convention, it's likely few readers need to hit that toggle. The key message of the story concerns the years without El Nino, and that's where the graphic stands.

This is interactivity that succeeds by not getting in the way. 

 

 

 


A look at how the New York Times readers look at the others

Nyt_taxcutmiddleclass

The above chart, when it was unveiled at the end of November last year, got some mileage on my Twitter feed so it got some attention. A reader, Eric N., didn't like it at all, and I think he has a point.

Here are several debatable design decisions.

The chart uses an inverted axis. A tax cut (negative growth) is shown on the right while a tax increase is shown on the left. This type of inversion has gotten others in trouble before, namely, the controversy over the gun deaths chart (link). The green/red color coding is used to signal the polarity although some will argue this is bad for color-blind readers. The annotation below the axis is probably the reason why I wasn't confused in the first place but the other charts further down the page do not repeat the annotation, and that's where the interpretation of -$2,000 as a tax increase is unnatural!

The chart does not aggregate the data. It plots 25,000 households with 25,000 points. Because of the variance of the data, it's hard to judge trends. It's easy enough to see that there are more green dots than red but how many more? 10 percent, 20 percent, 40 percent? It's also hard to answer any specific questions, say, about households with a certain range of incomes. There are various ways to aggregate the data, such as heatmaps, histograms, and so on.

For those used to looking at scientific charts, the x- and y-axes are reversed. By convention, we'd have put the income ranges on the horizontal axis and the tax changes (the "outcome" variable) on the vertical axis.

***

The text labels do not describe the data patterns on the chart so much as they offer additional information. To see this, remove the labels as I have done below. Try adding the labels based on what is shown on the chart.

Nyt_taxcutmiddleclass_2

Perhaps it's possible to illustrate those insights with a set of charts.

***

While reading this chart, I kept wondering how those 25,000 households were chosen. This is a sample of  households. The methodology is explained in a footnote, which describes the definition of "middle class" but unfortunately, they forgot to tell us how the 25,000 households were chosen from all such middle-class households.

Nyt_taxcutmiddleclass_footnote

The decision to omit the households with income below $40,000 needs more explanation as it usurps the household-size adjustment. Also, it's not clear that the impact of the tax bill on the households with incomes between $20-40K can be assumed the same as for those above $40K.

Are the 25,000 households is a simple random sample of all "middle class" households or are they chosen in some ways to represent the relative counts? It's also useful to know if they applied the $40K cutoff before or after selecting the 25,000 households. 

Ironically, the media kit of the Times discloses an affluent readership with median household income of almost $190K so it appears that the majority of readers are not represented in the graphic at all!