Area chart is not the solution

A reader left a link to a Wiki chart, which is ghastly:

House_Seats_by_State_1789-2020_Census

This chart concerns the trend of relative proportions of House representatives in the U.S. Congress by state, and can be found at this Wikipedia entry. The U.S. House is composed of Representatives, and the number of representatives is roughly proportional to each state's population. This scheme actually gives small states disproportionate representation, since the lowest number of representatives is 1 while the total number of representatives is fixed at 435.

We can do a quick calculation: 1/435 = 0.23% so any state that has less than 0.23% of the population is over-represented in the House. Alaska, Vermont and Wyoming are all close to that level. The primary way in which small states get larger representation is via the Senate, which seats two senators per state no matter the size. (If you've wondered about Nate Silver's website: 435 Representatives + 100 Senators + 3 for DC = 538 electoral votes for U.S. Presidential elections.)
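The arithmetic above is easy to check. Here is a minimal sketch in Python that uses only the figures quoted in the text (435 seats, 100 senators, 3 electoral votes for DC); the variable names are mine.

```python
HOUSE_SEATS = 435
SENATORS = 100
DC_ELECTORS = 3

# Every state gets at least one seat, so one seat's share of the House
# is the population share below which a state is over-represented.
threshold = 1 / HOUSE_SEATS
print(f"One seat = {threshold:.2%} of the House")  # 0.23%

electoral_votes = HOUSE_SEATS + SENATORS + DC_ELECTORS
print(f"Electoral votes: {electoral_votes}")  # 538
```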

***

So many things have gone wrong with this chart. There are 50 colors for 50 states. The legend arranges the states by the appropriate metric (good) but in ascending order (bad). This is a stacked area chart, which makes it very hard to figure out the values other than the few at the bottom of the chart.

A nice way to plot this data is a tile map with line charts. I found a nice example that my friend Xan put together in 2018:

Xang_cdcflu_tilemap_lines

A tile map is a conceptual representation of the U.S. map in which each state is represented by an equal-sized square. The coordinates of the states are distorted in order to line up the tiles. A tile map is a small-multiples setup in which each square contains a chart of the same design to facilitate inter-state comparisons.

In the above map, Xan also takes advantage of the foregrounding concept. Each chart actually contains the lines for all 50 states, shown in gray, while the line for the tile's own state is bolded and shown in red.
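For readers who want to try the foregrounding idea, here is a rough sketch in Python. It is not Xan's code: the function names and style values are my own, the panels are laid out in a plain grid rather than at tile-map positions, and the plotting function assumes matplotlib is installed.

```python
def panel_styles(states, focus):
    """Return a drawing style for every state's line in the panel for `focus`.

    The focus state goes last so it is drawn on top of the gray lines.
    """
    background = [
        {"state": s, "color": "lightgray", "linewidth": 0.5}
        for s in states if s != focus
    ]
    foreground = [{"state": focus, "color": "red", "linewidth": 2.0}]
    return background + foreground


def draw_small_multiples(series, ncols=3):
    """Draw one panel per state; `series` maps state -> list of y values."""
    import matplotlib.pyplot as plt  # imported here so panel_styles works without it

    states = list(series)
    nrows = -(-len(states) // ncols)  # ceiling division
    fig, axes = plt.subplots(nrows, ncols, sharex=True, sharey=True, squeeze=False)
    flat_axes = axes.ravel()
    for ax, focus in zip(flat_axes, states):
        for style in panel_styles(states, focus):
            ax.plot(series[style["state"]], color=style["color"],
                    linewidth=style["linewidth"])
        ax.set_title(focus, fontsize=8)
    for ax in flat_axes[len(states):]:
        ax.axis("off")  # hide unused panels
    return fig
```

Every panel shares the same axes, so the gray lines give the same context everywhere and only the red line changes from tile to tile.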

***

A chart with 50 lines looks very different from one with 50 areas stacked on each other. Even California, the most populous state, has only about 12% of the total population, so the 50 lines crowd together at the low end of the scale and look like spaghetti. Thus, the fore/backgrounding is important to make sure the chart is readable.

I suspect that the designer chose a stacked area chart because the line chart looked like spaghetti. But that's the wrong solution. While the lines no longer overlap each other, it is a real challenge to figure out the state-level trends - one has to focus on the heights of the areas, rather than the boundary lines.

[P.S. 2/27/2023] As we like to say, a picture is worth a thousand words. A Twitter reader with the handle LHZGJG made the tile map I described above. It looks like this:

Lhzgjg_redo_houseapportionment

You can pick out the states with the key changes really fast. California, Texas, Florida on the upswing, and New York, Pennsylvania going down. I like the fact that the state names are spelled out. Little tweaks are possible but this is a great starting point. Thanks LHZGJG!]

 


The windy path to the Rugby World Cup

When I first saw the following chart, I wondered whether it is really that challenging for these eight teams to get into the Rugby World Cup, currently being played in Japan:

1920px-2019_Rugby_World_Cup_Qualifying_Process_Diagram.svg

Another visualization of the process conveys a similar message. Both of these are uploaded to Wikipedia.

Rugby_World_Cup_2019_Qualification_illustrated_v2

(This one hasn't been updated and still contains blank entries.)

***

What are some of the key messages one would want the dataviz to deliver?

  • For the eight countries that got in (not automatically), track their paths to the World Cup. How many competitions did they have to play?
  • For those countries that failed to qualify, track their paths to the point that they were stopped. How many competitions did they play?
  • What is the structure of the qualification rounds? (These are organized regionally, in addition to certain playoffs across regions.)
  • How many countries had a chance to win one of the eight spots?
  • Within each competition, how many teams participated? Did the winner immediately qualify, or face yet another hurdle? Were the losers immediately eliminated, or were they offered another chance?

Here's my take on this chart:

Rugby_path_to_world_cup_sm

 


Doing my duty on Pi Day #onelesspie

Xan Gregg and I started a #onelesspie campaign a few years ago. On Pi Day each year, we find a pie chart, and remake it. On Wikipedia, you can find all manner of pie charts. Try this search, and see for yourself.

Here's one found on the Wiki page about the city of Ogema, in Canada:

Ogema_Stats_canada_pie_chart

This chart has 20 age groups, each given a different color. That's way too much!

I was able to find data on 10-year age groups, not five. But the "shape" of the distribution is much more easily seen on a column chart (a histogram).

Redo_ogema_age_distribution

Only a single color is needed.

The reason why I gravitated to this chart was the highly unusual age distribution... this town has almost uniform distribution of age groups, with each of the 10-year ranges accounting for about 11% of the population. Given that there are 9 groups, a perfectly even distribution would be 11% for each column. (Well, the last group of 80+ is cheating a bit as it has more than 10 years.)

I don't know about Ogema. Maybe a reader can explain this unusual age distribution!

 

 

 


Making the world a richer place #onelesspie #PiDay

Xan Gregg and I have been at it for a number of years. To celebrate Pi Day today, I am ridding the world of one pie chart.

Here is a pie chart that is found on Wikipedia:

Wiki_20_Largest_economies_pie_chart.pdf

Here is the revised chart:

Redo_worldeconomypie

It's been designed to highlight certain points of interest.

I find the data quite educational. These are some other insights that are not clear from the revised chart:

  • Japan's economy is larger than Germany's
  • Russia's economy is smaller than that of Germany, Italy, India, Brazil, or South Korea
  • China and Japan combined have GDP (probably) larger than Western Europe
  • Turkey, Netherlands, Switzerland, South Africa are in the Top 20

PS. Xan re-worked a radar chart this year. (link)

 

 


Spacing out on the space race

Jordan G. sent us to this Wikipedia image. (link)

Wiki_Space_Race_1957-1975_

Intriguing concept to try to show the tit-for-tat in the space race between US and USSR. But it's almost impossible to fish any information out of it. While the voluminous text turned sideways is annoying enough, I find the color scheme to be the most offensive. I would like to know who the intended audience is.


Ten parts don't make a whole

Reader Sigve I. thinks we should clean up Wikipedia. This is a good idea but would take up a lot of time. Some of our previous contributions include these entries.

In making this suggestion, Sigve sends us to the following chart about population growth (related to this entry):

World_Population_by_Continent_and_10_Most_Populated_Countries

 

The problems here are many. Start with the detached chart titles: it takes a little while to realize that the graphical elements depict the share of population from 1950 to 2010, that the population growth is written in parentheses next to the legend, and that the third series of numbers displays the ranking -- not of growth, but of share of population -- among the continents or countries depicted.

That's quite a mouthful.

A forensic scientist is on call to tell us which software might have generated these charts. The telltale clue would be the padded "00.8%". This one can't be blamed on Excel since Excel always banishes the padding (even if you deliberately put it there).

I won't mention the variety of chartjunk that serves no purpose. But I do want to point out that setting the year labels 15 years apart is wacky.

***
Now, let's zoom in on the bottom chart. "10 most populated countries" is the title. Why does the vertical axis display proportions that add up to 100%? Surely, these 10 parts don't add up to a whole!

This state of confusion is, unfortunately, fairly routine in pie charts; we've even stumbled on examples in teaching materials for (gasp) numeracy. This is not a pie chart, but the same error can show up in stacked column or area charts.
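To see why the choice of denominator matters, here is a small sketch with made-up numbers (not the UN data behind the chart). It contrasts a share computed within the plotted subset, which always stacks to 100%, with a share of the world total, which does not.

```python
# Hypothetical populations in millions, for illustration only.
world_total = 7000
top_countries = {"A": 1400, "B": 1200, "C": 300}

subset_total = sum(top_countries.values())

# What a 100% stacked chart plots: each country's share of the subset.
share_of_subset = {k: v / subset_total for k, v in top_countries.items()}

# What a reader is likely to assume: each country's share of the world.
share_of_world = {k: v / world_total for k, v in top_countries.items()}

for name in top_countries:
    print(f"{name}: {share_of_subset[name]:.1%} of subset, "
          f"{share_of_world[name]:.1%} of world")
```

The two sets of percentages tell different stories, and an axis that runs to 100% does not say which one is on display.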

Take a step back. Apart from the obvious fact that China followed by India are the two most populous countries by far, what insight is being conveyed by this chart?

Next, consider the following version:

Redo_UNPopGrowCountry2

On this one, we notice that the top 10 countries fall into roughly three types in terms of their growth trajectory since 1950. The green group has a parabolic growth pattern, with a growth rate that reaches an apex in the front part of this period; these countries all have slowing growth in the most recent decades.

The black group, which includes biggies like China, Russia, Japan and Brazil, has by and large experienced slowing growth throughout the time window. They are still growing but the growth rate has been declining.

Finally, USA stands alone as a country where the growth rate has been generally stable over much of this period.

The other thing to notice is that while most countries had similar growth rates back in the 50s, by 2010 these countries experience a much wider range of growth.

One of the tricks that help surface these trends is the smoothing applied to the data. The real data, as you may suspect, would not fall neatly into parabolas. Just for comparison, below is the same chart without smoothing. Nothing is lost by smoothing while the result is significantly cleaner.

Redo_UNPopGrowCountry
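I haven't spelled out which smoother was used for the chart, so treat the following as one simple option rather than the method behind it: a centered moving average in Python, applied to hypothetical growth rates.

```python
def moving_average(values, window=3):
    """Centered moving average; the window shrinks near both ends."""
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        lo = max(0, i - half)
        hi = min(len(values), i + half + 1)
        smoothed.append(sum(values[lo:hi]) / (hi - lo))
    return smoothed


# Hypothetical annual growth rates in percent, not real data.
raw = [2.0, 2.6, 2.1, 2.7, 2.2, 1.9, 2.0, 1.5]
print(moving_average(raw, window=3))
```

A wider window gives a smoother line at the cost of flattening short-lived bumps, so it is worth comparing against the raw series as done above.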
***

Growth rate is not the only thing of note. By focusing on growth rates, one loses the important fact that countries with larger populations contribute more to the growth of world population. The following chart displays this trend. Risking the ire of some, I elected to lump almost all the countries into one group -- there are indeed differences among these countries in terms of their growth trajectories but one cannot escape the conclusion that these differences are only drops in a large bucket.

Redo_UNPopByCountry
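The point about growth rates versus contribution to growth comes down to one multiplication: people added equals population times growth rate. A sketch with invented numbers (the two "countries" are hypothetical):

```python
# Hypothetical figures: population in millions, annual growth rate as a fraction.
countries = {
    "Large, slow-growing": {"population": 1300, "rate": 0.006},
    "Small, fast-growing": {"population": 30, "rate": 0.030},
}

# People added per year, in millions.
added = {name: c["population"] * c["rate"] for name, c in countries.items()}

for name, c in countries.items():
    print(f"{name}: {c['rate']:.1%} growth adds {added[name]:.1f} million per year")
```

In this made-up case, the country growing five times as fast adds far fewer people, which is why a chart of rates alone can mislead.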

 

Looks like Wikipedia needs some cleaning up. Who's pitching in? 


Organizing the bookshelves

When you go to the library, you expect to find the books in an organized fashion, typically sorted first by subject matter, then by author, then by title, and so on. Imagine the frustration when you walk in and discover that books are spread out everywhere with no discernible order. We are very particular about tidiness: it would still be terrible if the books were arranged by author and title without first splitting by subject matter. We are annoyed because it would take too long to find a book.

I did run into such an exasperating bookstore -- I believe it is in Brooklyn. The (used) books in this store are arranged by the date on which the owner acquired them. Fiction, I recall, is ordered alphabetically by the author's last name, and then, say within the 'A' authors, the books are sorted by date of acquisition. What a headache!

Reader Pat L. had a big headache trying to figure out this chart, found on Wikipedia: (I'm just excerpting a small part of it; the full chart is here).

Bloodvalues

To quote Pat:

I was overwhelmed by the information -- so many chemicals and so many units of measure.  I quickly gave up and opened up the image in an picture editor.  One-by-one, I erased the blood chemicals I wasn't interested in.  Maybe if I was a doctor, the chart might have been useful.

 ***

One way to simplify this is using small multiples. Recognize that few if any users would need to directly compare every one of these chemicals. I'm guessing that groups of chemicals can go on separate charts. This is no different from a bookseller organizing shelves to help readers find books.

Also get rid of the minor gridlines.

For a summary chart of this kind, I doubt that it adds anything to include the information on whether the end of a range is definite and consistent, definite but inconsistent, or unknown.


The Tufte count

One of the things I picked up from Tufte is the horrible habit of counting the amount of data on a chart.  This is part of the info gathering to estimate the data-ink ratio (amount of data divided by the amount of ink used to depict them).

Leon B, a reader, left this in my inbox, months ago it turned out.  This is the British government's way of informing people how energy-efficient their homes are.  As Leon said:

these charts might be a great example of governments going overboard with colours, bars, letters and numbers and lines for something that really only has four data points.



Ukhomeenergy

In addition, I find the use of two different scales to be confusing and unnecessary.  If it is decided that scores in a particular range can be grouped as A, B, ..., G, then the original scale should be discarded.  52 is E and 70 is C.  (This is especially so since the score ranges are not intuitive, like 69-80 = C ?!)
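If the letter is what matters, the mapping from score to band is a simple lookup. In the sketch below, only the C band (69-80) comes from the chart as quoted above; the other cut-offs are my assumption based on the standard UK EPC bands, and the function name is hypothetical.

```python
# Lower bound of each band, highest first. Only C = 69-80 is quoted in the
# post; the other cut-offs are assumed from the standard UK EPC scale.
BANDS = [(92, "A"), (81, "B"), (69, "C"), (55, "D"), (39, "E"), (21, "F"), (1, "G")]


def band_for(score):
    """Map a 1-100 energy-efficiency score to its letter band."""
    if not 1 <= score <= 100:
        raise ValueError("score must be between 1 and 100")
    for lower_bound, letter in BANDS:
        if score >= lower_bound:
            return letter


print(band_for(52), band_for(70))  # E C
```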

Even worse, what's the point of citing the 0-100 scale without explaining what the metric is?

A table presentation does a far better job in a fraction of the space:

Redoukenergy_2

Source: Home Information Pack, UK Government.  Graph from Wikipedia.


 

PS. This post set off a torrent of emotions (see the comments).  Another version that I discarded was the simplest table possible.  In my view, there is still way too much distracting "junk" in the original design.  No one has yet explained why the 0-100 scale should be emphasized, or what it means!

Redo2ukenergy


For love of Color

Derek C. pointed us to this piece of chartjunk on Wikipedia.  This chart compares the mass of solar system objects, relative to the Earth's mass.

Wiki_solar

Derek's comment:

The bars are inappropriate, as their length is proportional to the
logarithm of the ratio of the masses of the object and the Earth. Also
the multiple colours are distracting.

I'm also mystified by the first bar called "Solar System".  It seems to convey the idea that the Solar System is much larger than the Earth;  combined with the second bar ("Sun"), it tells us that every object but the Sun pales into insignificance.  If this is true, then the Solar System needs to be labelled differently as it is not a "solar system object".
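Derek's point about the logarithm can be made concrete. The sketch below uses rounded, approximate mass ratios (Sun about 333,000 Earths, Jupiter about 318) and assumes, for illustration, that each bar starts at Earth = 1, which may not match the original chart's baseline.

```python
import math

# Approximate masses relative to Earth (Earth = 1); rounded values.
relative_mass = {"Sun": 333_000, "Jupiter": 318, "Earth": 1}

for name, mass in relative_mass.items():
    bar_length = math.log10(mass)  # what a log-scaled bar encodes
    print(f"{name}: mass ratio {mass:,}, bar length {bar_length:.2f}")

# The Sun's bar is only about twice as long as Jupiter's,
# although the Sun is roughly a thousand times more massive.
length_ratio = math.log10(relative_mass["Sun"]) / math.log10(relative_mass["Jupiter"])
mass_ratio = relative_mass["Sun"] / relative_mass["Jupiter"]
print(f"Bar length ratio: {length_ratio:.1f}, mass ratio: {mass_ratio:,.0f}")
```

Readers compare bar lengths, not logarithms, so the bars understate the gap by orders of magnitude.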

Derek sent in a much improved chart:

Derekc_solar

His version is much cleaner.  The axis labels, properly oriented, are much easier to read.  The use of color is admirably restrained: I suspect that he is as baffled as I about the asterisks (now blue dots) in the original chart. I'd retain the vertical line through the Earth (relative mass = 1) to help anchor the chart.

But a job well done!  He should send it in to the powers that be at Wikipedia.