Light entertainment: Acid Images

A contact commented on the following chart circulating on LinkedIn to promote Portugal:

Linkedin_portugal_processedfood

His main complaint: the flag of Portugal is wrong!

Imagine.

***

A couple of things to note about this image.

I clicked on the "CR" logo on the top left corner, and learned about something called Content Credentials. It tells me that the image was generated by ChatGPT.

Linkedin_portugal_processedfood_contentcredentials

I applaud this effort. Will it stop fraud? Probably not but at least it gives honest people a way to label the work.

***

The second thing is, there are many errors throughout this map. Let's make a list...

I'll get us started.

There are two French flags: one is linked to the second highest value while the other one is linked to the second lowest value.

 


Students demonstrate how analytics underlie strong dataviz

In today's post, I'm delighted to feature work by several students of Ray Vella's data visualization class at NYU. They have been asked to improve the following Economist chart entitled "The Rich Get Richer".

Economist_richgetricher

In my guest lecture to the class, I emphasized the importance of upfront analytics when constructing data visualizations.

One of the key messages is to pay attention to definitions. How does the Economist define "rich" and "poor"? (It's not what you think.) Instead of using percentiles (e.g. top 1% of the income distribution), they define "rich" as people living in the richest region by average GDP, and "poor" as people living in the poorest region by average GDP. Thus, the "gap" between the rich and the poor is measured by the difference in GDP between the average persons in those two regions.

I don't like this metric at all but we'll just have to accept that that's the data available for the class assignment.

***

Shulin Huang's work is notable in how she clarifies the underlying algebra.

Shulin_rvella_economist_richpoorgap

The middle section classifies the countries into two groups, those with widening vs narrowing gaps. The side panels show the two components of the gap change. The gap change is the change in the richest region minus the change in the poorest region, so a loss in the poorest region adds to the gap.

If we take the U.S. as an example, the gap increased by 1976 units. This is because the richest region gained 1777 while the poor region lost 199. Germany has a very different experience: the richest region regressed by 2215 while the poorest region improved by 424, leading to the gap narrowing by 2638.
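The arithmetic is easy to check. Here is a minimal sketch that uses only the figures quoted above; the function name `gap_change` is mine, not something from Shulin's chart. Since the gap is richest minus poorest, its change is the richest region's change minus the poorest region's change.

```python
def gap_change(rich_change, poor_change):
    """Change in the rich-poor gap; a positive result means the gap widened."""
    return rich_change - poor_change

# U.S.: the richest region gained 1777, the poorest region lost 199
us = gap_change(1777, -199)

# Germany: the richest region regressed by 2215, the poorest improved by 424
germany = gap_change(-2215, 424)

print(us, germany)
```

The U.S. figure comes out to 1976, matching the chart. The German figure comes out to -2639, one unit off the 2638 shown; my guess is that the displayed components were rounded, but that is an assumption.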

Note how important it is to keep the order of the countries fixed across all three panels. I'm not sure how she decided the order of these countries, which is a small oversight in an otherwise excellent effort.

Shulin's text is very thoughtful throughout. The chart title clearly states "rich regions" rather than "the rich". Take a look at the bottom of the side panels. The label "national AVG" shows that the zero level is the national average. Then, the label "regions pulled further ahead" perfectly captures the positive direction.

Compared to the original, this chart is much more easily understood. The secret is the clarity of thought, the deep understanding of the nature of the data.

***

Michael Unger focuses his work on elucidating the indexing strategy employed by the Economist. In the original, each value of regional average GDP is indexed to the national average of the relevant year. A number like 150 means the region has an average GDP for the given year that is 50% higher than the national average. It's tough to explain how such indices work.
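A small sketch may help show why such indices are slippery. The GDP figures below are made up for illustration (they are not from the Economist dataset), and `index_to_national` is a hypothetical helper name.

```python
def index_to_national(regional_gdp, national_gdp):
    """Regional average GDP expressed as an index where national average = 100."""
    return 100 * regional_gdp / national_gdp

# Hypothetical year 1: the region sits 50% above the national average
early = index_to_national(45_000, 30_000)

# Hypothetical year 2: the region's raw GDP rose, but the nation grew faster
late = index_to_national(48_000, 40_000)

print(early, late)
```

In this made-up case, the index drops from 150 to 120 even though the region's own average GDP went up. The index moves with both the region and the national average, which is part of what makes it hard to explain.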

Michael's revision goes back to the raw data. He presents them in two panels. On the left, the absolute change over time in average GDP is presented for each of the richest and poorest regions, while on the right, the relative change is shown.

Mungar_rvella_economist_richpoorgap

(Some of the country labels are incorrect. I'll replace with a corrected version when I receive one.)

Presenting both sides is not redundant. In France, for example, the richest region improved by 17K while the poorest region went up by not quite 6K. But 6K on a much lower base represents a much higher proportional jump as the right side shows.

***

Related to Michael's work, but even simpler, is Debbie Hsieh's effort.

Debbiehsieh_rayvella_economist_richpoorgap

Debbie reduces the entire exercise to one message - the relative change over time in average GDP between the richest and poorest region in each country. In this simplest presentation, if both columns point up, then both the richest and the poorest region increased their average GDP; if both point down, then both regions suffered GDP drops.

If the GDP increased in the richest region while it decreased in the poorest region, then the gap widened by the most. This is represented by the blue column pointing up and the red column pointing down.

In some countries (e.g. Sweden), the poorest region (orange) got worse while the richest region (blue) improved slightly. In Italy and Spain, both the best and worst regions gained in average GDP, although the richest region attained a greater relative gain.

While Debbie's chart is simpler, it hides something that Michael's work shows more clearly. If both the richest and poorest regions increased GDP by the same percentage amount, the average person in the richest region actually experienced a higher absolute increase because the base of the percentage is higher.
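To make the point about bases concrete, here is a minimal sketch. The base values and the 10% growth rate are hypothetical numbers picked for round arithmetic, not values from the dataset.

```python
def absolute_gain(base, pct_gain):
    """Absolute increase implied by a percentage gain on a given base."""
    return base * pct_gain / 100

# Hypothetical bases: richest region at 60,000, poorest at 20,000
rich_gain = absolute_gain(60_000, 10)
poor_gain = absolute_gain(20_000, 10)

print(rich_gain, poor_gain)
```

Both regions grow by the same 10%, yet the average person in the richest region gains three times as much in absolute terms (6,000 vs 2,000 under these assumed bases). A chart of relative changes alone cannot show this.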

***

The numbers across these charts aren't necessarily well aligned. That's actually one of the challenges of this dataset. There are many ways to process the data, and small differences in how each student handles the data lead to differences in the derived values, resulting in differences in the visual effects.


Decluttering charts

Enrico posted about the following chart, addressing the current assault on scientific research funding, and he's worried that poor communication skills are hurting the cause.

Bertini_tiretracks

He's right. You need half an hour to figure out what's going on here.

Let me write down what I have learned so far.

The designer only cares about eight research areas - all within the IT field - listed across the bottom.

Paired with each named research area are those bolded blue labels that run across the top (but not quite). I think they represent the crowning achievement within each field but I'm just guessing here.

It appears that each field experiences a sequence of development stages. Typically, universities get things going, then industry R&D teams enter the game, and eventually, products appear in the market. The orange, blue and black lines show this progression. The black line morphs into green, and may even expand in thickness - indicating progressive market adoption and growth.

For example, the first field from the left, digital communications, is shown to have begun in 1965 at universities. Then in the early 1980s, industry started investing in this area. It was not until the 1990s that products became available, and not until the mid 2000s that the market exceeded $10 billion.

Even now, I haven't resolved all its mysteries. The difference between a solid black line and a dotted black line is not explained. Further, it appears possible to bypass $1 billion and hit $10 billion right away.

***

Next, we must decipher the strange web of gray little arrows.

It appears that the arrows can go from orange to blue, blue to orange, blue to black, and orange to black. Under digital communications, I don't see black or green going back to blue or orange. However, under computer architecture, I see green to orange; under parallel & distributed systems, I see green to blue. I don't see any black to orange or black to blue, so black is a kind of trapping state (things go in but don't come out). Sometimes, it's better to say which direction is not possible: in this case, I think every direction is possible except that nothing comes out of black.
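One way to keep track of these observations is to write the arrows down as (from, to) pairs and look for colors that never appear as a starting point. This is only a sketch of my own reading of the chart: the list below contains the directions I spotted, and `trapping_states` is a hypothetical helper.

```python
# Arrow directions observed in the chart, as (from_color, to_color) pairs
observed_arrows = {
    ("orange", "blue"),
    ("blue", "orange"),
    ("blue", "black"),
    ("orange", "black"),
    ("green", "orange"),  # seen under computer architecture
    ("green", "blue"),    # seen under parallel & distributed systems
}

colors = ["orange", "blue", "black", "green"]

def trapping_states(colors, arrows):
    """Colors that have no outgoing arrow: things go in but don't come out."""
    sources = {src for src, _ in arrows}
    return [c for c in colors if c not in sources]

print(trapping_states(colors, observed_arrows))
```

With the arrows I have listed, only black qualifies. If a reader finds a black-to-orange or black-to-blue arrow that I missed, adding it to the set would change the answer.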

It remains unclear what sort of entity each arrow depicts. Each arrow has a specific start and end time. I'm guessing it has to do with a specific research item. Taking the bottom-most arrow for digital communications, I suppose something began in academia in 1980 and then attracted industry investment around 1982. An arrow that points backwards from industry to academia indicates that universities pick up new research ideas from industry. The arrows under digital communications tend to be short, suggesting that it takes only a few years to bring a product to market.

To add to this mess, some arrows cross research areas. These are shown as curved arrows, rather than straight arrows. For these curved arrows, the "slope" of the arrow no longer holds any meaning.

The set of gray arrows is trying too hard. It is overstuffed with purposes. On the one hand, the web of arrows - and I'm referring to those between research areas - portrays the synergies between different research areas. On the other hand, the arrows within each research area show the development trajectories of anonymized subjects. The arrows going back and forth between the orange and blue bars show the interplay between universities and industry research groups.

***

Lastly, we look at those gray text labels at the very top of the page. That's a grab-bag of corporate names (Motorola, Intel, ...) and product names (iPhone, iRobot, ...). Some companies span several research areas. I'm amused and impressed that apparently a linear sequence can be found for the eight research areas such that every single company has investments in only contiguous areas, precluding the need to "leapfrog" certain research areas!

Actually, no, that's wrong. I do notice Nvidia and HP appearing twice. But why is Google not part of digital communications next to iPhone?

Given that no universities are listed, the company and product labels are related to only the blue, black or green lines below. They might be related only to black and/or green. I'm not sure.

***

So far, I've expended energy only to tease out the structure of the underlying dataset. I haven't actually learned anything about the data!

***

The designer has to make some decisions because the different potential questions that the dataset can address impose conflicting graphical requirements.

If the goal is to surface a general development process that repeats for every research area, then the chart should highlight commonality, rather than difference. By contrast, if one's objective is to illustrate how certain research areas have experiences unique to themselves, one should choose a graphical form that brings out the differences.

If the focus is on larger research areas, then the relevant key dates are really the front ends of each vertical line; nothing else matters. By contrast, if one wants to show individual research items, then many more dates become pertinent.

A linear arrangement of the research areas will not perform if one's goal is to uncover connections between research areas. By contrast, if one attempts to minimize crossovers in a network design, it would be impossible to keep all elements belonging to each research area in close proximity.

A layering approach that involves multiple charts to tell the whole story may be the solution. See for example Gelman's post on the ladder of abstraction.


Out of line

This simple chart showing life expectancies in 10 countries raises one's eyebrows.

Lifeexpectancy_indiatv

The first curiosity is the deliberate placement of Pakistan behind India and China. Every nation is sorted from lowest to highest, except for Pakistan. Is the reason politics? I have no idea. If you have an explanation, please leave a comment.

***

This graphic is an example of data visualization that does not actually show the data.

The positions of the flags do not in fact encode the data! For example, the Indian flag is closer to the Chinese flag than to the Pakistani flag even though the gap between India and China (7) is more than double the gap between India and Pakistan (3).
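For the positions to encode the data, each flag's distance along the axis should be proportional to its value. Below is a minimal sketch using only the two gaps quoted above. I am assuming the ordering Pakistan < India < China and expressing each value relative to Pakistan, since the raw life expectancies are not repeated here; the function name and the axis length of 100 are arbitrary choices of mine.

```python
def positions(values, axis_length=100.0):
    """Map each value to a position on a linear axis from 0 to axis_length."""
    lo, hi = min(values.values()), max(values.values())
    return {name: axis_length * (v - lo) / (hi - lo) for name, v in values.items()}

# Values relative to Pakistan (assumed ordering): India is 3 above Pakistan,
# and China is 7 above India
relative = {"Pakistan": 0, "India": 3, "China": 10}

pos = positions(relative)
print(pos)
```

On such a scale, India lands at 30 and China at 100, so the Indian flag sits much closer to the Pakistani flag than to the Chinese one, the opposite of what the original graphic shows.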

Here is what it looks like if the gaps encode the data. With this selection of countries, Pakistan and India are separated from the rest. 

Junkcharts_redo_indiatvlifeexpectancy

In the original chart, readers must read the data labels to understand it, and resist interpreting the visual elements.

I removed the flag poles because they have the unintended consequence of establishing a zero level (where the cartoon characters stand) but the positions of the flags don't reflect a start-at-zero posture.

***

Returning to our first topic for a second. If the message of the chart is to single out Pakistan, it actually works! If all other countries are sorted by value, with Pakistan inserted out of order, it draws our attention.

In a conventional layout, Pakistan is shoved to the left side in the bottom corner. See below:

Junkcharts_redo_indiatvlifeexpectancy_2