
A counterfeit data graphic

Just as there are counterfeit handbags that look like the real thing, there are fake data graphics that look like the real thing. Reader San C. shows us an example of this (found on the All Things D blog):


At first sight, this appears to be a bubble chart. Further, the legend tells us that the colors are meaningful. So the bubbles correspond to different types of data, grouped by color, and the size of each bubble represents the relative level of concern expressed by respondents.

That would be true if we were looking at the "real thing". But this is a counterfeit. How do we know it's fake?

First, the bubbles are not sized to scale. Just compare the Social Security number bubble with the credit card number bubble (shown on the right). A 1% difference shouldn't be visible on this sort of chart, but the credit card bubble is clearly smaller.
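
To see why a one-point gap should be nearly invisible: on a properly scaled bubble chart the value maps to the bubble's area, so the radius grows only with the square root of the value. Here is a minimal Python sketch, using hypothetical values of 72% and 71% (assumed for illustration, not read off the chart).

```python
import math

def radius_for(value, ref_value, ref_radius=1.0):
    """Radius of a bubble whose AREA is proportional to its value.

    The reference bubble (ref_value) is drawn with ref_radius. Area scales
    linearly with value, so radius scales with the square root of the ratio.
    """
    return ref_radius * math.sqrt(value / ref_value)

# Hypothetical values one percentage point apart (illustrative only).
larger_pct, smaller_pct = 72, 71
r = radius_for(smaller_pct, larger_pct)
print(f"radius ratio: {r:.4f}")  # about 0.993
print(f"radius shrinks by {100 * (1 - r):.2f}%")
```

A radius that is less than 1% smaller is not something the eye can pick up, so a visibly smaller bubble means the sizes were not derived from the data.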

Second, the legend gives the impression that the tint of the color carries information. However, it doesn't. I lined up all of the green bubbles in decreasing order of value and couldn't find any pattern in the tints.
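
The check above (sort the bubbles by value, then look for a pattern in the tints) can be made a little more formal with a rank correlation. This is a minimal sketch, assuming each bubble's tint has already been reduced to a single lightness number, a hypothetical measurement step the post does not describe. If the shading carried information, the Spearman correlation between value and lightness should be close to +1 or -1.

```python
def ranks(xs):
    """Return 1-based ranks of xs; tied values share their average rank."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    r = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of 1-based positions i..j
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r

def spearman(xs, ys):
    """Spearman rank correlation: the Pearson correlation of the ranks.

    Undefined (division by zero) if every value or every tint is identical.
    """
    rx, ry = ranks(xs), ranks(ys)
    n = len(xs)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)
```

A value near 0 for the green bubbles would match the observation that the tints show no pattern.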


There also isn't a clear pattern in the location of specific bubbles. Were they randomly scattered onto the chart?

In summary, this is the ultimate non-self-sufficient chart. If we remove the actual printed data from this chart, we're left with nothing.


This data can be put onto a dot plot.
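
A dot plot places one dot per category on a shared scale, so differences are read by position rather than by bubble size. The sketch below renders one as plain text in Python. It is not the R code behind the redone chart, and the category names and values are hypothetical stand-ins for the survey data. The `lo` and `hi` parameters set the axis range, which controls how much empty space surrounds the dots.

```python
def text_dot_plot(data, lo=0.0, hi=1.0, width=40):
    """Render a one-dot-per-row plot of proportions as text.

    data: dict mapping category label -> proportion within [lo, hi].
    Rows are sorted from largest to smallest so the ranking reads top-down.
    """
    label_w = max(len(k) for k in data)
    lines = []
    for label, p in sorted(data.items(), key=lambda kv: kv[1], reverse=True):
        pos = round((p - lo) / (hi - lo) * (width - 1))  # column of the dot
        row = [" "] * width
        row[pos] = "o"
        lines.append(f"{label:>{label_w}} |{''.join(row)}| {p:.2f}")
    return "\n".join(lines)

# Hypothetical proportions for illustration only (not the survey's figures).
example = {"Data type A": 0.72, "Data type B": 0.71, "Data type C": 0.55}
print(text_dot_plot(example))
```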



Aside from the graphical aspects, we should pay attention to some statistical issues.

The article does not sufficiently stress the potential bias of this survey. It is an online survey of Internet users. Their average opinion about Internet-related issues should not be used to represent the opinion of the average American without careful consideration. There is a good possibility that people who have concerns about Internet privacy are less likely to be found on the Internet.
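
The direction of this bias can be made concrete with Bayes' rule. This is a minimal sketch under purely hypothetical rates (the post gives none). If concerned people are less likely to be online, the share of concerned respondents in an online sample understates the share in the population.

```python
def online_share_concerned(p, q_concerned, q_unconcerned):
    """Share of concerned people among those reachable online (Bayes' rule).

    p: fraction of the whole population that is concerned.
    q_concerned / q_unconcerned: probability that a concerned / unconcerned
    person is online and can be surveyed.
    """
    online_concerned = p * q_concerned
    online_unconcerned = (1 - p) * q_unconcerned
    return online_concerned / (online_concerned + online_unconcerned)

# Hypothetical numbers: 80% of everyone is concerned, but concerned people
# are online less often (60%) than unconcerned people (90%).
print(online_share_concerned(0.80, 0.60, 0.90))  # about 0.727
```

Under these made-up rates, an online survey would report roughly 73% concerned when the true figure is 80%. With equal online rates, the two figures coincide.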

I also wonder whether survey takers clearly understand the poll question. What does it mean for a company to be "accessing my personal information"? Does it mean I give the company the information (such as my credit card number) because I need to complete a transaction? Does it mean the company purchases such information from an information exchange? And if so, with or without informing me?

In particular, I don't understand the 28% who say they are not concerned about companies accessing their Social Security number.




Naomi B. Robbins

Nice post. I'd call the redone plot a dot plot rather than a bar chart.


I always look at your redone chart first, but to be honest, I don't get your chart this time.
I don't get the x-axis. What does it show? How many people who are highly concerned are also concerned?
Furthermore, I can't see how this chart is better than a table. Because of the large amount of white space on the x-axis (roughly a 45% range is unused, not counting the fact that 0 is not at the far left of the axis), the differences between the categories are hard to make out. Why bother at all, then?

After reading the post, I understand what the x-axis shows. It's still hard to make out all the differences.


It's not counterfeit, because whoever made the chart probably doesn't realize what they did wrong. It's 'cargo cult' graphics; they don't really know how to deal with numbers, they just have a vague idea of what the end result kind of looks like.


Naomi: Yes, I switched to a dot plot and forgot to change the text. Thanks for pointing it out.

Thomas: There are a few details on my chart that may leave readers a little perplexed if they haven't read the original first. I didn't use the survey question as the title. Also, I made this plot in R, which likes to express percentages as decimals between 0 and 1.
As for "proportion concerned/highly concerned", the original designer is following a well-established tradition in survey research analysis, where we focus on the "top two boxes" of a five-point scale. One can certainly take issue with this, but it's an industry standard.
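
A top-two-box score is simply the share of respondents who pick either of the two highest points on the scale, here "concerned" and "highly concerned". This is a minimal sketch with hypothetical counts. Like the R chart, it returns a decimal between 0 and 1.

```python
def top_two_box(counts):
    """Top-two-box score for a five-point scale.

    counts: five response counts ordered from the lowest scale point (1)
    to the highest (5), e.g. "not at all concerned" ... "highly concerned".
    Returns the proportion of respondents in the top two points (4 and 5).
    """
    if len(counts) != 5:
        raise ValueError("expected counts for exactly five scale points")
    total = sum(counts)
    return (counts[3] + counts[4]) / total

# Hypothetical counts for 100 respondents (illustrative only).
print(top_two_box([5, 8, 15, 30, 42]))  # 0.72
```

Collapsing the two top boxes answers the question of who is "concerned or highly concerned". It does not show how respondents split between the two.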

Mitch: The source of the chart seems to be Forrester Research, which should have done better. They are certainly no novices.

Beth Renneisen

Thanks for this site, Len! I'll use the bubble chart in my class, and the messed-up bar chart, too. Beth


Ultimate non-self-sufficient chart? It seems that many people compete for the Guinness World Record.

I played with my image analysis tool, keeping only the bubbles (even the head, I know) and calculating the objects' areas (values are scaled based on the 72% bubble). The results are, as you say, very far from the printed values (cf. http://stephane.vellay.free.fr/images/JunkChartBubblesArea.jpg).
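
The calibration in this comment takes only a few lines. This is a minimal sketch, assuming the bubble areas have already been measured in pixels (the measuring tool itself isn't described) and that the 72% bubble is the reference. Each bubble's implied value is 72 times its area divided by the reference area. Comparing these implied values with the printed labels exposes the mismatch.

```python
def implied_values(areas, ref_label, ref_value=72.0):
    """Convert measured bubble areas into the values they actually encode.

    Assumes area is meant to be proportional to value, and calibrates the
    scale with one reference bubble (ref_label) whose printed value is
    ref_value.
    """
    ref_area = areas[ref_label]
    return {label: ref_value * a / ref_area for label, a in areas.items()}

# Hypothetical pixel areas (not the commenter's actual measurements).
areas = {"ref": 5000, "B": 4100, "C": 6200}
for label, v in implied_values(areas, "ref").items():
    print(f"{label}: {v:.1f}%")
```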

I actually have another idea about the value they tried to follow for the bubble size: the number of letters in the bubble :)

PS: To continue with Naomi's comment, the post is tagged not as dot plot but as bar chart.
