Marketers want millennials to know they're millennials

When I posted about the lack of a standard definition of "millennials", Dean Eckles tweeted about the arbitrary division of age into generational categories. His view is further reinforced by the following chart, courtesy of Pew Research.


Pew asked people what generation they belong to. The number of people who fail to place themselves in the correct category is remarkable. One way to interpret this finding is that these are marketing categories created by the marketing profession. We learned in my other post that even people who use the term "millennial" do not share a consensus definition of it. Perhaps the 8 percent of "millennials" who identify as "boomers" are handing in a protest vote!

The chart is best read row by row - the use of stacked bar charts provides a clue. Forty percent of millennials identified as millennials, which leaves sixty percent identifying as some other generation (with about 5 percent indicating "other" responses). 

While this chart is not pretty, and may confuse some readers, it actually shows a healthy degree of analytical thinking. Arranging for the row-first interpretation is a good start. The designer also realizes the importance of the diagonal entries - what proportion of each generation self-identify as a member of that generation. Dotted borders are deployed to draw eyes to the diagonal.


The design doesn't do full justice to the analytical intelligence. Despite the bar-chart form, readers may be tempted to read column by column because of the color scheme. Yet the chart doesn't have an easy column-by-column interpretation.

It's not obvious which axis holds the true category and which the self-identified category. The designer adds a hint in the subtitle to counteract this problem.

Finally, the dotted borders are no match for the differential colors. So a key message of the chart is buried.

Here is a revised chart, using a grouped bar chart format:



In a Trifecta checkup (link), the original chart is a Type V chart. It addresses a popular, pertinent question, and it shows mature analytical thinking but the visual design does not do full justice to the data story.



Who is a millennial? An example of handling uncertainty

I found this fascinating chart from CNBC, which attempts to nail down the definition of a millennial.


It turns out everyone defines "millennials" differently. CNBC found 23 different definitions. Some media outlets even apply different definitions in different articles.

I appreciate this effort a lot. The design is thoughtful. In making this chart, the designer added the following guides:

  • The text draws attention to the definition with the shortest range of birth years, and the one with the largest range.
  • The dashed gray gridlines help with reading the endpoints of each bar.
  • The yellow band illustrates the so-called average range. It appears that this average range is formed by taking the average of the beginning years and the average of the ending years. This indicates a desire to allow comparisons between each definition and the average range.
  • The bars are ordered by the ending birth year (right edge).
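The averaging step the designer appears to have used can be sketched as follows. The birth-year pairs below are hypothetical stand-ins (except Pew's 1981-1996), since the post doesn't list the 23 actual definitions:

```python
# Sketch of the "average range" construction: average the starting years
# and average the ending years across all definitions.
# Hypothetical definitions, except Pew's well-known 1981-1996.
definitions = [
    (1981, 1996),  # Pew Research's definition
    (1980, 2000),  # hypothetical
    (1982, 1994),  # hypothetical
    (1977, 1995),  # hypothetical
]

avg_start = sum(start for start, _ in definitions) / len(definitions)
avg_end = sum(end for _, end in definitions) / len(definitions)

print(f"average range: {avg_start:.0f} to {avg_end:.0f}")
```

With these stand-in numbers, the yellow band would run from about 1980 to 1996.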

The underlying issue is how to display uncertainty. The interest here is not just to feature the "average" definition of a millennial but to show the range of definitions.


In making my chart, I apply a different way to find the "average" range. Given any year, say 1990, what is the chance that it is included in any of the definitions? In other words, what proportion of the definitions include that year? In the following chart, the darker the color, the more likely that year is included by the "average" opinion.
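This year-by-year coverage calculation can be sketched as follows, again with hypothetical definitions standing in for the 23 actual ones:

```python
# For each birth year, compute the share of definitions that include it.
# These definitions are hypothetical stand-ins for the 23 in the CNBC chart.
definitions = [(1981, 1996), (1980, 2000), (1982, 1994), (1977, 1995)]

years = range(min(s for s, _ in definitions),
              max(e for _, e in definitions) + 1)
coverage = {
    y: sum(s <= y <= e for s, e in definitions) / len(definitions)
    for y in years
}

print(coverage[1990])  # share of definitions that include 1990
```

The coverage values map directly to the color scale: darker shades for years included by more definitions.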


I ordered the bars from shortest to longest so there is no need to annotate them. Based on this analysis, 90 percent (or more) of the sources include 1985 to 1993 in the range, while 70 percent (or more) include 1981 to 1996.



Does this chart tell the sordid tale of TI's decline?

The Hustle has an interesting article on the demise of the TI calculator, which is popular in business circles. The article uses this bar chart:


From a Trifecta Checkup perspective, this is a Type DV chart. (See this guide to the Trifecta Checkup.)

The chart addresses a nice question: is the TI graphing calculator a victim of new technologies?

The visual design is marred by the use of the calculator images. The images add nothing to our understanding and create potential for confusion. Here is a version without the images for comparison.


The gridlines are placed to reveal the steepness of the decline. The sales in 2019 will likely be half those of 2014.

What about the Data? This would have been straightforward if the revenues shown were sales of TI calculators. But according to the subtitle, the data include a whole lot more than calculators - it's the "other revenues" category in the financial reports of Texas Instruments, the company that makes the TI calculator.

It requires a leap of faith to believe this data. It is entirely possible that TI calculator sales increased while total "other revenues" decreased! Equally, the decline of the TI calculator could be more drastic than shown here. We simply don't have enough data to say for sure.


P.S. [10/3/2019] Fixed TI.



Tennis greats at the top of their game

The following chart of world No. 1 tennis players looks pretty, but the payoff from spending time to understand it isn't high enough. The light colors against the tennis-net backdrop don't work as intended. The annotation is well done, though, and it's always neat to tuck a legend inside the text.


The original is found at Tableau Public (link).

The topic of the analysis appears to be the ages at which tennis players attained world #1 ranking. Here are the male players visualized differently:


Some players, like Jimmy Connors and Roger Federer, enjoyed second springs after dominating the game in their late twenties. It's relatively rare for players to get to #1 after 30.

Women workers taken for a loop or four

I was drawn to the following chart in Business Insider because of the calendar metaphor. (The accompanying article is here.)


Sometimes, the calendar helps readers grasp concepts faster but I'm afraid the usage here slows us down.

The underlying data consist of just four numbers: the wage gaps across race-gender groups in the U.S., considered simply from an aggregate median personal income perspective. The analyst adopts the median annual salary of a white male worker as a baseline. Then, s/he imputes the number of extra days that others must work to attain the same level of income. For example, the median Asian female worker must work 64 extra days (at her daily salary level) to match the white guy's annual pay. Meanwhile, Hispanic female workers must work 324 extra days.
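The imputation can be sketched as follows. The only figure from the source is the roughly $46,000 white-male baseline; the 85-percent salary ratio below is an illustrative assumption, not a number from the article:

```python
# Rough reconstruction of the extra-days imputation described above.
DAYS = 365  # calendar days -- the analysis counts weekends and holidays too

def extra_days(own_salary, baseline_salary=46_000):
    """Extra days at one's own daily rate needed to match the baseline pay."""
    daily_rate = own_salary / DAYS
    return (baseline_salary - own_salary) / daily_rate

# A worker earning 85% of the baseline (illustrative ratio):
print(round(extra_days(0.85 * 46_000)))
```

Equivalently, extra days = 365 × (baseline/own − 1). Working backwards, the 64 extra days quoted for Asian women would correspond to a salary ratio of about 85 percent of the baseline.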

There are a host of reasons why the calendar metaphor backfired.

Firstly, it draws attention to an uncomfortable detail of the analysis: weekends and public holidays are counted as workdays. The coloring of the boxes compounds this issue. (The designer also got confused and slipped up when applying the purple color for Hispanic women.)

Secondly, the calendar focuses on Year 2 while Year 1 lurks in the background - white men have to work to get that income (roughly $46,000 in 2017 according to the Census Bureau).

Thirdly, the calendar view exposes another sore point around the underlying analysis. In reality, the white male workers are continuing to earn wages during Year 2.

The realism of the calendar clashes with the hypothetical nature of the analysis.


One can just use a bar chart, comparing the number of extra days needed. The calendar design can be considered a set of overlapping bars, wrapped around the shape of a calendar.

The staid bars do not bring to life the extra toil - the message is that these women have to work harder to get the same amount of pay. This led me to a different metaphor - the white men got to the destination in a straight line but the women must go around loops (extra days) before reaching the same endpoint.


While the above is a rough sketch, I made sure that the total length of the lines including the loops roughly matches the total number of days the women needed to work to earn $46,000.


The above discussion focuses solely on the V(isual) corner of the Trifecta Checkup, but this data visualization is also interesting from the D(ata) perspective. Statisticians won't like such a simple analysis that ignores, among other things, the different mix of jobs and industries underlying these aggregate pay figures.

Now go to my other post on the sister (book) blog for a discussion of the underlying analysis.



It's hot even in Alaska

A Twitter user pointed to the following chart, which shows that Alaska has experienced extreme heat this summer, with the July statewide average temperature shattering the previous record.


This column chart is clear in its primary message: the red column shows that the average temperature this year is quite a bit higher than the next highest temperature, recorded in July 2004. The error bar is useful for statistically-literate people - the uncertainty is (presumably) due to measurement errors. (If a similar error bar is drawn for the July 2004 column, these bars probably overlap a bit.)

The chart violates one of the rules of making column charts - the vertical axis is truncated at 53F, so the heights and areas of the columns shouldn't be compared. This violation was recently nominated by two dataviz bloggers when asked about "bad charts" (see here).
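To see why truncation matters, compare the ratio of column heights with the ratio of the underlying temperatures. The temperatures below are hypothetical, chosen only to illustrate the distortion against the 53F baseline:

```python
# With the axis starting at 53F, the visual height ratio of two columns
# exaggerates the ratio of the temperatures themselves.
baseline = 53.0
temp_a = 58.1  # hypothetical record temperature
temp_b = 56.5  # hypothetical runner-up temperature

visual_ratio = (temp_a - baseline) / (temp_b - baseline)  # what the eye sees
actual_ratio = temp_a / temp_b                            # what the data say

print(round(visual_ratio, 2), round(actual_ratio, 2))
```

Here the taller column appears nearly 50 percent taller, while the temperature itself is only about 3 percent higher.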

Now look at the horizontal axis. These are the years of the top 20 temperature records, ordered from highest to lowest. The months are almost always July except for the year 2004 when all three summer months entered the top 20. I find it hard to make sense of these dates when they are jumping around.

In the following version, I plotted the 20 temperatures on a chronological axis. Color is used to divide the 20 data points into four groups. The chart is meant to be read top to bottom. 



What is a bad chart?

In a recent issue of Madolyn Smith’s Conversations with Data newsletter, she discusses “bad charts,” featuring submissions from several dataviz bloggers, including myself.

What is a “bad chart”? Based on this collection of curated "bad charts", it is not easy to nail down “bad-ness”. The common theme is the mismatch between the message intended by the designer and the message received by the reader, a classic error of communication. How such mismatch arises depends on the specific example. I am able to divide the “bad charts” into two groups: charts that are misinterpreted, and charts that are misleading.


Charts that are misinterpreted

The Causes of Death entry, submitted by Alberto Cairo, is a “well-designed” chart that requires “reading the story where it is inserted and the numerous caveats.” So readers may misinterpret the chart if they do not also read the story at Our World in Data, which runs over 1,500 words, not including the appendix.


The map of Canada, submitted by Highsoft, highlights in green the provinces where the majority of residents are members of the First Nations. The “bad” is that readers may incorrectly “infer that a sizable part of the Canadian population is First Nations.”


In these two examples, the graphic is considered adequate and yet the reader fails to glean the message intended by the designer.


Charts that are misleading

Two fellow bloggers, Cole Knaflic and Jon Schwabish, offer the advice to start bars at zero (here's my take on this rule). The “bad” is the distortion introduced when encoding the data into the visual elements.

The Color-blindness pictogram, submitted by Severino Ribecca, commits a similar faux pas. To compare the rates among men and women, the pictograms should use the same baseline.


In these examples, readers who correctly read the charts nonetheless leave with the wrong message. (We assume the designer does not intend to distort the data.) The readers misinterpret the data without misinterpreting the graphics.


Using the Trifecta Checkup

In the Trifecta Checkup framework, these problems are second-level problems, represented by the green arrows linking up the three corners. (Click here to learn more about using the Trifecta Checkup.)


The visual design of the Causes of Death chart is not under question, and the intended message of the author is clearly articulated in the text. Our concern is that the reader must go outside the graphic to learn the full message. This suggests a problem related to the syncing between the visual design and the message (the QV edge).

By contrast, in the Color Blindness graphic, the data are not under question, nor is the use of pictograms. Our concern is how the data got turned into figurines. This suggests a problem related to the syncing between the data and the visual (the DV edge).


When you complain about a misleading chart, or a chart being misinterpreted, what do you really mean? Is it a visual design problem? a data problem? Or is it a syncing problem between two components?

SCMP's fantastic infographic on Hong Kong protests

In the past month, there have been several large-scale protests in Hong Kong. The largest one featured up to two million residents taking to the streets on June 16 to oppose an extradition act that was working its way through the legislature. If the count was accurate, about 25 percent of the city’s population joined in the protest. Another large demonstration occurred on July 1, the anniversary of Hong Kong’s return to Chinese rule.

South China Morning Post, which can be considered the New York Times of Hong Kong, is well known for its award-winning infographics, and they rose to the occasion with this effort.

This is one of the rare infographics that you’d not regret spending time reading. After reading it, you have learned a few new things about protesting in Hong Kong.

In particular, you’ll learn that the recent demonstrations are part of a larger pattern in which Hong Kong residents express their dissatisfaction with the city’s governing class, frequently accused of acting as puppets of the Chinese state. Under the “one country, two systems” arrangement, the city’s officials occupy an unenviable position of mediating the various contradictions of the two systems.

This bar chart shows the growth in the protest movement. The recent massive protests didn't come out of nowhere. 


This line chart offers a possible explanation for the burgeoning protests: residents perceived their freedoms eroding over the last decade.


If you have seen videos of the protests, you’ll have noticed the peculiar protest costumes. Umbrellas are used to block pepper sprays, for example. The following lovely graphic shows how the costumes have evolved:


The scale of these protests captures the imagination. The last part of the infographic places the number of protestors in context by expressing it in terms of football pitches (as soccer fields are known outside the U.S.). This is a sort of universal measure, given the popularity of football almost everywhere. (Nevertheless, according to Wikipedia, football fields do not have one fixed dimension, even though fields used for international matches are standardized to 105 m by 68 m.)


This chart could be presented as a bar chart. It’s just that the data have been re-scaled – from counting individuals to counting football pitches-ful of individuals. 

Here is the entire infographic.

Three estimates, two differences trip up an otherwise good design

Reader Fernando P. was baffled by this chart from the Perception Gap report by More in Common. (link to report)


Overall, this chart is quite good. Its flaws are subtle. There is so much going on that perhaps even the designer found it hard to keep everything straight.

The title is "Democrat's Perception Gap," which actually means the gap between Democrats' perception of Republican views and Republicans' self-reported views. We are talking about two estimates of Republican views. Conversely, in Figure 2 (not shown), the "Republican's Perception Gap" describes two estimates of Democrat views.

The gap is shown visually as the gray bar between the red dot and the blue dot. It is labeled "perception gap," and its values are printed in the right column, also labeled "perception gap."

Perhaps as an afterthought, the designer added the yellow stripes, which represent a third estimate of Republican views, this time by Independents. This little addition wreaks havoc. There are now three estimates - and two gaps. The new gap lies between Independents' perception of Republican views and Republicans' self-reported views. This I-gap is hidden in plain sight, while the words "perception gap" obstinately stick to the D-gap.
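The three estimates and the two gaps they imply can be laid out explicitly. The numbers here are made up for illustration; the report's actual values are not reproduced:

```python
# Three estimates of Republican views on one issue (made-up numbers),
# and the two gaps they generate relative to the self-reported anchor.
rep_self_report = 0.30  # Republicans' own view (the anchor)
dem_perception = 0.55   # Democrats' estimate of Republican views
ind_perception = 0.45   # Independents' estimate of Republican views

d_gap = dem_perception - rep_self_report  # the labeled "perception gap"
i_gap = ind_perception - rep_self_report  # the hidden Independents' gap

print(d_gap, i_gap)
```

Once the yellow stripes appear, both gaps exist on the chart, but only the D-gap gets a label.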


Here is a slightly modified version of the same chart.



The design focuses attention on the two gaps (bars). It also identifies the Republican self-perception as the anchor point from which the gaps are computed.

I have chosen to describe the Republican dot as "self-perception" rather than "actual view," which connotes a form of "truth." Rather than considering the gap as an error of estimation, I like to think of the gap as the difference between two groups of people asked to estimate a common quantity.

Also, one should note that on the last two issues, there is virtual agreement.


Aside from the visual, I have doubts about the value of such a study. Only the most divisive issues are addressed here. Adding a few bipartisan issues would provide controls that could help tease out the baseline perception gap.

I wonder whether there is a self-selection in survey response, such that people with extreme views (from each party) will be under-represented. Further, do we believe that all survey respondents will provide truthful answers to sensitive questions that deal with racism, sexism, etc.? For example, if I am a moderate holding racist views, would I really admit to racism in a survey?



Putting the house in order, two Brexit polls

Reader Steve M. noticed an oversight in the following bar chart from the Guardian (link):


The reporter was discussing an important story that speaks to the need for careful polling design. He was comparing two polls, one by Ipsos Mori and one by YouGov, that estimate the vote support for each party in a future U.K. general election. The bottom line is that the YouGov poll predicts about double the support for the Brexit Party compared with the Ipsos Mori poll.

The stacked bar chart should only be used for data that can be added up. Here, we should be comparing the numbers side by side:


I've always found this standard display inadequate. The story here is the gap in the two bar lengths for the Brexit Party. A secondary story is that the support for the Brexit Party might come from voters breaking from Labour. In other words, we really want the reader to see:


Switching to a dot plot helps bring attention to the gaps:


Now, putting the house in order:


Why do these two polls show such different results? As the reporter explained, the answer lies in how the question was asked. The Ipsos Mori poll is unprompted, meaning the Brexit Party was not announced to respondents as one of the choices, while the YouGov poll is prompted.

This last version imposes a direction on the gaps to bring out the secondary message - that the support for Brexit might be coming from voters breaking from Labour.