Adjust, and adjust some more

This Financial Times report illustrates the reason why we should adjust data.

The story explores the trend in economic statistics during 14 years of governing by conservatives. One of those metrics is so-called council funding (local governments). The graphic is interactive: as the reader scrolls the page, the chart transforms.

The first chart shows the "raw" data.

Ft_councilfunding1

The vertical axis shows year-on-year change in funding. It is an index relative to the level in 2010. From this line chart, one concludes that council funding decreased from 2010 to around 2016, then grew; by 2020, funding has recovered to the level of 2010 and then funding expanded rapidly in recent years.

When the reader scrolls down, this chart is replaced by another one:

Ft_councilfunding2

This chart contains a completely different picture. The line dropped from 2010 to 2016 as before. Then, it went flat, and after 2021, it started raising, even though by 2024, the value is still 10 percent below the level in 2010.

What happened? The data journalist has taken the data from the first chart, and adjusted the values for inflation. Inflation was rampant in recent years, thus, some of the raw growth have been dampened. In economics, adjusting for inflation is also called expressing in "real terms". The adjustment is necessary because the same dollar (hmm, pound) is worth less when there is inflation. Therefore, even though on paper, council funding in 2024 is more than 25 percent higher than in 2010, inflation has gobbled up all of that and more, to the point in which, in real terms, council funding has fallen by 20 percent.

This is one material adjustment!

Wait, they have a third chart:

Ft_councilfunding3

It's unfortunate they didn't stabilize the vertical scale. Relative to the middle chart, the lowest point in this third chart is about 5 percent lower, while the value in 2024 is about 10 percent lower.

This means, they performed a second adjustment - for population change. It is a simple adjustment of dividing by the population. The numbers look worse probably because population has grown during these years. Thus, even if the amount of funding stayed the same, the money would have to be split amongst more people. The per-capita adjustment makes this point clear.

***

The final story is much different from the initial one. Not only was the magnitude of change different but the direction of change reversed.

Whenever it comes to adjustments, remember that all adjustments are subjective. In fact, choosing not to adjust is also subjective. Not adjusting is usually much worse.

 

 

 

 


Prime visual story-telling

A story from the New York Times about New York City neighborhoods has been making the rounds on my Linkedin feed. The Linkedin post sends me to this interactive data visualization page (link).

Here, you will find a multi-colored map.

Nyt_newyorkneighborhoodsmap

The colors show the extant of named neighborhoods in the city. If you look closely, the boundaries between neighborhoods are blurred since it's often not clear where one neighborhood ends and where another one begins. I was expecting this effect when I recognize the names of the authors, who have previously published other maps that obsess over spatial uncertainty.

I clicked on an area for which I know there may be differing opinions:

Nyt_newyorkneighborhoods_example

There was less controversy than I expected.

***

What was the dataset behind this dataviz project? How did they get such detailed data on every block of the city? Wouldn't they have to interview a lot of residents to compile the data?

I'm quite impressed with what they did. They put up a very simple survey (emphasis on: very simple). This survey is only possible with modern browser technology. It asks the respondent to pinpoint the location of where they live, and name their neighborhood. Then it asks the respondent to draw a polygon around their residence to include the extant of the named neighborhood. This consists of a few simple mouse clicks on the map that shows the road network. Finally, the survey collects optional information on alternative names for the neighborhood, etc.

When they process the data, they assign the respondent's neighborhood name to all blocks encircled by the polygon. This creates a lot of data in a few brush strokes, so to speak. This is a small (worthwhile) tradeoff even though the respondent didn't really give an answer for every block.

***

Bear with me, I'm getting to the gist of this blog post. The major achievement isn't the page that was linked to above. The best thing the dataviz team did here is the visual story that walks the reader through insights drawn from the dataviz. You can find the visual story here.

What are the components of a hugely impressive visual story?

  • It combines data visualization with old-fashioned archival research. The historical tidbits add a lot of depth to the story.
  • It combines data visualization with old-fashioned reporting. The quotations add context to how people think about neighborhoods - something that cannot be obtained from the arms-length process of conducting an online survey.
  • It highlights curated insights from the underlying data - even walking the reader step by step through the relevant sections of the dataviz that illustrate these insights.

At the end of this story, some fraction of users may be tempted to go back to the interactive dataviz to search for other insights, or obtain answers to their personalized questions. They are much better prepared to do so, having just seen how to use the interactive tool!

***

The part of the visual story I like best is toward the end. Instead of plotting all the data on the map, they practice some restraint, and filter the data. They show the boundaries that have reached at least a certain level of consensus among the respondents.

The following screenshot shows those areas for which at least 90% agree.

Nyt_newyorkneighborhoods_90pc

Pardon the white text box, I wasn't able to remove it.

***

One last thing...

Every time an analyst touches data, or does something with data, s/he imposes assumptions, and sometimes, these assumptions are so subtle that even the analyst may not have noticed. Frequently, these assumptions are baked into the analytical "models," which is why they may fall through the cracks.

One such assumption in making this map is that every block in the city belongs to at least one named neighborhood. An alternative assumption is that neighborhoods are named only because certain blocks have things in common, and because these naming events occur spontaneously, it's perfectly ok to have blocks that aren't part of any named neighborhood.

 

 


Parsons Student Projects

I had the pleasure of attending the final presentations of this year's graduates from Parsons's MS in Data Visualization program. You can see the projects here.

***

A few of the projects caught my eye.

A project called "Authentic Food in NYC" explores where to find "authentic" cuisine in New York restaurants. The project is notable for plowing through millions of Yelp reviews, and organizing the information within. Reviews mentioning "authentic" or "original" were extracted.

During the live presentation, the student clicked on Authentic Chinese, and the name that popped up was Nom Wah Tea Parlor, which serves dim sum in Chinatown that often has lines out the door.

Shuyaoxiao_authenticfood_parsons

Curiously, the ranking is created from raw counts of authentic reviews, which favors restaurants with more reviews, such as restaurants that have been operating for a longer time. It's unclear what rule is used to transfer authenticity from reviews to restaurants: does a single review mentioning "authentic" qualify a restaurant as "authentic", or some proportion of reviews?

Later, we see a visualization of the key words found inside "authentic" reviews for each cuisine. Below are words for Chinese and Italian cuisines:

Shuyaoxiao_authenticcuisines_parsons_words

These are word clouds with a twist. Instead of encoding the word counts in the font sizes, she places each word inside a bubble, and uses bubble sizes to indicate relative frequency.

Curiously, almost all the words displayed come from menu items. There isn't any subjective words to be found. Algorithms that extract keywords frequently fail in the sense that they surface the most obvious, uninteresting facts. Take the word cloud for Taiwanese restaurants as an example:

Shuyaoxiao_authenticcuisines_parsons_taiwan

The overwhelming keyword found among reviews of Taiwanese restaurants is... "taiwanese". The next most important word is "taiwan". Among the remaining words, "886" is the name of a specific restaurant, "bento" is usually associated with Japanese cuisine, and everything else is a menu item.

Getting this right is time-consuming, and understandably not a requirement for a typical data visualization course.

The most interesting insight is found in this data table.

Shuyaoxiao_authenticcuisines_ratios

It appears that few reviewers care about authenticity when they go to French, Italian, and Japanese restaurants but the people who dine at various Asian restaurants, German restaurants, and Eastern European restaurants want "authentic" food. The student concludes: "since most Yelp reviewers are Americans, their pursuit of authenticity creates its own trap: Food authenticity becomes an americanized view of what non-American food is."

This hits home hard because I know what authentic dim sum is, and Nom Wah Tea Parlor it ain't. Let me check out what Yelpers are saying about Nom Wah:

  1. Everything was so authentic and delicious - and cheap!!!
  2. Your best bet is to go around the corner and find something more authentic.
  3. Their dumplings are amazing everything is very authentic and tasty!
  4. The food was delicious and so authentic, and the staff were helpful and efficient.
  5. Overall, this place has good authentic dim sum but it could be better.
  6. Not an authentic experience at all.
  7. this dim sum establishment is totally authentic
  8. The onions, bean sprouts and scallion did taste very authentic and appreciated that.
  9. I would skip this and try another spot less hyped and more authentic.
  10. I would have to take my parents here the next time I visit NYC because this is authentic dim sum.

These are the most recent ten reviews containing the word "authentic". Seven out of ten really do mean authentic, the other three are false friends. Text mining is tough business! The student removed "not authentic" which helps. As seen from above, "more authentic" may be negative, and there may be words between "not" and "authentic". Also, think "not inauthentic", "people say it's authentic, and it's not", etc.

One thing I learned from this project is that "authentic" may be a synonym for "I like it" when these diners enjoy the food at an ethnic restaurant. I'm most curious about what inauthentic onions, bean sprouts and scallion taste like.

I love the concept and execution of this project. Nice job!

***

Another project I like is about tourism in Venezuela. The back story is significant. Since a dictatorship took over the country, the government stopped reporting tourism statistics. It's known that tourism collapsed, and that it may be gradually coming back in recent years.

This student does not have access to ready-made datasets. But she imaginatively found data to pursue this story. Specifically, she mentioned grabbing flight schedules into the country from the outside.

The flow chart is a great way to explore this data:

Ibonnet_parsons_dataviz_flightcities

A map gives a different perspective:

Ibonnet_parsons_dataviz_flightmap

I'm glad to hear the student recite some of the limitations of the data. It's easy to look at these visuals and assume that the data are entirely reliable. They aren't. We don't know that what proportion of the people traveling on those flights are tourists, how full those planes are, or the nationalities of those on board. The fact that a flight originated from Panama does not mean that everyone on board is Panamanian.

***

The third project is interesting in its uniqueness. This student wants to highlight the effect of lead in paint on children's health. She used the weight of lead marbles to symbolize the impact of lead paint. She made a dress with two big pockets to hold these marbles.

Scherer_parsons_dataviz_leaddress sm

It's not your standard visualization. One can quibble that dividing the marbles into two pockets doesn't serve a visualziation purpose, and so on. But at the end, it's a memorable performance.


All about Connecticut

This dataviz project by CT Mirror is excellent. The project walks through key statistics of the state of Connecticut.

Here are a few charts I enjoyed.

The first one shows the industries employing the most CT residents. The left and right arrows are perfect, much better than the usual dot plots.

Ctmirror_growingindustries

The industries are sorted by decreasing size from top to bottom, based on employment in 2019. The chosen scale is absolute, showing the number of employees. The relative change is shown next to the arrow heads in percentages.

The inclusion of both absolute and relative scales may be a source of confusion as the lengths of the arrows encode the absolute differences, not the relative differences indicated by the data labels. This type of decision is always difficult for the designer. Selecting one of the two scales may improve clarity but induce loss aversion.

***

The next example is a bumps chart showing the growth in residents with at least a bachelor's degree.

Ctmirror_highered

This is more like a slopegraph as it appears to draw straight lines between two time points 9 years apart, omitting the intervening years. Each line represents a state. Connecticut's line is shown in red. The message is clear. Connecticut is among the most highly educated out of the 50 states. It maintained this advantage throughout the period.

I'd prefer to use solid lines for the background states, and the axis labels can be sparser.

It's a little odd that pretty much every line has the same slope. I'm suspecting that the numbers came out of a regression model, with varying slopes by state, but the inter-state variance is low.

In the online presentation, one can click on each line to see the values.

***

The final example is a two-sided bar chart:

Ctmirror_migration

This shows migration in and out of the state. The red bars represent the number of people who moved out, while the green bars represent those who moved into the state. The states are arranged from the most number of in-migrants to the least.

I have clipped the bottom of the chart as it extends to 50 states, and the bottom half is barely visible since the absolute numbers are so small.

I'd suggest showing the top 10 states. Then group the rest of the states by region, and plot them as regions. This change makes the chart more compact, as well as more useful.

***

There are many other charts, and I encourage you to visit and support this data journalism.

 

 

 


Illustrating coronavirus waves with moving images

The New York Times put out a master class in visualizing space and time data recently, in a visualization of five waves of Covid-19 that have torched the U.S. thus far (link).

Nyt_coronawaves_title

The project displays one dataset using three designs, which provides an opportunity to compare and contrast them.

***

The first design - above the headline - is an animated choropleth map. This is a straightforward presentation of space and time data. The level of cases in each county is indicated by color, dividing the country into 12 levels (plus unknown). Time is run forward. The time legend plays double duty as a line chart that shows the change in the weekly rate of reported cases over the course of the pandemic. A small piece of interactivity binds the legend with the map.

Nyt_coronawaves_moviefront

(To see a screen recording of the animation, click on the image above.)

***

The second design comprises six panels, snapshots that capture crucial "turning points" during the Covid-19 pandemic. The color of each county now encodes an average case rate (I hope they didn't just average the daily rates). 

Nyt_coronawaves_panelsix

The line-chart legend is gone -  it's not hard to see Winter > Fall 2020 > Summer/Fall 2021 >... so I don't think it's a big loss.

The small-multiples setup is particularly effective at facilitating comparisons: across time, and across space. It presents a story in pictures.

They may have left off 2020 following "Winter" because December to February spans both years but "Winter 2020" may do more benefit than harm here.

***

The third design is a series of short films, which stands mid-way between the single animated map and the six snapshots. Each movie covers a separate window of time.

This design does a better job telling the story within each time window while it obstructs comparisons across time windows.

Nyt_coronawaves_shortfilms

The informative legend is back. This time, it's showing the static time window for each map.

***

The three designs come from the same dataset. I think of them as one long movie, six snapshots, and five short films.

The one long movie is a like a data dump. It shows every number in the dataset, which is the weekly case rate for each county for a given week. All the data are streamed into a single map. It's a show piece.

As an instrument to help readers understand the patterns in the dataset, the movie falls short. Too much is going on, making it hard to focus and pick out key trends. When your eyes are everywhere, they are nowhere.

The six snapshots represent the other extreme. The graph does not move, as the time axis is reduced to six discrete time points. But this display describes the change points, and tells a story. The long movie, by contrast, invites readers to find a story.

Without motion, the small-multiples format allows us to pick out specific counties or regions and compare the case rates across time. This task is close to impossible in the long movie, as it requires freezing the movie, and jumping back and forth.

The five short films may be the best of both worlds. It retains the motion. If the time windows are chosen wisely, each short film contains a few simple patterns that can easily be discerned. For example, the third film shows how the winter wave emerged from the midwest and then walloped the whole country, spreading southward and toward the coasts.

Nyt_winterwave

(If the above gif doesn't play, click it.)

***

If there is double or triple the time allocated to this project, I'd want to explore spatial clustering. I'd like to dampen the spatial noise (neighboring counties that have slightly different experiences). There is also temporal noise (fluctuations from week to week for the same county) - which can be smoothed away. I think with these statistical techniques, the "wave" feature of the pandemic may be more visible.

 

 


Ridings, polls, elections, O Canada

Stephen Taylor reached out to me about his work to visualize Canadian elections data. I took a look. I appreciate the labor of love behind this project.

He led with a streamgraph, which presents a quick overview of relative party strengths over time.

Stephentaylor_canadianelections_streamgraph

I am no Canadian election expert, and I did a bare minimum of research in writing this blog. From this chart, I learn that:

  • the Canadians have an irregular election schedule
  • Canada has a two party plus breadcrumbs system
  • The two dominant parties are Liberals and Conservatives. The Liberals currently hold just less than half of the seats. The Conservatives have more than half of the seats not held by Liberals
  • The Conservative party (maybe) rebranded as "progressive conservative" for several decades. The Reform/Alliance party was (maybe) a splinter movement within the Conservatives as well.
  • Since the "width" of the entire stream increased over time, I'm guessing the number of seats has expanded

That's quite a bit of information obtained at a glance. This shows the power of data visualization. Notice Stephen didn't even have to include a "how to read this" box.

The streamgraph form has its limitations.

The feature that makes it more attractive than an area chart is its middle anchoring, resulting in a form of symmetry. The same feature produces erroneous intuition - the red patch draws out a declining trend; the reader must fight the urge to interpret the lines and focus on the areas.

The breadcrumbs are well hidden. The legend below discloses that the Green Party holds 3 seats currently. The party has never held enough seats to appear on the streamgraph though.

The bars showing proportions in the legend is a very nice touch. (The numbers appear messed up - I have to ask Stephen whether the seats shown are current values, or some kind of historical average.) I am a big fan of informative legends.

***

The next featured chart is a dot plot of polling results since 2020.

Stephentaylor_canadianelections_streamgraph_polls_dotplot

One can see a three-tier system: the two main parties, then the NDP (yellow) is the clear majority of the minority, and finally you have a host of parties that don't poll over 10%.

It looks like the polls are favoring the Conservatives over the Liberals in this election but it may be an election-day toss-up.

The purple dots represent "PPC" which is a party not found elsewhere on the page.

This chart is clear as crystal because of the structure of the underlying data. It just amazes me that the polls are so highly correlated. For example, across all these polls, the NDP has never once polled better than either the Liberals or the Conservatives, and in addition, it has never polled worse than any of the small parties.

What I'd like to see is a chart that merges the two datasets, addressing the question of how well these polls predicted the actual election outcomes.

***

The project goes very deep as Stephen provides charts for individual "ridings" (perhaps similar to U.S. precincts).

Here we see population pyramids for Vancouver Center, versus British Columbia (Province), versus Canada.

Stephentaylor_canadianelections_riding_populationpyramids

This riding has a large surplus of younger people in their twenties and thirties. Be careful about the changing scales though. The relative difference in proportions are more drastic than visually displayed because the maximum values (5%) on the Province and Canada charts are half that on the Riding chart (10%). Imagine squashing the Province and Canada charts to half their widths.

Analyses of income and rent/own status are also provided.

This part of the dashboard exhibits a problem common in most dashboards - they present each dimension of the data separately and miss out on the more interesting stuff: the correlation between dimensions. Do people in their twenties and thirties favor specific parties? Do richer people vote for certain parties?

***

The riding-level maps are the least polished part of the site. This is where I'm looking for a "how to read it" box.

Stephentaylor_canadianelections_ridingmaps_pollwinner

It took me a while to realize that the colors represent the parties. If I haven't come in from the front page, I'd have been totally lost.

Next, I got confused by the use of the word "poll". Clicking on any of the subdivisions bring up details of an actual race, with party colors, candidates and a donut chart showing proportions. The title gives a "poll id" and the name of the riding in parentheses. Since the poll id changes as I mouse over different subdivisions, I'm wondering whether a "poll" is the term for a subdivision of a riding. A quick wiki search indicates otherwise.

Stephentaylor_canadianelections_ridingmaps_donut

My best guess is the subdivisions are indicated by the numbers.

Back to the donut charts, I prefer a different sorting of the candidates. For this chart, the two most logical orderings are (a) order by overall popularity of the parties, fixed for all ridings and (b) order by popularity of the candidate, variable for each riding.

The map shown above gives the winner in each subdivision. This type of visualization dumps a lot of information. Stephen tackles this issue by offering a small multiples view of each party. Here is the Liberals in Vancouver.

Stephentaylor_canadianelections_ridingmaps_partystrength

Again, we encounter ambiguity about the color scheme. Liberals have been associated with a red color but we are faced with abundant yellow. After clicking on the other parties, you get the idea that he has switched to a divergent continuous color scale (red - yellow - green). Is red or green the higher value? (The answer is red.)

I'd suggest using a gray scale for these charts. The hardest decision is going to be the encoding between values and shading. Should each gray scale be different for each riding and each party?

If I were to take a guess, Stephen must have spent weeks if not months creating these maps (depending on whether he's full-time or part-time). What he has published here is a great start. Fine-tuning the issues I've mentioned may take more weeks or months more.

****

Stephen is brave and smart to send this project for review. For one thing, he's got some free consulting. More importantly, we should always send work around for feedback; other readers can tell us where our blind spots are.

To read more, start with this post by Stephen in which he introduces his project.


Ranking data provide context but can also confuse

This dataviz from the Economist had me spending a lot of time clicking around - which means it is a success.

Econ_usaexcept_hispanic

The graphic presents four measures of wellbeing in society - life expectancy, infant mortality rate, murder rate and prison population. The primary goal is to compare nations across those metrics. The focus is on comparing how certain nations (or subgroups) rank against each other, as indicated by the relative vertical position.

The Economist staff has a particular story to tell about racial division in the US. The dotted bars represent the U.S. average. The colored bars are the averages for Hispanic, white and black Americans. The wider the gap between the colored bars, the more variant is the experiences between American races.

The chart shows that the racial gap of life expectancy is the widest. For prison population, the U.S. and its racial subgroups occupy many of the lowest (i.e. least desirable) ranks, with the smallest gap in ranking.

***

The primary element of interactivity is hovering on a bar, which then highlights the four bars corresponding to the particular nation selected. Here is the picture for Thailand:

Econ_usaexcept_thailand

According to this view of the world, Thailand is a close cousin of the U.S. On each metric, the Thai value clings pretty near the U.S. average and sits within the range by racial groups. I'm surprised to learn that the prison population in Thailand is among the highest in the world.

Unfortunately, this chart form doesn't facilitate comparing Thailand to a country other than the U.S as one can highlight only one country at a time.

***

While the main focus of the chart is on relative comparison through ranking, the reader can extract absolute difference by reading the lengths of the bars.

This is a close-up of the bottom of the prison population metric:

Econ_useexcept_prisonpop_bottomThe length of each bar displays the numeric data. The red line is an outlier in this dataset. Black Americans suffer an incarceration rate that is almost three times the national average. Even white Americans (blue line) is imprisoned at a rate higher than most countries around the world.

As noted above, the prison population metric exhibits the smallest gap between racial subgroups. This chart is a great example of why ranking data frequently hide important information. The small gap in ranking masks the extraordinary absolute difference in incareration rates between white and black America.

The difference between rank #1 and rank #2 is enormous.

Econ_useexcept_lifeexpect_topThe opposite situation appears for life expectancy. The life expectancy values are bunched up especially at the top of the scale. The absolute difference between Hispanic and black America is 82 - 75 = 7 years, which looks small because the axis starts at zero. On a ranking scale, Hispanic is roughly in the top 15% while black America is just above the median. The relative difference is huge.

For life expectancy, ranking conveys the view that even a 7-year difference is a big deal because the countries are tightly bunched together. For prison population, ranking shows the view that a multiple fold difference is "unimportant" because a 20-0 blowout and a 10-0 blowout are both heavy defeats.

***

Whenever you transform numeric data to ranks, remember that you are artificially treating the gap between each value and the next value as a constant, even when the underlying numeric gaps show wide variance.

 

 

 

 

 


Stumped by the ATM

The neighborhood bank recently installed brand new ATMs, with tablet monitors and all that jazz. Then, I found myself staring at this screen:

Banknote_picker_us

I wanted to withdraw $100. I ordinarily love this banknote picker because I can get the $5, $10, $20 notes, instead of $50 and $100 that come out the slot when I don't specify my preference.

Something changed this time. I find myself wondering which row represents which note. For my non-U.S. readers, you may not know that all our notes are the same size and color. The screen resolution wasn't great and I had to squint really hard to see the numbers of those banknote images.

I suppose if I grew up here, I might be able to tell the note values from the figureheads. This is an example of a visualization that makes my life harder!

***
I imagine that the software developer might be a foreigner. I imagine the developer might live in Europe. In this case, the developer might have this image in his/her head:

Banknote_picker_euro

Euro banknotes are heavily differentiated - by color, by image, by height and by width. The numeric value also occupies a larger proportion of the area. This makes a lot of sense.

I like designs to be adaptable. Switching data from one country to another should not alter the design. Switching data at different time scales should not affect the design. This banknote picker UI is not adaptable across countries.

***

Once I figured out the note values, I learned another reason why I couldn't tell which row is which note. It's because one note is absent.

Banknote_us_2

Where is the $10 note? That and the twenty are probably the most frequently used. I am also surprised people want $1 notes from an ATM. But I assume the bank knows something I don't.


Dreamy Hawaii

I really enjoyed this visual story by ProPublica and Honolulu Star-Advertiser about the plight of beaches in Hawaii (link).

The story begins with a beautiful invitation:

Propublica_hawaiibeachesfrontimage

This design reminds me of Vimeo's old home page. (It no longer looks like this today but this screenshot came from when I was the data guy there.) In both cases, the images are not static but moving.

Vimeo-homepage

The tour de force of this visual story is an annotated walk along the Lanikai Beach. Here is a snapshot at one of the stops:

Propublica_hawaiibeaches_1368MokuluaDr_small

This shows a particular homeowner who, according to documents, was permitted to rebuild a destroyed seawall even though officials were supposed to disallow reconstruction in order to protect beaches from eroding. The property is marked on the map above. The image inside the box is a gif showing waves smashing the seawall.

As the reader scrolls down, the image window runs through a carousel of gifs of houses along the beach. The images are synchronized to the reader's progress along the shore. The narrative makes stops at specific houses at which point a text box pops up to provide color commentary.

***

The erosion crisis is shown in this pair of maps.

Propublica_hawaiibeaches_oldnewshoreline-sm

There's some fancy work behind the scenes to patch together images, and estimate the boundaries of th beaches.

***

The following map is notable for its simplicity. There are no unnecessary details and labels. We don't need to know the name of every street or a specific restaurant. Removing excess details makes readers focus on the informative parts. 

Propublica_hawaiibeaches_simplemap-sm

Clicking on the dots brings up more details.

***

Enjoy the entire story here.


Election visuals 4: the snake pit is the best election graphic ever

This is the final post on the series of data visualization deployed by FiveThirtyEight to explain their election forecasting model. The previous posts are here, here and here.

I'm saving the best for last.

538_snakepit

This snake-pit chart brings me great joy - I wish I came up with it!

This chart wins by focusing on a limited set of questions, and doing so excellently. As with many election observers, we understand that the U.S. presidential election will turn on so-called "swing states," and the candidates' strength in these swing states are variable, as the name suggests. Thus, we like to know which states are in play, and within these states, which ones are most unpredictable.

This chart lines up all the states from the reddest of red up top to the bluest of blue at the bottom. Each state is ranked by the voting margin predicted by 538's election forecasting model. The swing states are found in the middle.

Since each state confers a fixed number of electoral votes, and a candidate must amass 270 to win, there is a "tipping" state. In the diagram above, it's Pennsylvania. This pivotal state is neatly foregrounded as the one crossing the line in the middle.

The lengths of the segments correspond to the number of electoral votes and so do not change with the data. What change are the sequencing of the segments, and the color shading.

This data visualization is a gem of visual story-telling. The form lends itself to a story.

***

The snake-pit chart succeeds by not doing too much. There are many items that the chart does not directly communicate.

The exact number of electoral votes by state is not explicit, nor is it easy to compare the lengths of bending segments. The color scale for conveying the predicted voting margins is crude, and it's not clear what is the difference between a deep color and a light color. It's also challenging to learn the electoral vote split; the actual winning margin is not even stated.

The reality is the average reader doesn't care. I got everything I wanted from the chart, and I ain't got the time to explore every state.

There is a hover-over effect that reveals some of the additional information:

538_snakepitchart_detail

One can keep going on. I have no idea how the 40,000 scenarios presented in the other graphics in this series have been reduced to the forecast shown in the inset. But again, those omissions did not lessen my enjoyment. The point is: let your graphics breathe.

***

I'm thinking of potential variations even though I'm fully satisfied with this effort.

I wonder if the color shading should be reversed. The light shading encodes a smaller voting margin, which indicates a tighter race. But our attention is typically drawn first to the darker shades. If the shading scheme is reversed, the color should be described as how tight the race is.

I also wonder if a third color (purple) should be introduced. Doing so would require the editors to make judgment calls on which set of states are swing states.

One strange thing about election day is the specific sequence of when TV stations (!) call the state results, which not only correlates with voting margin but also with time zones. I wonder if the time zone information can be worked into the sequencing of segments.

Let me know what you think of these ideas, or leave your own ideas, in the comments below.

***

I have already praised this graphic when it first came out in 2016. (link)

A key improvement is tilting the chart, which avoids vertical state labels.

The previous post was written around election day 2016. The snake pit further cements its status as a story-telling device. As states are called, they are taken out of the picture. So it works very well as a dynamic chart on election day.

I'm nominating this snake-pit chart as the best election graphic ever. Kudos to the FiveThirtyEight team.