In a Trifecta Checkup (link), the Vulture chart falls into Type DV. The question might concern the relationship between running time and box office, and between the Rotten Tomatoes Score and box office. These questions are very difficult to answer.
The box office number here refers to the lifetime gross ticket receipts from theaters. The movie industry insists on publishing these unadjusted numbers, which are completely useless. At the minimum, these numbers should be adjusted for inflation (ticket prices) and for population growth, if we are to use them to measure commercial success.
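To make the adjustment concrete, here is a minimal sketch in Python. It assumes we know the average ticket price and the population for both the release year and today. The function name `adjusted_gross` and all the numbers are hypothetical, chosen only to show the arithmetic.

```python
def adjusted_gross(gross, price_then, price_now, pop_then, pop_now):
    """Restate a historical gross in today's ticket prices and today's population.

    gross / price_then estimates admissions; dividing by pop_then gives
    admissions per person; multiplying by price_now and pop_now expresses
    that same reach in present-day terms.
    """
    admissions_per_person = (gross / price_then) / pop_then
    return admissions_per_person * price_now * pop_now


# Hypothetical film: grossed 100 (millions) when tickets cost 5 and the
# population was 200 (millions); today tickets cost 10 and population is 300.
print(adjusted_gross(100, price_then=5, price_now=10, pop_then=200, pop_now=300))
```

With these made-up inputs the film is restated as 300 in today's terms, because ticket prices doubled and the population grew by half. An unadjusted comparison would understate it by a factor of three.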
The box office number is also suspect because it ignores streaming, digital, syndication, and other forms of revenues. This is a problem because we are comparing movies across time.
You might have noticed that both running time and box office numbers have gone up over time. (That is to say, running time and box office numbers are highly correlated.) Do you think that is because moviegoers are motivated to see longer films, or because movies are just getting longer?
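One way to probe this question is to remove the shared time trend from both series and see whether any correlation survives. The sketch below uses a small, deliberately constructed dataset (not real movie data): both series rise with the year, while their year-to-year wiggles are unrelated by design. The helper names `pearson` and `detrend` are our own.

```python
# Hypothetical series over 8 years (t = 0..7). Both trend upward; the wiggles
# A and B are chosen to be unrelated to each other and to the trend.
YEARS = list(range(8))
A = [1, -1, -1, 1, 1, -1, -1, 1]
B = [1, -1, 1, -1, -1, 1, -1, 1]
running_time = [100 + 2 * t + a for t, a in zip(YEARS, A)]
box_office = [50 + 5 * t + 3 * b for t, b in zip(YEARS, B)]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    return sxy / (sxx * syy) ** 0.5


def detrend(t, y):
    """Residuals of y after fitting an ordinary least-squares line on t."""
    mt, my = sum(t) / len(t), sum(y) / len(y)
    slope = (sum((ti - mt) * (yi - my) for ti, yi in zip(t, y))
             / sum((ti - mt) ** 2 for ti in t))
    return [yi - (my + slope * (ti - mt)) for ti, yi in zip(t, y)]


raw = pearson(running_time, box_office)
within = pearson(detrend(YEARS, running_time), detrend(YEARS, box_office))
print(f"raw correlation: {raw:.2f}, after removing the trend: {within:.2f}")
```

Here the raw correlation is about 0.95, yet after detrending it drops to zero, because in this constructed example the only thing the two series share is the passage of time. On real data, a correlation that survived detrending would still not settle which way the causation runs.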
PS. [12/15/2014] I will have a related discussion on the statistics behind this data on my sister blog. Link will be active Monday afternoon.
If your chart is titled "The Most Popular TV Show Set in Every State," what would you expect the data to look like?
You'd think the list would be dominated by hit shows like The Walking Dead and Downton Abbey, and you might guess that there are probably only four or five unique shows on the list.
But it's easy to miss the word "set" in the title. They are looking for the most popular show given that it is set in a particular state. This is a completely different question: it guarantees that there will be 50 different shows for the 50 states, assuming that one show can't be set in multiple states. It is also, computationally, a much more complex question. Some locations, like New York, Mass. (Boston), and Illinois (Chicago), are many times more likely to be the settings of TV shows than other states. This means one might need to go back many years to find the "popular" shows in the less attention-grabbing states.
I put the word "popular" in quotation marks because if one has to dig deep into history for a specific state, then it is possible that the selected show would not be popular in the aggregate! This is not unlike the question of whether having your kids pick up a popular sport (like basketball) or instrument (like violin) is better or worse than an unpopular one (like squash or trombone). The latter is potentially the shorter route to standing out, but the achievement will be known only to a niche audience.
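The difference between the two readings of the title is the difference between an unconditional ranking and a ranking within each group. Here is a minimal Python sketch of both; the show names, states and popularity scores are invented for illustration.

```python
# Hypothetical records: (show, state it is set in, popularity score).
SHOWS = [
    ("Show A", "NY", 95),
    ("Show B", "NY", 90),
    ("Show C", "IL", 85),
    ("Show D", "MA", 80),
    ("Show E", "NY", 75),
    ("Show F", "WY", 20),
    ("Show G", "ND", 12),
]


def top_overall(shows, k):
    """Unconditional ranking: the k most popular shows, wherever they are set."""
    ranked = sorted(shows, key=lambda s: s[2], reverse=True)
    return [name for name, _, _ in ranked[:k]]


def top_per_state(shows):
    """Conditional ranking: the most popular show *given* each state."""
    best = {}
    for name, state, score in shows:
        if state not in best or score > best[state][1]:
            best[state] = (name, score)
    return best


print(top_overall(SHOWS, 3))
print(top_per_state(SHOWS))
```

The overall top three come from just two states, while the per-state list necessarily includes a low-scoring winner for each sparsely covered state (here, a score of 12 for ND).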
This brings me to how one should look at a map like this one in Business Insider (link):
The first thing that strikes you is the colors, which signify nothing. Since each state has its own TV show, by definition each piece of information is unique. As far as I can tell, the choice of which states share the same color is totally up to the designer.
As I have remarked in the past, too often the designer uses the map as a lesson in geography. The only information presented to readers through the map type is where each state is located in the union. Without the state names, even this lesson is incomplete. We learn nothing about the relative popularity of these shows, the longevity, the years in which they went on air, etc.
Geographical data should not automatically be placed on a map.
Is there any "data" in this map? It depends on how you see it. Here's how the author described the process of pairing each state with a TV show:
To qualify, we looked at television series as opposed to reality shows.* Selections were based on each show’s longevity, audience and critical acclaim using info from IMDB/Metacritic, awards, and lasting impact on American culture and television... *When there wasn't a famous enough series to choose from, we selected a more popular reality show. That happens once on this list (IA).
The most offensive aspect is the linear regression line. It's clearly an inappropriate model for this dataset.
I also don't like charts that include impossible values on the axis. In this case, the Rotten Tomatoes Score never goes above 100%.
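A simple remedy is to pad the axis around the data but clip the padding at the limits the measure can actually take. A minimal sketch, assuming scores on a 0 to 100 scale; the function name `axis_limits` and the 5% padding are arbitrary choices of ours.

```python
def axis_limits(values, lower=0.0, upper=100.0, pad_frac=0.05):
    """Axis range covering `values` with a little padding, never outside [lower, upper]."""
    lo, hi = min(values), max(values)
    pad = (hi - lo) * pad_frac  # breathing room proportional to the data range
    return max(lower, lo - pad), min(upper, hi + pad)


print(axis_limits([10, 98]))  # upper end is capped at 100 rather than padded past it
print(axis_limits([40, 60]))  # no clipping needed here
```

The resulting pair could, for instance, be passed to matplotlib's `Axes.set_ylim` if that is the plotting tool in use.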
If the chart is turned on its side, the movie titles can be read horizontally.
***
I am compelled by the story, but the chart doesn't help at all. Of course, it would be better if they could find data on the profitability of each movie. Readers should ask how correlated the Rotten Tomatoes Score is with box office, and also what the relative costs of producing these different movies are. Jon has a chart of score against profit (link).
The reader must first read the beginning pages of the report to get one's bearings. The two charts are supposed to investigate the correlation between streaming video and regular TV. What causes the confusion is that the populations being analyzed are different between the two charts.
In the left chart, they exclude anyone who does not watch streaming video (35% of the sample), and then divide those who watch streaming video into five equal-sized segments based on how much they watch. Then, they look at how much regular TV each segment watches on average.
In the right chart, they exclude anyone who does not watch regular TV (just 0.5% of the sample), and then divide those who watch regular TV into five equal-sized segments based on how much they watch. Then, they look at how much online streaming video each segment watches on average.
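The segmentation described in the two paragraphs above can be sketched as follows. This is our reading of the method, not Nielsen's code; `quintile_means` is a hypothetical helper, and tied values simply keep their original order.

```python
def quintile_means(sort_by, measure):
    """Average `measure` within five equal-sized groups formed on `sort_by`.

    Respondents with zero `sort_by` are excluded first, mirroring how each
    chart drops non-users of the medium it segments on. Group 1 holds the
    heaviest users.
    """
    pairs = [(s, m) for s, m in zip(sort_by, measure) if s > 0]
    if len(pairs) < 5:
        raise ValueError("need at least five respondents with non-zero usage")
    pairs.sort(key=lambda p: p[0], reverse=True)  # heaviest users first
    n = len(pairs)
    means = []
    for q in range(5):
        group = pairs[q * n // 5:(q + 1) * n // 5]
        means.append(sum(m for _, m in group) / len(group))
    return means


# Hypothetical respondents: daily streaming minutes and daily TV minutes.
streaming = [0, 0, 10, 20, 30, 40, 50]
tv = [300, 100, 220, 220, 220, 220, 220]
print(quintile_means(streaming, tv))  # left-chart design: segment on streaming
```

Calling it once with streaming as `sort_by` and once with TV as `sort_by` mimics the two charts. Because the first call silently drops the non-streamers (the respondents with 300 and 100 TV minutes here) while the second drops almost nobody, the two sets of quintiles describe different populations.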
What crosses us up is the relative scales. The scale for regular TV viewing is tightly clustered between 212 and 247 daily minutes on the left chart but has a wide range between 24 and 522 on the right chart. The impression given by the designer is that the same population (18-34 year olds) is divided into five groups (quintiles) for each chart, albeit using different criteria. It just doesn't make sense that the group averages do not match.
The reason for this mismatch is the hugely divergent rates of exclusion as described above. What the chart seems to be saying is that the 65% who use streaming video have very similar TV viewing behavior (about 220 daily minutes). In other words, we surmise that most of those people on the left chart map to groups 2 and 3 on the right chart.
Who are the people in groups 1, 4 and 5 on the right chart? It appears that they are the 35% who don't watch streaming video. Thus, the real insight of this chart is that there are two types of people who don't watch streaming video: those who watch very little regular TV at all, and those who watch twice the average amount of regular TV.
Here's another puzzle: Nielsen claims that high streaming = low TV and low streaming = high TV. Is it really true that high streaming = low TV? Take the segment of highest streaming (#1 on the left chart). This group, which is 13% of the survey population, accounts for 83% of the streaming minutes -- almost 71,000 out of 86,000 minutes. Now look at the right chart. It turns out that the streaming minutes are quite evenly distributed among those TV-based quintiles, ranging from 15,000 minutes to 23,000 minutes each.
So, it is impossible to fit all of the top streaming quintile into any one TV quintile: they have too many streaming minutes. In fact, the top streaming quintile must be quite spread out among the TV quintiles since each of the TV quintiles is 1.5 times the size of a streaming quintile!
So, we must conclude that customers who stream a lot include both fervent TV fans and those who watch little TV.
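The arithmetic behind this argument can be checked in a few lines, using only the figures quoted above: the 35% and 0.5% exclusion rates, roughly 71,000 of 86,000 streaming minutes in the top streaming quintile, and at most 23,000 streaming minutes in any TV quintile.

```python
import math

# Figures quoted in the post.
STREAM_EXCLUDED = 0.35    # share of sample who watch no streaming video
TV_EXCLUDED = 0.005       # share of sample who watch no regular TV
TOP_STREAM_MINUTES = 71_000
TOTAL_STREAM_MINUTES = 86_000
MAX_STREAM_MINUTES_PER_TV_QUINTILE = 23_000

stream_quintile = (1 - STREAM_EXCLUDED) / 5  # each streaming quintile, share of sample
tv_quintile = (1 - TV_EXCLUDED) / 5          # each TV quintile, share of sample

share_of_minutes = TOP_STREAM_MINUTES / TOTAL_STREAM_MINUTES
size_ratio = tv_quintile / stream_quintile

# Pigeonhole bound: no TV quintile holds more than 23,000 streaming minutes,
# so the top streamers' 71,000 minutes must span at least this many TV quintiles.
min_tv_quintiles = math.ceil(TOP_STREAM_MINUTES / MAX_STREAM_MINUTES_PER_TV_QUINTILE)

print(f"top streaming quintile: {stream_quintile:.0%} of sample, "
      f"{share_of_minutes:.0%} of streaming minutes")
print(f"TV quintile / streaming quintile size: {size_ratio:.2f}")
print(f"top streamers span at least {min_tv_quintiles} TV quintiles")
```

Since no TV quintile holds more than 23,000 streaming minutes, the top streaming quintile's 71,000 minutes must be spread over at least four of the five TV quintiles, which is why heavy streamers cannot all be light TV viewers.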
This graphic feature is the best from the NYT team yet. I particularly love the two columns on the right, which allow us to see regional differences. For example, this "New in Town" movie was much more popular in Minneapolis than in any of the other metropolitan areas, and was watched particularly little in New York. Also note the sorting options at the top right.
Here are some of my favorite links from other places:
A spatial journey illustrating a very long scale, created by the Genetic Science Learning Center (here)
Long scales are very difficult to deal with in charts; I have never been satisfied with log scales because they address the designer's challenge of trying to fit everything onto one page, but do not address the reader's need to compare the elements accurately
Not sure how this helps, but perhaps some of you will figure it out
Tommi left a comment about this conceptual chart by xkcd, which has been making the rounds. Fits into our Light Entertainment category.
Says there is no optimal chart type. A type that works very well for one data set may get hopelessly cluttered for another, similar data set.
From fellow bloggers (especially Jorge), a whole series of views of the U.S. unemployment figures by state over time. Alternatives that are much more interesting to look at than the typical line chart. Jorge even found something in Excel that looks good.
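The complaint about log scales can be made concrete: on a log axis, equal distances mark equal ratios, not equal differences. A minimal sketch, with an arbitrary axis running from 1 to 1,000,000 drawn 100 units long; `log_position` is our own helper.

```python
import math


def log_position(value, lo, hi, length=100.0):
    """Position of `value` along an axis of `length` units drawn on a log10 scale from lo to hi."""
    return length * (math.log10(value) - math.log10(lo)) / (math.log10(hi) - math.log10(lo))


LO, HI = 1, 1_000_000
gap_small = log_position(20, LO, HI) - log_position(10, LO, HI)      # difference of 10
gap_large = log_position(2000, LO, HI) - log_position(1000, LO, HI)  # difference of 1,000
print(round(gap_small, 3), round(gap_large, 3))
```

The gap between 10 and 20 is drawn exactly as wide as the gap between 1,000 and 2,000, even though the second difference is a hundred times larger. Everything fits on the page, but the reader can no longer judge differences by eye.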
I share reader Bernard L.'s enthusiasm for this very imaginative chart, courtesy of the graphics people at NYT. The chart captures the ebb and flow of weekly movie receipts over the last two decades. The details that particularly interest me include:
The addition of area colors (on top of lines) serves to highlight box office successes; this really helps readers sort out the massive amount of data
Nicely spaced text (and dots) does not interfere with our reading of the chart
Hiding the text for less important films, while using interactivity to show their titles when the reader mouses over the respective areas
All of the above indicate a keen sense of foreground versus background. Besides, the authors had the good sense to speak of inflation-adjusted box office sales; I'm tired of the movie industry proclaiming higher sales each year when ticket prices are rising, and the population is growing.
This is another chart where more data do not easily translate into better communication (see my guest post at Flowing Data). While I like the playful nature of the interactive chart, it is left to the reader to discover the information buried in the data, such as the assertion in the header that Oscar-winning films typically take time to attain box-office success while many blockbusters do not Oscars make.
In this presentation, it is challenging to compare the total receipts of one film versus another (this requires comparing oddly shaped, partially obscured areas). It is also hard to compare across years since the data is spread out over a lot of space.
There may really be two types of graphics: one, like the example here, which acts as a dictionary designed for exploration; and another, where the designer has selected a subset of the data to make a specific point.
Reference: "The ebb and flow of movies", New York Times, Feb 23 2008.
Business Week dissected the beneficiaries of the Oscar show as shown on the right. Although this doesn't work well as a data graphic, if thought of as a variant on the data table, it is more engaging for readers.
Let's have some fun with the Oscar statue. First, putting a bar chart next to the statue confirms that the height of the segments (rather than the area) is in proportion to the dollar values (below left).
Tufte, Chambers and others have shown that our eyes react to the areas, not heights. So next, I estimated the areas but stretched them out into segments of equal width. Squeezing the entire column back down to the height of the statue, the following chart (below right) puts perceived proportions next to the true proportions, displaying visually the extent of distortion.
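The comparison of perceived and true proportions can be sketched numerically. The code below assumes a hypothetical width profile for the statue (the real outline would have to be measured from the image) and integrates width over height for each stacked segment. `segment_shares` and the toy profile are our own constructions.

```python
def segment_shares(values, width_at, steps=1000):
    """Return (height shares, area shares) for segments stacked bottom to top.

    Heights are proportional to `values` (the true proportions). Areas are the
    integral of the statue's width over each segment's height, estimated with
    the midpoint rule (the proportions the eye tends to perceive).
    """
    total = sum(values)
    heights = [v / total for v in values]  # statue height normalised to 1
    areas = []
    bottom = 0.0
    for h in heights:
        dy = h / steps
        areas.append(sum(width_at(bottom + (i + 0.5) * dy) for i in range(steps)) * dy)
        bottom += h
    total_area = sum(areas)
    return heights, [a / total_area for a in areas]


def toy_width(y):
    """Hypothetical profile: twice as wide in the lower half as in the upper half."""
    return 2.0 if y < 0.5 else 1.0


heights, perceived = segment_shares([1, 1], toy_width)
print(heights, [round(p, 3) for p in perceived])
```

With a constant width the two sets of shares coincide; with this toy statue, two equal dollar amounts take up roughly two-thirds versus one-third of the ink, which is the kind of distortion the side-by-side chart displays.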
Reference: "News you need to know", Business Week, Jan 28 2008.