Space and time


When it comes to space or time in graphics, old habits die hard. When we have spatial data, the default is to put it on a map. When we have a time series, the default is to plot time along the horizontal axis. Sometimes these defaults work; other times, breaking up the map or the straight timeline works better.

Thanks to a reader, I noticed that Google put up a "Flu Trends" website to help us track the flu season.  They use two main charts to plot the data, as shown below.

Google_flu0


Google_flu_time1

On the right side is the time series, showing the severity of flu cases from month to month. There are many great things about this chart and one serious flaw. I love the fact that they did not plot time as one straight line along the horizontal axis; they recognize the seasonality and draw overlapping lines, one per year. They make good use of foreground and background, so it's easy for us to compare year-to-year differences.

The serious flaw: no vertical scale. This was a problem with Google Trends from day one (see my post here). They still haven't fixed it. Because of this, we don't know if the peak shown was 5 cases or 5000 cases. For Google keyword searches, one can excuse them for trying to protect commercial secrets, but I would imagine that this public health data is, well, public. Since the apparent purpose of this chart is to allow citizens to declare a flu epidemic (say, when they see the current trend depart from the historical norm), not having the scale is a huge problem.

Google_flu_time2

I also disagree with shifting the months around for the Northern Hemisphere so that the peaks of the graphs are aligned towards the middle. It is better for the peaks to appear on the left and let the order of the months conform to our expectation. (The "peak" would then be split across the two sides and the chart would look like a valley, which presumably is why they did it this way.)
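To make the trade-off concrete, here is a rough sketch of how one might fold a monthly series into one line per season, with the first month on the axis left as a choice. The function names and the sample numbers are my own inventions for illustration; this is not how Google builds its chart.

```python
from collections import defaultdict

MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]


def month_order(start_month=1):
    """Return month labels for the horizontal axis, beginning at start_month (1-12)."""
    return MONTHS[start_month - 1:] + MONTHS[:start_month - 1]


def fold_by_season(observations, start_month=1):
    """Fold (year, month, value) rows into one 12-slot line per season.

    A season is labelled by the calendar year in which it starts.
    Slots with no observation stay None.
    """
    lines = defaultdict(lambda: [None] * 12)
    for year, month, value in observations:
        slot = (month - start_month) % 12
        season = year if month >= start_month else year - 1
        lines[season][slot] = value
    return dict(lines)


# Made-up index values for illustration only, not Google's data.
sample = [(2008, 11, 2), (2008, 12, 5), (2009, 1, 9), (2009, 2, 6)]

calendar = fold_by_season(sample, start_month=1)  # winter peak split across two lines
shifted = fold_by_season(sample, start_month=7)   # winter peak kept on one line
```

With `start_month=1` the months read in the familiar order but a single winter is spread over two lines; with `start_month=7` the winter stays on one line at the cost of an axis that starts in July. Either way, each line keeps its values, so nothing stops the chart from showing a vertical scale.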



Google_flu_aust

The charts on the left side plot the spatial data, not surprisingly on maps. Sadly, the standard exhibited on the time-series charts is nowhere to be found on these maps.

First, the legend is seriously deficient.

Second, the gradation of the colors is not fine enough, or put differently, the aggregation to the state/province level is much too coarse for any interesting pattern to be seen.


This poorly aggregated map becomes a farce when applied to the U.S.  There is not much left to be said, is there? 





Google_flu_us

Comments

Tom

I believe what they're plotting are keyword searches, not cases of flu as you suggest.

"We've found that certain search terms are good indicators of flu activity. Google Flu Trends uses aggregated Google search data to estimate flu activity"

so I guess what the graphs are showing is public interest in flu which, they argue, usually correlates pretty strongly with actual cases of flu. I wonder how this relationship will hold up in the less predictable media conditions we're seeing during the current pandemic.

Otherwise I totally agree though. I think the maps really show the weakness of automatically generating graphics from a generic toolkit rather than crafting them to be most appropriate to the task of illuminating particular data. Having said that, I'm sure the technology can and will be refined to address the criticisms.

jason

Tom, that's exactly what it is. There's been a good relationship in the past between searches and cases ("Detecting influenza epidemics using search engine query data", www.nature.com/nature/journal/v457/n7232/full/nature07634.html), but you're right in that it will be interesting to see whether there's enough of a distortion from the publicity this time around.

Alex Cook

Hi,

As stated by Tom, it's Google searches that are being used here, not cases, so the y-scales would be totally different. As a consequence, I think it makes sense not to put a scale on the y-axis. In any case, since the current outbreak is a pandemic to which most/all people have no immunity, there are going to be far more people infected (say, 3 times) this year than usual. So the plot would look silly even if it were cases that were being used.

You mention that the case data should be available to inform the public. In an ideal world, yes, but in many countries case data may be mandated by government and hence patients don't provide informed consent (yes, to being included in a large aggregated number). So it might not be legally possible for the data to be provided by health departments, and what Google are doing is an excellent alternative.

Tom Carden

I think Google are using search terms to indicate flu activity, so that's why the y-axis remains unlabeled. It's still search trends, not number of cases.

Sean Carmody

There's quite a bit of swine flu data here, but since many areas are no longer consistently testing for swine flu, except in cases of hospitalisation, the data should be used with care.

Kaiser

They are correlating search terms and not cases as indicated by several readers.

Alex and Tom - unlabeling any axis is a fatal error in graphing. Think about maps with no scales. This is how we end up walking for hours thinking that the two points look "close together" on the map. So I find it mystifying that Google insists on omitting the vertical scale.
