The trouble with maps

Comments

John S.

On the other hand, the participant map is a nice illustration of population density across the country!

derek

Here's an example of small multiples not working as well as a single combined map would. I can see a difference, but not easily. It seems that 'carml' is more north central and 'carramel' is more northeast and south.

Part of the problem is that the squares are large and solid, so they overplot each other. Stephen Few has a paper on his web site discussing possible solutions. Here I'd recommend replacing the solid markers with hollow ones, to show more structure in the dense areas.

But a heat map showing the difference in percentages between 'carml' and 'carramel' would probably have been better, if that's what they want their readers to take away. The 'carml' predominant areas could be redder and the 'carramel' predominant areas could be bluer.

Plotting the markers together, and using a very smart choice of colours for the markers, might mimic this effect without losing the sense of population density as choropleth maps tend to.
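derek's difference map could be computed first and then handed to any diverging red-blue colour scale. A minimal numpy sketch of the idea, binning respondents into a coarse grid and taking the 'carml'-minus-'carramel' share per cell (the respondent coordinates and responses below are randomly generated for illustration, not the survey's actual data):

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical respondents: (lon, lat, said_carml) -- NOT the survey's data.
n = 2000
lon = rng.uniform(-125.0, -67.0, n)
lat = rng.uniform(25.0, 49.0, n)
said_carml = rng.random(n) < 0.38           # roughly the 38% quoted below

# Bin respondents into a coarse geographic grid.
lon_edges = np.linspace(-125, -67, 13)      # 12 columns
lat_edges = np.linspace(25, 49, 7)          # 6 rows

carml_counts, _, _ = np.histogram2d(lon[said_carml], lat[said_carml],
                                    bins=[lon_edges, lat_edges])
total_counts, _, _ = np.histogram2d(lon, lat, bins=[lon_edges, lat_edges])

# Per-cell difference in shares: +1 means everyone in the cell said
# 'carml', -1 means everyone said 'carramel'.  This is the single value
# a diverging colour map (redder vs. bluer) would encode.
with np.errstate(invalid="ignore"):
    carml_share = carml_counts / total_counts
diff = 2 * carml_share - 1
```

Plotting `diff` with a diverging colormap centered at zero would give the redder/bluer picture described above in one map instead of two.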

Keelan Evanini

The biggest problem with this survey (from a dialectologist's standpoint) is that it doesn't control well for geographic mobility of the respondents. It is quite likely that many of the respondents filled out the survey while residing in a different dialect region than where they grew up. This adds substantial noise to the results, and makes them hard to interpret. For many of the maps in the survey, it is hard to know whether the perceived lack of geographical difference is due to an actual lack, or to this design flaw.

Some maps from the survey do show regional variation much more clearly than the 'caramel' one. E.g., check out this map for 'sub' vs. 'hoagie'. There you can clearly distinguish a 'grinder' cluster around Boston, a 'hoagie' cluster around Philadelphia, a 'poor boy' cluster around New Orleans, etc.

On the other hand, other features that we know to be clearly distinct geographically are misrepresented in this survey. For example, take the vowel sounds in 'cot' and 'caught'. We know from extensive research that people who grow up in certain areas of the country (e.g. New York, Chicago) will always pronounce them distinctly, whereas people growing up in other areas (e.g. Boston, Pittsburgh, Denver) will always pronounce them the same (some other areas are undergoing change). This map from the Atlas of North American English shows this pretty clearly. Contrast that map, however, to the corresponding map from the Vaux survey and you'll see that the geographical pattern completely disappears.

In my opinion, the data in the Vaux survey suffer from methodological flaws that will make the results very hard to interpret, even with more advanced graphical approaches than the ones they currently use.

Kaiser

Keelan: good points, and this really speaks to the need to interpret data with the right context.

What I'm not getting is what each dot represents. Say each dot aggregates the samples in a particular county: how does that proportion then get translated into a dot/no-dot indicator?

Keelan Evanini

Kaiser: Each point on the map simply represents an individual participant in the survey. That's why the maps are so crowded in certain regions.

The results page for each survey item provides the raw numbers for each response. For example, for the first 'caramel' map (choice a, with red dots), the summary reports that 38.02% of respondents chose this response, or 4414 out of the 11609 who answered this item. If you look at a question with some very low-count responses, such as 'milkshake', you can almost count each individual point and match the total against the numbers listed at the top (though this isn't easy to do, since dots from the same location overlap and obscure the number of speakers).

The "Participant Map" shown at the bottom of the post claims that there were a total of 30788 respondents overall. However, if you look at the results pages for the individual survey items, you'll see that most of them got about 10500 - 11000 responses. This exposes another flaw in the survey's design: they didn't control for how much of the survey each participant completed. Based on the total count of 30788, and the individual item counts of ca. 11000, I would guess that there were about 20000 participants who signed up for the survey but only completed a few items. Other possibilities exist too (e.g., each of the 30788 participants filled out a third of the survey), but I'm pretty sure the survey items weren't randomized for each trial, so this seems the most likely explanation.
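Both back-of-the-envelope checks above are simple enough to write down explicitly, using only the figures quoted in these comments:

```python
# Figures quoted in the comments above.
total_responses_item = 11609   # respondents to the 'caramel' item
pct_choice_a = 38.02           # share choosing response (a)
total_participants = 30788     # from the "Participant Map"
typical_item_count = 11000     # rough per-item response count

# Check 1: 38.02% of 11609 respondents is indeed about 4414.
count_choice_a = round(total_responses_item * pct_choice_a / 100)

# Check 2: the gap between registered participants and a typical
# item's respondents -- roughly 20000 people who signed up but
# answered few or no items.
dropouts = total_participants - typical_item_count
```

The arithmetic gives `count_choice_a = 4414` and `dropouts = 19788`, matching the counts cited above.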

Bert Vaux

You might want to check out my most recent survey, which makes use of more up-to-date dynamic mapping technology:
www.ling.cam.ac.uk/survey/


Marketing analytics and data visualization expert. Author and Speaker. Currently at Vimeo and NYU. See my full bio.
