

Listed below are links to weblogs that reference Tag clouds are histograms:

Comments

Hadley

Is that a bar chart or a histogram? I think of a histogram as approximating a continuous density. It doesn't display the raw data; for example, the bin width and offset can substantially change the image.
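To make the point about bin width and offset concrete, here is a minimal sketch in Python. The data values and the helper `bin_counts` are made up for illustration; they are not taken from the post.

```python
from collections import Counter
import math

def bin_counts(values, width, offset=0.0):
    """Count values per bin; bin k covers [offset + k*width, offset + (k+1)*width)."""
    counts = Counter(math.floor((v - offset) / width) for v in values)
    return dict(sorted(counts.items()))

# Hypothetical continuous measurements (illustrative only).
data = [0.9, 1.1, 1.9, 2.1, 2.9, 3.1]

unit_bins = bin_counts(data, width=1.0, offset=0.0)     # peaked in the middle
shifted_bins = bin_counts(data, width=1.0, offset=0.5)  # same width, looks flat
wide_bins = bin_counts(data, width=2.0, offset=0.0)     # wider bins, two equal bars

print(unit_bins)
print(shifted_bins)
print(wide_bins)
```

The same six values give three different pictures depending only on the binning choices, which is why a histogram is better read as an estimate of the density than as a display of the raw data.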

Kaiser

Indeed, histograms are usually plotted for continuous variables, and then you get into bin widths, kernels and so on.

This is like a histogram for categorical variables. If I remove the white space between the bars, and turn the chart 90 degrees, it would look like a histogram.

Bin width is also a relevant concept here if we modify the interpretation a bit. A key issue (which I avoided bringing up in the post) is that of relevance. In all likelihood, we have a long-tailed distribution: the most popular keywords are thousands of times more frequent than the rare keywords, but there are thousands of rare keywords. The decision then is how many categories should be combined into one bar so that there is optimal smoothing.

I'll have to do a separate post on this with some histograms to make this clear.
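One way to picture the "how many categories per bar" decision is the rough sketch below. It assumes a simple greedy rule (merge neighbouring categories, in descending order of frequency, until a bar reaches a minimum total); that rule, the threshold `min_total`, and the tag counts are all invented for illustration and are not a claim about what is optimal.

```python
def pool_categories(freqs, min_total):
    """Greedily merge categories (most frequent first) into bars of at least min_total."""
    ordered = sorted(freqs.items(), key=lambda kv: kv[1], reverse=True)
    bars = []
    names, total = [], 0
    for name, count in ordered:
        names.append(name)
        total += count
        if total >= min_total:
            bars.append((names, total))
            names, total = [], 0
    if names:
        # Whatever is left in the tail becomes one final, smaller bar.
        bars.append((names, total))
    return bars

# Hypothetical long-tailed tag counts (illustrative only).
tag_counts = {"wedding": 5000, "party": 3000, "cat": 40,
              "dog": 35, "kite": 30, "yurt": 3, "zebu": 2}

bars = pool_categories(tag_counts, min_total=100)
for names, total in bars:
    print(total, names)
```

Popular tags keep a bar to themselves, while the tail gets pooled. Raising `min_total` smooths more aggressively, which plays the role bin width plays for a continuous variable.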

Hadley

There is another distinction: there is no canonical ordering in the categorical case. Histograms are not ordered by frequency. I'd argue that histograms are a special case of bar charts, and it's confusing to call the plots above histograms.

I also wonder if the numbers in the frequency table underlying the flickr tag cloud aren't transformed in some way to avoid very popular tags overwhelming the display.

Kaiser

You certainly have a point here. I went back and forth about it myself; you might have noticed that "histogram" was not mentioned in the main text. That's why.

And they may well have log-transformed the frequencies. It'd make sense.

In fact, there is only a limited number of font sizes and so they must have bucketed the frequencies, which means that the relative font size would have little to do with relative frequency!
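Here is a small sketch of what that bucketing might look like. It is a guess, not Flickr's actual method: it assumes a log transform followed by equal-width buckets, and the function `font_bucket`, the five font sizes and the tag counts are all hypothetical.

```python
import math

def font_bucket(count, min_count, max_count, n_sizes):
    """Map a frequency to a font-size index 0..n_sizes-1 on a log scale."""
    if max_count == min_count:
        return 0
    span = math.log(max_count) - math.log(min_count)
    position = (math.log(count) - math.log(min_count)) / span  # 0.0 to 1.0
    # The top count lands exactly on 1.0, so clamp it into the last bucket.
    return min(int(position * n_sizes), n_sizes - 1)

# Hypothetical tag frequencies spanning four orders of magnitude.
counts = [10, 100, 1000, 10000, 100000]
sizes = [font_bucket(c, min(counts), max(counts), n_sizes=5) for c in counts]
print(sizes)
```

Under these assumptions, a tag used 10,000 times more often than another is only four font steps larger, so the ratio of font sizes says little about the ratio of frequencies.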


Marketing analytics and data visualization expert. Author and Speaker. Currently at Vimeo and NYU. See my full bio.


Graphics design by Amanda Lee
