
Comments

Hadley

Is that a bar chart or a histogram? I think of a histogram as approximating a continuous density: it doesn't display the raw data, e.g., the bin width and offset can substantially change the image.
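To make the point about bin width and offset concrete, here is a minimal sketch in plain Python. The six data values and the helper name `histogram_counts` are made up for illustration; they are not from the post. It bins the same data three ways and shows that the shape of the histogram changes each time.

```python
import math

def histogram_counts(values, bin_width, offset=0.0):
    """Count values per bin. Bins are [offset + k*width, offset + (k+1)*width)."""
    counts = {}
    for v in values:
        k = math.floor((v - offset) / bin_width)   # index of the bin holding v
        left_edge = offset + k * bin_width         # label each bin by its left edge
        counts[left_edge] = counts.get(left_edge, 0) + 1
    return dict(sorted(counts.items()))

# Hypothetical data, chosen only to illustrate the effect.
data = [1.0, 1.9, 2.1, 2.9, 3.1, 3.9]

flat = histogram_counts(data, bin_width=1.0, offset=0.0)    # looks uniform
humped = histogram_counts(data, bin_width=1.0, offset=0.5)  # looks peaked in the middle
skewed = histogram_counts(data, bin_width=2.0, offset=0.0)  # looks lopsided

for name, h in [("flat", flat), ("humped", humped), ("skewed", skewed)]:
    print(name, h)
```

The raw values never change, only the binning does, which is why a histogram is better read as an estimate of the underlying density than as a display of the data.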

Kaiser

Indeed, histograms are usually plotted for continuous variables, and then you get into bin widths, kernels, and so on.

This is like a histogram for categorical variables. If I remove the white space between the bars, and turn the chart 90 degrees, it would look like a histogram.

Bin width is also a relevant concept here if we modify the interpretation a bit. A key issue (which I avoided bringing up in the post) is that of relevance: in all likelihood we have a long-tailed distribution, so the most popular keywords are thousands of times more frequent than the rare keywords, but there would be thousands of rare keywords. The decision then is how many categories should be combined into one bar so that there is optimal smoothing.

I'll have to do a separate post on this with some histograms to make this clear.
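As a rough illustration of combining categories into one bar, the sketch below keeps the most frequent keywords as their own bars and lumps the long tail into a single "other" bar. The tag counts, the function name `lump_rare`, and the `top_n` cutoff are all assumptions made for this example; choosing the cutoff is exactly the smoothing decision described above, and this sketch does not attempt to optimize it.

```python
from collections import Counter

def lump_rare(counts, top_n):
    """Keep the top_n most frequent categories; merge the rest into 'other'."""
    ranked = Counter(counts).most_common()          # sorted by descending frequency
    head = ranked[:top_n]
    tail_total = sum(c for _, c in ranked[top_n:])  # combined weight of the long tail
    bars = list(head)
    if tail_total:
        bars.append(("other", tail_total))
    return bars

# Hypothetical long-tailed keyword counts, not real data.
tag_counts = {"wedding": 5000, "party": 3000, "japan": 1200,
              "cat": 4, "zebra": 2, "kazoo": 1}

bars = lump_rare(tag_counts, top_n=3)
for label, count in bars:
    print(f"{label:8s} {count}")
```

A larger `top_n` shows more of the tail at the cost of many near-empty bars; a smaller one smooths more aggressively.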

Hadley

There is another distinction: there is no canonical ordering in the categorical case. Histograms are not ordered by frequency. I'd argue that histograms are a special case of bar charts, and it's confusing to call the plots above histograms.

I also wonder if the numbers in the frequency table underlying the flickr tag cloud aren't transformed in some way to avoid very popular tags overwhelming the display.

Kaiser

You certainly have a point here. I was back and forth about it myself; you might have noticed that histogram was not mentioned in the main text. That's why.

And they may well have log-transformed the frequencies. It'd make sense.

In fact, there is only a limited number of font sizes and so they must have bucketed the frequencies, which means that the relative font size would have little to do with relative frequency!
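Here is a small sketch of what log-transforming and then bucketing could look like. This is a guess at the general idea, not Flickr's actual method; the counts, the four font sizes, and the function name `font_size_buckets` are all hypothetical.

```python
import math

def font_size_buckets(counts, sizes):
    """Map each tag to a font size by equal-width buckets on log(frequency).

    Assumes every count is positive, since log(0) is undefined.
    """
    logs = {tag: math.log(c) for tag, c in counts.items()}
    lo, hi = min(logs.values()), max(logs.values())
    span = hi - lo
    result = {}
    for tag, lg in logs.items():
        if span == 0:
            idx = 0  # all tags equally frequent: use the smallest size
        else:
            # Scale to [0, len(sizes)] and clamp the maximum into the last bucket.
            idx = min(int((lg - lo) / span * len(sizes)), len(sizes) - 1)
        result[tag] = sizes[idx]
    return result

# Hypothetical counts and font sizes (in points).
tag_counts = {"wedding": 10000, "party": 1000, "japan": 100, "cat": 10}
sizes = [10, 14, 18, 24]

fonts = font_size_buckets(tag_counts, sizes)
for tag, size in fonts.items():
    print(f"{tag:8s} count={tag_counts[tag]:6d} font={size}pt")
```

In this made-up example the most popular tag is 1,000 times as frequent as the rarest but is drawn only 2.4 times as large, and any two tags that land in the same bucket get the same size regardless of how different their counts are.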


Kaiser Fung. Business analytics and data visualization expert. Author and Speaker.