My talk next week on histograms
The curse of dimensions

What's a histogram?

Almost all graphing tools make histograms, and almost all dataviz books cover the subject. But I've always felt there are many unanswered questions. In my talk this Thursday in NYC, I'll provide some answers. You can reserve a spot here.

***

Here's the most generic histogram:

Salaries_count_histogram

Even Excel can make this kind of histogram. Notice that we have counts in the y-axis. Is this really a useful chart?

I haven't found this type of histogram useful ever, since I don't do analyses in which I needed to know the exact count of something - when I analyze data, I'm generalizing from the observed sample to a larger group.

Speaking of Excel, I felt that the developers have always hated histograms. Why is it much harder to make histograms than other basic charts?

***

Another question. We often think of histograms as a crude approximation to a probability density function (PDF). An example of a PDF is the famous bell curve. Textbooks sometimes show the concept like this:

Histogram_normal_pdf

This is true of only some types of histograms (and not the one shown in the first section!) Instead, we often face the following situation:

Normals_histogram50_undercurve

This isn't a trick. The data in the histogram above were generated by sampling the pink bell curve.

***

If you've used histograms, you probably also have run into strange issues. I haven't found much materials out there to address these questions, and they have been lingering in my mind, hidden, for a long time.

My Thursday talk will hopefully fill in some of these gaps.

Comments

Feed You can follow this conversation by subscribing to the comment feed for this post.

The comments to this entry are closed.