Dots, lines, and 2D histograms

Daniel Z. tweeted about my post from last week. In particular, he took a deeper look at the chart of energy demand that put all hourly data onto the same plot, originally published at the StackOverflow blog:


I noted that this is not a great chart, particularly because what catches our eyes is not the key features of the underlying data. Daniel made a clearly better chart:


This is a dot plot, rather than a line chart. The dots are painted in light gray, pushed to the background, because readers should be looking at the orange line. (I'm not sure what is going on with the horizontal scale as I could not get the peaks to line up on the two charts.)

What is this orange line? It's supposed to prove the point that the apparent dark band seen in the line chart does not represent the most frequently occurring values, as one might presume.

Looking closer, we see that the gray dots do not show all the hourly data but binned values.

We see vertical columns of dots, each representing a bin of values. The size of the dots represents the frequency of values of each bin. The orange line connects the bins with the highest number of values.
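To make the binning concrete, here is a minimal sketch of how one day's column of dots could be computed. It assumes a bin width of 1000 (the figure given in the P.S. below) and reads the orange line as the bin with the most values, as described above. The function names and the sample numbers are made up for illustration; this is not Daniel's code.

```python
from collections import Counter

BIN_SIZE = 1000  # bin width in demand units (assumed, taken from the P.S.)


def bin_counts(values, bin_size=BIN_SIZE):
    """Count the values in each bin, keyed by the bin's lower edge."""
    return Counter(int(v // bin_size) * bin_size for v in values)


def modal_bin(values, bin_size=BIN_SIZE):
    """Lower edge of the bin holding the most values (ties go to the lowest bin)."""
    counts = bin_counts(values, bin_size)
    return min(counts, key=lambda edge: (-counts[edge], edge))


# Hypothetical demand readings for part of one day (not the real dataset).
day = [30200, 30800, 31500, 33900, 36100, 36700, 36400, 34200]

counts = bin_counts(day)   # one dot per bin, sized by its count
peak = modal_bin(day)      # where the "most frequent" marker would sit
```

Each key in `counts` would become one dot in the day's vertical column, with the count driving the dot size.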

Daniel commented that

"The visual aggregation doesn't in fact map to the most frequently occurring values. That is because the ink of almost vertical lines fills in all the space between start and end."
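A small sketch can illustrate Daniel's point. It compares, bin by bin, how many data points fall in a bin with how many connecting line segments pass through it. The series below is invented to alternate between a low and a high level, and the function names are hypothetical.

```python
from collections import Counter


def point_counts(values, bin_size):
    """Number of data points falling in each bin (bin index = value // bin_size)."""
    return Counter(int(v // bin_size) for v in values)


def segment_coverage(values, bin_size):
    """Number of line segments between consecutive points that pass through each bin."""
    cover = Counter()
    for a, b in zip(values, values[1:]):
        lo, hi = sorted((int(a // bin_size), int(b // bin_size)))
        for k in range(lo, hi + 1):
            cover[k] += 1
    return cover


# Made-up series that swings between roughly 1,000 and 8,000.
series = [1500, 8500, 1200, 8700, 1800, 8200]

points = point_counts(series, 1000)   # only bins 1 and 8 contain data
ink = segment_coverage(series, 1000)  # every bin from 1 to 8 is crossed
```

In this toy case, bin 4 contains no data points at all, yet all five segments cross it, so a line chart paints it as heavily as the bins where the data actually sit.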

Xan Gregg investigated further, and made a gif to show this effect better. Here is a screenshot of it (see this tweet):


The top chart is a true dot plot, so the darker areas are denser because the dots overlap. The bottom chart is the line chart that has the see-saw pattern. As Xan noted, the values shown are strangely very well behaved (aggregated? modeled?): with each day, it appears that the values sweep up and down consistently. This means the values are somewhat evenly spaced on the underlying trendline, so I think this dataset is not the best one to illustrate Daniel's excellent point.

It's usually not a good idea to connect lots of dots with a single line.


[P.S. 3/21/2022: Daniel clarified what the orange line shows: "In the posted chart, the orange line encodes the daily demand average (the mean of the daily distribution), rounded, for displaying purposes, to the closed bin. Bin size = 1000. Orange could have encode the daily median as well."]
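As a rough sketch of that clarification, the daily marker could be computed as below. It assumes that rounding to a bin means snapping to the nearest multiple of the bin size; the post does not spell this out, and the function names and sample values are invented.

```python
import math
from statistics import mean, median


def nearest_bin(value, bin_size=1000):
    """Snap a value to the nearest multiple of bin_size (halves round up)."""
    return math.floor(value / bin_size + 0.5) * bin_size


def daily_marker(values, bin_size=1000, stat=mean):
    """Summarize one day's values with `stat`, then snap the result to a bin."""
    return nearest_bin(stat(values), bin_size)


# Hypothetical demand readings for part of one day (not the real dataset).
day = [30200, 30800, 31500, 33900, 36100, 36700, 36400, 34200]

mean_marker = daily_marker(day)                 # mean 33725 snaps to 34000
median_marker = daily_marker(day, stat=median)  # median 34050 snaps to 34000
```

Passing `stat=median` gives the alternative Daniel mentions. For these made-up values both markers land on 34000, while most of the readings sit in the 36000 bin, so a mean or median line need not follow the most populated bin.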





"..strangely very well behaved..". People and businesses are creatures of habit. This data looks perfectly plausible. I do research in this field.


Joe: Thanks for the expert perspective. Would you say it depends on the level of aggregation? Is it still well behaved at the minute level? (Wondering...)


@Joe, any detail resolution (minutes included) works, as long as the line resolution stays within acceptable visual limits (for this dataset 366 days/points is acceptable). You will have 24x60 values to calculate the histogram for, but the histogram will have the same bin size/density.

The dot size scale is set by the largest bin count across all histograms, so it is irrelevant how large that count might be. Just a reminder that the vertical histogram bins are calculated not in time units (hours or minutes) but in electricity units (the chosen bin size was 1000). So the bins are not going to be denser; the dots might only get a more finely tuned size, which hardly matters given how small they are.
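A minimal sketch of that scaling argument, assuming dot size is proportional to a bin's count divided by the largest count across all days. The histograms are invented, and the minute-level data is idealized so that every minute falls in the same bin as its hour.

```python
def relative_sizes(histograms):
    """Scale every bin count by the largest count found in any histogram."""
    global_max = max(c for h in histograms for c in h.values())
    return [{b: c / global_max for b, c in h.items()} for h in histograms]


# Two hypothetical days of hourly data: bin lower edge -> number of hours.
hourly = [{30000: 6, 31000: 18}, {30000: 12, 31000: 12}]

# Idealized minute-level version: each hour contributes 60 readings to its bin.
minute = [{b: c * 60 for b, c in h.items()} for h in hourly]

sizes_hourly = relative_sizes(hourly)
sizes_minute = relative_sizes(minute)
```

Under this idealization the relative sizes are identical at both resolutions. With real minute data some readings would land in neighboring bins, which is the finer tuning of dot sizes mentioned above.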


Yes. Aggregating data hourly will smooth out the variation experienced at finer intervals.
