
Degrees of likeness 2

We left off the other day with an interactive graphic that lets readers peer into subgroups.  This feature implicitly assumes that the overall average obscures differences among subgroups.  What statisticians do with this type of data is compare the subgroups and identify the factors that make someone different from the average.


For example, there is a clear distinction between the employed and the unemployed in how they spend the day (not surprising).

[Image: Nyt_timeuse_employement]
This happens to be what NYT printed in the paper edition that day.  (Note, though, that the graphic loses quite a bit without the interactivity.)

On the other hand, there appears to be little differentiation between men and women.

[Image: Nyt_timeuse_gender]
Nor is there much difference between blacks and whites.

[Image: Nyt_timeuse_race]
One factor that matters is age: older people do not spend the day exactly as the young do.  A lot of these factors (for example, age and employment status) are correlated, by the way.

[Image: Nyt_timeuse_age]

I showed all of these charts in order to talk about the statistical concept of "aggregation".  We noted that the distribution of time use among the employed differs from that among the unemployed.  Thus, we cannot use the "average" distribution to describe both groups, and so we show the data in disaggregated form.  The same goes for time use and age.

But there is not much gain in disaggregating by race or gender: the "average" is representative of the subgroups for these two factors.  This is one distinction I see between information graphics and statistical graphics: the former typically show all possible subgroups, while in the latter the designer zooms in on the factors that matter.
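To make the idea concrete, here is a rough sketch in Python of how one might check whether an "average" distribution is representative of its subgroups. Everything in it is an assumption for illustration: the activity categories, the time shares (invented, not the NYT data), the equal weighting of subgroups, and the choice of total variation distance as the yardstick. It is one possible check, not the method behind the graphic.

```python
# All numbers below are invented for illustration; they are NOT the NYT data.
ACTIVITIES = ["sleep", "work", "leisure", "other"]

# Share of the day spent on each activity, per subgroup (each row sums to 1).
employment = {
    "employed":   [0.33, 0.30, 0.17, 0.20],
    "unemployed": [0.37, 0.02, 0.33, 0.28],
}
gender = {
    "men":   [0.34, 0.20, 0.23, 0.23],
    "women": [0.35, 0.18, 0.22, 0.25],
}


def aggregate(groups, weights):
    """Weighted average distribution across the subgroups."""
    total = sum(weights[g] for g in groups)
    n = len(ACTIVITIES)
    return [sum(weights[g] * groups[g][i] for g in groups) / total
            for i in range(n)]


def total_variation(p, q):
    """Half the sum of absolute differences: 0 means identical distributions."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


def max_gap(groups, weights):
    """Largest distance between any subgroup and the aggregated 'average'."""
    avg = aggregate(groups, weights)
    return max(total_variation(dist, avg) for dist in groups.values())


if __name__ == "__main__":
    # Equal subgroup weights are an assumption; real weights would be
    # the population shares of each subgroup.
    for name, groups in [("employment", employment), ("gender", gender)]:
        weights = {g: 1.0 for g in groups}
        print(f"{name}: max gap from average = {max_gap(groups, weights):.3f}")
```

With these made-up shares, the gap for employment status comes out much larger than the gap for gender, which is the pattern described above: disaggregate where the gap is large, and let the average stand in where it is small.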




Comments


Mike Lawrence

On the topic of aggregation, a neat story in the psychology of learning can be found in the "power law" of practice. It turns out that practice actually follows an exponential function, but naive aggregation of individual data yields a (misleading) power function (http://en.wikipedia.org/wiki/Power_Law_of_Practice).
