Tricks of the trade 2

Comments

Hadley Wickham

Profile charts are just another name for the parallel coordinates plot (pcp), and I believe that pcp is the more "statistical" term, for what that's worth.

derek

Basically, it's what you just described: an exploratory data analysis (EDA) device for comparing the properties (on an interval scale) of a group of objects (on a nominal scale) using lines.

As I said, I used to think lines were a no-no unless there was a clear sequence from object to object (i.e., the objects must be on an ordinal or interval scale), but now I personally downgrade that rule to a weak guideline: weaker, for example, than the "rule" that says the interval scale of a line chart must start at zero.

derek

Sorry, that was incoherent: the objects are the lines, one line per object. The properties are the nominal categories, and the values of those properties are on the interval scale.

I would also note that each property may be expressed in different units or, if in the same units, on a different scale. Each is scaled independently to use the vertical space fully, from bottom to top (or at least up to the top, if the scale starts from zero). The vertical scales may not be shown, as each property would require its own; usually they are omitted, and you are expected to concentrate on the distribution: is this point far from the pack, and if so, what other points on the same line are also far from the pack?
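As a minimal sketch of the independent scaling described above, the hypothetical Python helper below maps each property (column) onto 0..1 on its own, either from its minimum to its maximum or, with `from_zero=True`, from zero to its maximum. The function name, the list-of-rows layout, and the choice of 0.5 for a column with no spread are assumptions for illustration.

```python
def scale_each_property(rows, from_zero=False):
    """Rescale each property (column) on its own so it fills the vertical space.

    rows: one list per object (one line per object), all the same length.
    from_zero=False maps each column's min..max onto 0..1 (bottom to top of its axis).
    from_zero=True maps 0..max onto 0..1 instead (assumes non-negative values).
    A column with no spread is mapped to 0.5, an arbitrary midline choice.
    """
    n_props = len(rows[0])
    highs = [max(row[j] for row in rows) for j in range(n_props)]
    if from_zero:
        lows = [0] * n_props
    else:
        lows = [min(row[j] for row in rows) for j in range(n_props)]
    scaled = []
    for row in rows:
        scaled.append([
            (row[j] - lows[j]) / (highs[j] - lows[j]) if highs[j] > lows[j] else 0.5
            for j in range(n_props)
        ])
    return scaled


# Hypothetical values: three objects measured on properties with very different units.
rows = [[10, 200, 3], [20, 400, 3], [30, 300, 3]]
print(scale_each_property(rows))
# [[0.0, 0.0, 0.5], [0.5, 1.0, 0.5], [1.0, 0.5, 0.5]]
```

Because every column gets its own range, a single shared vertical scale no longer applies, which is why the per-property scales are usually left off.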

The number of properties is moderate, but usually larger than four, and the number of objects is usually much larger than four. The lines are not individually identified by colour; they are all drawn alike. You are expected to pick out the unusual ones by eye and apply some colour to those; the rest remain a mess of (e.g.) gray lines.
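In the description above, the unusual lines are picked out by eye. As an assumed stand-in for that judgement, the sketch below flags any object whose value on some property lies more than a chosen number of standard deviations from that property's mean. The z-score rule, the default threshold of 2.0, and the name `unusual_objects` are illustrative choices, not part of the comment.

```python
from statistics import mean, pstdev


def unusual_objects(rows, threshold=2.0):
    """Return indices of objects (lines) that sit far from the pack on any property.

    'Far' is taken here to mean more than `threshold` population standard
    deviations from that property's mean; both the rule and the default of 2.0
    are assumptions standing in for judging by eye.
    """
    flagged = set()
    for j in range(len(rows[0])):
        column = [row[j] for row in rows]
        centre, spread = mean(column), pstdev(column)
        if spread == 0:
            continue  # every object has the same value, so nothing stands out
        for i, value in enumerate(column):
            if abs(value - centre) / spread > threshold:
                flagged.add(i)
    return sorted(flagged)


# Hypothetical data: nine similar objects and one that is extreme on property 0.
rows = [[1, 5] for _ in range(9)] + [[10, 5]]
highlight = unusual_objects(rows)
print(highlight)  # [9]: colour this line, leave the rest gray
```

With only a handful of objects, a z-score can never get very large, so a fixed threshold like this only makes sense when, as Derek notes, the number of objects is much larger than four.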

In the example above, the properties are the lines and the objects are on the nominal scale, so this is not the same as a parallel coordinates plot. Because of this, the properties need to share the same scale; if they don't, each scale must be visibly supplied and labelled.

dermot

Actually, I prefer a stacked bar chart, sorted so the bars run from largest to smallest (or vice versa).

The lines above don't work well when the items they connect are not a series (e.g., a time series) in which the lines would show a trend.

derek

Stephen Few has a PDF article on parallel coordinates and their use in business intelligence.

These are going to be rare at Junk Charts for several reasons: first, they are more often used as an interactive display for exploration than as a static display for presentation; second, they typically show high-dimensional data sets with large numbers of objects, which are rarely displayed in the media where they can be criticised and given the junk art treatment; and third, even when they are, the data is rarely available separately, and reconstructing it by hand from the graph itself is impractical.

Finally, my version of Kaiser's line chart redone as parallel coordinates: each line is a startup company, and each category on the x-axis is a spending purpose.

I don't want to go on about them too much: I was just using them as an example of the sort of display that convinced me lines weren't verboten for nominal x-axes.

derek

Damn, I just realised Kaiser already did one.

Jon Peltier

Sketching is important. You need to slice and dice the data several ways before a clear (or at least clearer) picture of the behavior emerges.

While a stacked column chart is not ideal for final presentation, it is useful as an initial sketch. In addition to changing the chart type, one needs to rearrange the order of series and categories. For example, Kaiser and Derek each produced parallel coordinate/profile charts in which Blinksale made a uniformly sloped line (albeit in opposite directions). I might have reordered Kaiser's first profile chart to put Blinksale at either the far left or the far right of the horizontal axis.
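The reordering suggested above only changes the left-to-right order of the categories; every line has to be permuted the same way so it still passes through its own values. A minimal sketch, using a hypothetical `move_category` helper and placeholder data rather than the figures from the post:

```python
def move_category(categories, rows, name, to_end=True):
    """Move one x-axis category to the far right (or far left) of a profile chart.

    Every row is permuted the same way, so each line still passes through its
    own values; only the left-to-right order of the categories changes.
    """
    i = categories.index(name)  # raises ValueError if `name` is not a category
    others = [j for j in range(len(categories)) if j != i]
    order = others + [i] if to_end else [i] + others
    new_categories = [categories[j] for j in order]
    new_rows = [[row[j] for j in order] for row in rows]
    return new_categories, new_rows


# Placeholder categories and values, not the data from the post.
cats, lines = move_category(["a", "b", "c"], [[1, 2, 3], [4, 5, 6]], "a")
print(cats, lines)  # ['b', 'c', 'a'] [[2, 3, 1], [5, 6, 4]]
```

Passing `to_end=False` puts the chosen category at the far left instead.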

Another thing you often discover while sketching the relationships is that it might take two or three charts to clearly display all aspects of the data. I think that Kaiser's second profile chart is very effective on its own, but in conjunction with the first, it gives a better picture.

Many people try to get two charts' worth of information into a single chart using secondary axes, combinations of different chart types and such, but two adjacent charts are more quickly understood.

John Johnson

If the application category is the most important, then it may not be a problem if the middle bars float. But four categories is probably the maximum that I would use for the display.

Darius K.

I have to agree with dermot: I don't believe in using line graphs when you're not plotting a series on the x-axis.

zbicyclist

I'm with Darius. A line graph for nominal categories is bad practice.

Of what's been shown, I like the stacked bar the best.

Hadley Wickham

Another option for those who vehemently dislike the lines is to draw a set of small multiples (a.k.a. trellises, a.k.a. co-plots) of bars, one for each of the companies.

Some demonstrations of the interactive use of parallel coordinate plots are available on the GGobi site. Interaction is highly software-specific, and few features are implemented by all interactive pcp software.

Derek: I don't believe that the type of scaling determines whether a plot is a parallel coordinates plot or not. The key feature is the parallel axes, i.e., you are drawing a projective coordinate system rather than the more usual Cartesian coordinate system. Both of the above examples are pcps.
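One way to see the point about parallel axes is in how the lines are built: in either orientation, each column of the table becomes a vertical axis and each row becomes a polyline across those axes, so transposing the table swaps which items are the lines while the construction stays the same. The sketch below assumes a plain list-of-rows table and hypothetical helper names; any per-axis scaling would be a separate step.

```python
def to_polylines(table):
    """Draw each row of `table` as a polyline across parallel vertical axes.

    Column j becomes a vertical axis at horizontal position j, and row i's
    value in that column becomes the vertex (j, table[i][j]). Scaling each
    axis is a separate, optional step.
    """
    return [[(j, value) for j, value in enumerate(row)] for row in table]


def transpose(table):
    """Swap rows and columns so the former columns become the lines."""
    return [list(column) for column in zip(*table)]


# Hypothetical table: rows are objects, columns are properties.
table = [[1, 2, 3], [4, 5, 6]]
print(to_polylines(table))             # one line per object (Derek's layout)
print(to_polylines(transpose(table)))  # one line per property (Kaiser's layout)
```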

Jon Peltier

"A line graph for nominal categories is bad practice."

This generally is true, but Kaiser is not drawing a line chart per se, but a parallel coordinate plot. The lines are understood not to convey a trend but merely to connect related points.

"I prefer a stacked bar chart."

The problems here are at least twofold.
1. Since points in the same category are stacked end-to-end rather than side-by-side, it is difficult to compare their relative values.
2. Since points in the same series in different stacks have different baselines (i.e., the tops of the bars they are stacked upon), it is difficult to compare their relative values.
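To make both points concrete, the sketch below computes where each segment of a stacked bar chart starts and ends, with bars sorted by total as dermot suggests. The helper name and the data are hypothetical. In the output, the segments of one bar sit end to end (point 1), and series `y` starts at 1 in bar Q but at 4 in bar P (point 2).

```python
def stack_layout(bars, series):
    """Compute where each segment of a stacked bar chart starts and ends.

    bars: dict of bar label -> values, one per entry in `series`.
    Bars are ordered from largest to smallest total (the sorting dermot suggests).
    Each segment is returned as (series_name, base, top).
    """
    order = sorted(bars, key=lambda label: sum(bars[label]), reverse=True)
    layout = []
    for label in order:
        base = 0
        segments = []
        for name, value in zip(series, bars[label]):
            segments.append((name, base, base + value))
            base += value  # the next segment sits on top of this one
        layout.append((label, segments))
    return layout


# Hypothetical bars P and Q, each split into series x, y, z.
for label, segments in stack_layout({"P": [4, 2, 1], "Q": [1, 5, 3]}, ["x", "y", "z"]):
    print(label, segments)
# Q [('x', 0, 1), ('y', 1, 6), ('z', 6, 9)]
# P [('x', 0, 4), ('y', 4, 6), ('z', 6, 7)]
```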


Marketing analytics and data visualization expert. Author and Speaker. Currently at Vimeo and NYU.


Graphics design by Amanda Lee
