Tricks of the trade 2
Jun 29, 2007
In a previous post, I explained the value of sketching when creating graphs. Today, I will share a few other graphs that plot the same data we discussed the other day, regarding the proportion of time spent on developing different modules of software.
A stacked column chart, suggested by John J., would look like this:
Compared to the profile chart, this chart has some weaknesses:
- it's difficult to read off the proportions for middle blocks like Blinksale-Billing;
- because the middle blocks "float", it is impossible to compare them properly;
- it requires as many colors as there are variables.
These problems get worse as the data scale: the proportions become harder to read off, and more colors are needed.
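To see why the middle blocks "float", here is a minimal sketch of how a stacked column is laid out. The percentages and the bottom-to-top stacking order are invented for illustration (they are not the real figures), and `stacked_segments` is a hypothetical helper name.

```python
# Hypothetical shares (percent of code per module); NOT the real figures.
# The bottom-to-top stacking order is also an assumption.
SHARES = {
    "WuFoo":      {"Application": 60, "Billing": 10, "Marketing": 20, "Support": 10},
    "FeedBurner": {"Application": 75, "Billing": 5,  "Marketing": 10, "Support": 10},
    "Blinksale":  {"Application": 50, "Billing": 25, "Marketing": 15, "Support": 10},
    "regonline":  {"Application": 40, "Billing": 15, "Marketing": 25, "Support": 20},
}


def stacked_segments(row):
    """Return {module: (bottom, top)} for one stacked column."""
    segments = {}
    bottom = 0
    for module, share in row.items():
        # Each block starts where the blocks beneath it end.
        segments[module] = (bottom, bottom + share)
        bottom += share
    return segments


for company, row in SHARES.items():
    bottom, top = stacked_segments(row)["Billing"]
    print(f"{company:<10} Billing drawn from {bottom} to {top} (height {top - bottom})")
```

Only the bottom block shares a baseline across columns. Every other block starts at a different height in each column, so the reader has to subtract two positions to recover its value.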
The Marimekko, suggested by Bernard L., is the same chart as above except that the widths of the columns are made proportional to the relative number of lines of code. However, because these four companies do not make up the entire universe, proportional widths make little sense here.
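As a rough sketch of the width calculation in a Marimekko chart, assuming made-up line counts (the real counts are not reproduced here) and a hypothetical helper `marimekko_columns`:

```python
# Hypothetical line counts per company; NOT the real figures.
LINES_OF_CODE = {
    "WuFoo": 1000,
    "FeedBurner": 3000,
    "Blinksale": 2000,
    "regonline": 4000,
}


def marimekko_columns(line_counts):
    """Return {company: (left_edge, width)}, with widths summing to 1."""
    total = sum(line_counts.values())
    columns = {}
    left = 0.0
    for company, lines in line_counts.items():
        width = lines / total  # share of the combined total
        columns[company] = (left, width)
        left += width
    return columns


for company, (left, width) in marimekko_columns(LINES_OF_CODE).items():
    print(f"{company:<10} starts at {left:.2f}, width {width:.2f}")
```

Each width is a share of the four companies' combined total, so the widths are only meaningful if that total is itself a meaningful whole.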
The profile chart can be drawn up in two ways:
These charts typically display results of cluster analysis. This is a statistical data mining technique that discovers groups of like objects within a large data set. Often, the computer will only tell you that these 15 belong to Cluster 1, those 22 form Cluster 2, etc.
To figure out why the 15 belong together, the analyst needs to plot the explanatory variables against cluster index. Now, think of WuFoo, FeedBurner, etc. as clusters, and the proportion of code given to Application, etc. as variables.
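One common way to prepare such a plot is to summarize each variable within each cluster. The sketch below assumes the summary is the per-cluster mean and that every record carries every variable; the records, variable names, and the helper `cluster_profiles` are all invented for illustration.

```python
from collections import defaultdict

# Hypothetical records: (cluster label, {variable: value}); invented data.
RECORDS = [
    (1, {"age": 30, "spend": 100}),
    (1, {"age": 40, "spend": 300}),
    (2, {"age": 60, "spend": 50}),
    (2, {"age": 70, "spend": 150}),
]


def cluster_profiles(records):
    """Return {variable: {cluster: mean value}} for plotting against cluster index."""
    sums = defaultdict(lambda: defaultdict(float))
    counts = defaultdict(int)
    for cluster, variables in records:
        counts[cluster] += 1
        for name, value in variables.items():
            sums[name][cluster] += value
    return {
        name: {cluster: total / counts[cluster] for cluster, total in by_cluster.items()}
        for name, by_cluster in sums.items()
    }


for name, by_cluster in cluster_profiles(RECORDS).items():
    print(name, by_cluster)
```

Each variable then becomes one line on the profile chart, with one point per cluster.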
While the line segments don't signify anything real, they trace out the precise paths our eyes would take when reading the stacked column chart above! Remember we wanted to compare the number of lines given to each function across companies. If shown the column chart, my eyes would flip across the top of the Application (blue) blocks from WuFoo to regonline. This path is exactly the brown line on our first profile chart.
The numbers for Marketing, Support and Billing are much easier to read too as they all start from zero for each company.
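The reshaping behind the profile chart can be sketched as follows: each module becomes its own series across companies, and every value is measured from zero. The percentages are invented for illustration (not the real figures), and `profile_series` is a hypothetical helper.

```python
# Hypothetical shares (percent of code per module); NOT the real figures.
SHARES = {
    "WuFoo":      {"Application": 60, "Billing": 10, "Marketing": 20, "Support": 10},
    "FeedBurner": {"Application": 75, "Billing": 5,  "Marketing": 10, "Support": 10},
    "Blinksale":  {"Application": 50, "Billing": 25, "Marketing": 15, "Support": 10},
    "regonline":  {"Application": 40, "Billing": 15, "Marketing": 25, "Support": 20},
}


def profile_series(shares):
    """Return {module: [share for each company, in the order given]}."""
    companies = list(shares)
    modules = list(next(iter(shares.values())))
    return {module: [shares[c][module] for c in companies] for module in modules}


for module, values in profile_series(SHARES).items():
    print(f"{module:<12} {values}")
```

Each list is one line on the profile chart and can be handed to any plotting library as is, with the companies along the horizontal axis.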
The right chart is another possibility, but for this particular situation I prefer the left one.
Finally, I am less familiar with the "parallel coordinates plot" that Derek talked about. I believe it is a variant of the profile chart.