Lost in translation

Since English is my second language, I have always been intrigued by automatic translation.  My "Turing" test for translation engines is to feed the translated output back into the same engine in the opposite direction.

Case in point: Babelfish translates the first sentence of this post into Italian as:

Poiché l'inglese è la mia seconda lingua, sono stato incuriosito sempre tramite la traduzione automatica.

Now, Babelfish translates the above Italian text into English, as:

Since English is my second language, has been made curious always through the automatic translation.

Not that bad, really.
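The round-trip test described above can be sketched in a few lines of Python. This is only an illustration: `translate` stands for whatever engine is being tested and its three-argument signature is an assumption, not Babelfish's actual interface. `word_overlap` is likewise a made-up, deliberately crude score.

```python
from typing import Callable, Tuple

# Assumed shape of an engine call: translate(text, source_lang, target_lang) -> str
Translator = Callable[[str, str, str], str]


def round_trip(text: str, translate: Translator,
               source: str = "en", pivot: str = "it") -> Tuple[str, str]:
    """Send text to the pivot language and back through the same engine."""
    forward = translate(text, source, pivot)
    back = translate(forward, pivot, source)
    return forward, back


def word_overlap(original: str, returned: str) -> float:
    """Share of the original's distinct words that survive the round trip."""
    def words(sentence: str) -> set:
        # Lowercase and drop surrounding punctuation so "language," matches "language".
        return {w.strip('.,;:!?"').lower() for w in sentence.split()}

    source_words = words(original)
    if not source_words:
        return 1.0
    return len(source_words & words(returned)) / len(source_words)
```

Running `word_overlap` on the two English sentences above would put a rough number on "not that bad", though it ignores word order and grammar, which is exactly where the round trip stumbled.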


The tagline of this blog is "recycling chartjunk into junk art". What happens in the other direction? The answer is on this page!

This entry is inspired by Michael M.



Tricks of the trade 2

In a previous post, I explained the value of sketching when creating graphs. Today, I will share a few other graphs that plot the same data we discussed the other day: the proportion of time spent on developing different modules of software.

A stacked column chart, suggested by John J., would look like this:
[Image: Redo_wufoo3]

Compared to the profile chart, this chart has some weaknesses:

  • it's difficult to read off the proportions for middle blocks like Blinksale-Billing;
  • because the middle blocks "float", it is impossible to compare them properly;
  • it requires as many colors as there are variables.

These problems get worse as the data scale up: the proportions become harder to read off, and more colors are needed.
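To see why the middle blocks "float", it helps to compute where each block starts and ends in a stacked column. The sketch below uses made-up percentages for two hypothetical companies (these are not the numbers behind the charts in this post).

```python
# Hypothetical shares (percent of code per module). NOT the data from the charts.
shares = {
    "Company A": {"Application": 50, "Marketing": 20, "Billing": 10, "Support": 20},
    "Company B": {"Application": 30, "Marketing": 25, "Billing": 15, "Support": 30},
}


def stacked_segments(column):
    """Return {module: (bottom, top)} for one stacked column."""
    segments = {}
    bottom = 0
    for module, value in column.items():
        # Each block sits on top of everything stacked before it.
        segments[module] = (bottom, bottom + value)
        bottom += value
    return segments


for company, column in shares.items():
    print(company, stacked_segments(column))
```

In this toy example the Billing block starts at 70 for Company A but at 55 for Company B, so the reader has to subtract two moving edges to recover 10 and 15. On a profile chart, every value is measured from zero instead.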

The Marimekko, suggested by Bernard L., is the same chart as above except that the widths of the columns are made proportional to the relative number of lines of code. However, because these four companies do not make up the entire universe, proportional widths make little sense here.

The profile chart can be drawn up in two ways:
[Image: Redo_wufoo2]
These charts typically display the results of cluster analysis, a statistical data mining technique that discovers groups of like objects within a large data set. Often, the computer will only tell you that these 15 belong to Cluster 1, those 22 form Cluster 2, etc.

To figure out why the 15 belong together, the analyst needs to plot the explanatory variables against the cluster index. Now, think of WuFoo, FeedBurner, etc. as clusters, and the proportion of code given to Application, etc. as variables.
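As a rough illustration of what gets plotted, the profile of a cluster can be taken as the average of each explanatory variable over its members. The function name, variable names and numbers below are all made up for the example. Real cluster output would supply `rows` and `labels`.

```python
from collections import defaultdict
from statistics import mean


def cluster_profiles(rows, labels):
    """Average each explanatory variable within each cluster."""
    grouped = defaultdict(list)
    for row, label in zip(rows, labels):
        grouped[label].append(row)
    return {
        label: {var: mean(member[var] for member in members) for var in members[0]}
        for label, members in grouped.items()
    }


# Made-up observations: two explanatory variables, three objects, two clusters.
rows = [
    {"x": 1.0, "y": 10.0},
    {"x": 3.0, "y": 14.0},
    {"x": 8.0, "y": 2.0},
]
labels = [1, 1, 2]

profiles = cluster_profiles(rows, labels)
print(profiles)
```

Each cluster then becomes one position on the horizontal axis and each variable one line, which is the layout of the profile chart.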

While the line segments don't signify anything real, they trace out the precise paths our eyes would take when reading the stacked column chart above! Remember, we wanted to compare the number of lines given to each function across companies. If shown the column chart, my eyes would flip across the top of the Application (blue) blocks from WuFoo to regonline. This path is exactly the brown line on our first profile chart.

The numbers for Marketing, Support and Billing are much easier to read, too, because they all start from zero for each company.

The right chart is another possibility, but for this particular situation I prefer the left one.

Finally, I am less familiar with the "parallel coordinates plot" that Derek talked about.  I believe it is a variant of the profile chart.