A terrific application of tag clouds can be seen over at pollster.com, following the first debate of Democratic Presidential hopefuls the other night. Here is Senator Biden's "tag cloud", depicting the top 50 words that came out of his mouth that night. The size of each word is proportional to how often he uttered it.
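For readers curious about the mechanics, here is a minimal sketch of how such a cloud could be computed. It is not Pollster's actual method, which we have not seen: the function name `tag_cloud_sizes`, the tokenizing pattern, and the point sizes are all our own assumptions.

```python
import re
from collections import Counter

def tag_cloud_sizes(text, top_n=50, min_pt=10, max_pt=48):
    """Return {word: font size in points} for the top_n most frequent words.

    Hypothetical helper: the tokenizer and the size limits are assumptions.
    """
    # Lowercase, then keep runs of letters, allowing inner hyphens or
    # apostrophes so that a word like "so-called" stays in one piece.
    words = re.findall(r"[a-z]+(?:[-'][a-z]+)*", text.lower())
    counts = Counter(words).most_common(top_n)
    if not counts:
        return {}
    biggest = counts[0][1]  # count of the most frequent word
    # Size is proportional to count, with a floor so rare words stay legible.
    return {w: max(min_pt, round(max_pt * c / biggest)) for w, c in counts}
```

Note that the floor at `min_pt` breaks strict proportionality for the rarest words, a trade-off made purely for legibility.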
Having not seen the debate, I can use this summary device to get a quick read on his main points. It's clear that he talked about the war ("Iraq", "troops"), education ("teachers", "students"), and abortion ("roe", "wade" but, interestingly, not the word "abortion"). Of course, if he had a distinct message, that would have been even better. What the tag cloud exposed (assuming it was done right) was that he was pretty much all over the place, touching on many different things about equally often.
It is disconcerting that a word like "so-called" made it into the top 50. Better news: "better" is his #1 word.
It is typical to process text data by first removing the most common words that carry no real meaning (um, er, the, so-called, etc.), but in this case keeping them is helpful: the candidates can catch problems like the excessive use of "so-called".
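The filtering step might be sketched as below. The stop list is a tiny illustrative one of our own, and `filter_words` with its `keep` parameter is a hypothetical helper, not something taken from Pollster.

```python
# Illustrative stop list only; real lists run to hundreds of words.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "um", "so-called"}

def filter_words(words, stop_words=STOP_WORDS, keep=()):
    """Drop stop words, except any the analyst explicitly asks to keep."""
    dropped = set(stop_words) - set(keep)
    return [w for w in words if w not in dropped]

words = ["the", "so-called", "war", "um", "troops"]
print(filter_words(words))                       # usual practice
print(filter_words(words, keep=("so-called",)))  # keep the verbal tic visible
```

The `keep` argument is the point here: the analyst decides which "meaningless" words are worth surfacing.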
However, the tag cloud would have been improved if "stemming" were used to collapse "talk" and "talking", "teacher" and "teachers", etc.
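To show what collapsing would do to the counts, here is a deliberately crude sketch. The suffix rules are our own simplification and stand in for a real stemmer (such as the Porter algorithm); the counts in the usage line are made up for illustration.

```python
from collections import Counter

def crude_stem(word):
    """Strip a few common English suffixes. A toy rule, not a real stemmer."""
    for suffix in ("ing", "ed", "s"):
        # Only strip when at least three letters would remain.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def collapse_counts(counts):
    """Merge the counts of words that share the same crude stem."""
    merged = Counter()
    for word, n in counts.items():
        merged[crude_stem(word)] += n
    return merged

# Made-up counts, for illustration only.
print(collapse_counts({"talk": 3, "talking": 2, "teacher": 1, "teachers": 4}))
```

A rule this blunt will mangle some words ("this" becomes "thi"), which is why a proper stemming algorithm would be used in practice.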
Pollster did tag clouds for every candidate. Comparing them provides even more insights! Here's one for Senator Clinton. Her message is much more focused: quite a lot of time was spent proclaiming her "readiness" for "President", quite a bit on "healthcare" and quite a bit on the "war".
As Pollster correctly pointed out, it is unclear whether word sizes can be compared across tag clouds. If they can, the setup would be even more powerful.
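One way the clouds could be made comparable, sketched under our own assumptions: size each word by its share of the speaker's total words, then scale every cloud against the single largest share. The function name `shared_scale_sizes` and the numbers below are hypothetical, and we do not know how Pollster actually scaled its clouds.

```python
def shared_scale_sizes(counts_by_speaker, totals, max_pt=48):
    """Size words by share of each speaker's words, on one common scale."""
    # Relative frequency corrects for candidates getting unequal airtime.
    rates = {
        speaker: {w: c / totals[speaker] for w, c in counts.items()}
        for speaker, counts in counts_by_speaker.items()
    }
    # One shared maximum, so equal sizes mean equal rates in every cloud.
    top_rate = max(r for per in rates.values() for r in per.values())
    return {
        speaker: {w: round(max_pt * r / top_rate) for w, r in per.items()}
        for speaker, per in rates.items()
    }

# Made-up figures: both speakers say "war" 10 times, but B spoke twice as much.
counts = {"A": {"war": 10}, "B": {"war": 10}}
print(shared_scale_sizes(counts, totals={"A": 100, "B": 200}))
```

With a shared scale, the same raw count yields a smaller word for the speaker who talked longer, which is the comparison a reader would naturally want to make.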
The entire set of tag clouds can be seen here. Long-time readers of this blog will remember that we advocated such use back in Jan 2006, when discussing the "concordance" feature at Amazon. This successful application validates our enthusiasm.