Apr 08, 2007

Peripherals 1

Like any technology, charts also come with peripherals: I'm talking about legends, data labels, grid-lines and so on.  These things typically give us the most trouble, especially with complex data sets.  The analogy is apt: one may feel inextricably knotted up like bunches of cords and wires.

Interactive graphics is a particularly elegant solution to this problem, and Google Finance has done a fantastic job leading the way.  One trick is to show the legend only when the user asks for it. 
Google_sectorsum_lg Using bar charts (on the left), Google neatly summarizes the performance of stocks within each industry sector.  The bar chart gives a sense of the dispersion, which complements the average returns printed next to the bars.  For example, most sectors gained on average, yet about 30% of the individual stocks in most sectors actually declined on that day.  So the fact that technology stocks gained 0.48% on average doesn't necessarily mean that the two tech stocks you own gained 0.48%, or gained at all.
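To make the gap between the average and the individual stocks concrete, here is a minimal sketch.  The ten returns are invented for illustration (they are not Google's data); they were chosen so that the sector average comes out at 0.48% while three of the ten stocks decline.

```python
from statistics import mean

# Hypothetical one-day returns (%) for ten stocks in one sector.
# These numbers are made up for illustration only.
returns = [2.6, 1.9, 1.4, 0.9, 0.6, 0.3, 0.1, -0.4, -1.1, -1.5]

average = mean(returns)                                  # sector average return
share_declined = sum(r < 0 for r in returns) / len(returns)  # fraction of losers

print(f"average return: {average:.2f}%")
print(f"share of stocks that declined: {share_declined:.0%}")
```

The average alone would hide the fact that nearly a third of the stocks lost ground, which is the information the bar adds.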

Typically, we would put a legend on the side or at the bottom of the chart, which, all told, is an ugly duckling next to a well-executed chart.  Here, the legend is hidden behind the "What's this?" link.  The side benefit is that the legend can be as verbose as needed since it doesn't interfere with the chart.

There are a few minor things to consider:

  • "What's this?" is not very informative: Why not call it a "legend" or "key"?
  • The graph designer seems to think that the most important information sought by readers is the extremes, i.e. the percentage of stocks that gained/lost more than 2%.  Darkening the sides of the bar draws attention away from the middle, which is the boundary between the gainers and the losers.  I'd like to see that boundary delineated.
  • Similar to the above point, I'd sketch out a version which aligns the gainer/loser boundary to the middle so it's easy to see the balance between gainers and losers.  This version, however, would require more space.
  • I'd provide sorting by average return, and by percentage of gainers
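
The centered version suggested in the list above can be sketched as follows.  The sector names, the four return buckets and the shares are all assumptions made up for illustration; the only idea taken from the post is that each bar is shifted so the gainer/loser boundary lands on a common zero line.

```python
# Hypothetical share (%) of stocks in each return bucket, ordered from
# biggest losers to biggest gainers:
#   lost >2%, lost 0-2%, gained 0-2%, gained >2%
# These numbers are invented for illustration.
sectors = {
    "Technology": [10, 20, 40, 30],
    "Energy":     [25, 30, 30, 15],
}

def centered_segments(shares, n_losing=2):
    """Return (left, right) edges for each segment of a stacked bar,
    shifted so the loser/gainer boundary sits at x = 0."""
    start = -sum(shares[:n_losing])  # losers extend to the left of zero
    segments = []
    for width in shares:
        segments.append((start, start + width))
        start += width
    return segments

for name, shares in sectors.items():
    print(name, centered_segments(shares))
```

With these made-up numbers the bars run from -55 to 70, so the axis has to cover 125 units rather than 100.  That is the extra space the centered layout costs.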

Mar 27, 2007

Illusory disparity

The WSJ published a chart with the cheeky title of "Rich Get Richer" (reminiscent of the Economist).  The underlying data concerned one-, three- and ten-year returns for the buyout fund category.  For each return class, the overall mean and the means for the top and bottom 25% funds were depicted.

I won't go into the relevance of the title as I simply could not figure out how it connected with the data.  The following shows the original chart side by side with the junkart version.

Redo_richgetricher

Improvements include:

  • Lines show the comparisons with a minimum of fuss compared with colored bars
  • The overall mean return is placed in the middle of each line segment where it belongs, instead of being the first column
  • The axis label, "annualized return", tells readers what the performance measure is
  • Adding the word "funds" to "top quartile" and "bottom quartile" removes the possible confusion that those represent individual returns of the funds ranked at 25th and 75th percentiles, rather than the average returns of the bottom 25% and top 25% of funds
  • The linear construct paints the correct picture that individual fund returns fall into a continuum

(Thanks to my students for some of these points.)

Reference: Wall Street Journal, Mar 3-4 2007.

Mar 21, 2007

Dot com bubbles

Web_dotcombubbles Thanks to Dustin J for the pointer as well as the title of this post.  Dotcom bubbles is the most appropriate name for this overblown chart (featured as the "chart of the day" here).

The chart has no title or axis labels so only the diligent reader will figure out that the data consist of acquisition value of several high-profile Internet companies in the past three years.

There is less data than it seems.  Both the heights and the areas of the bubbles indicate the same thing, the deal values.  If we are supposed to see a trend, we are not finding it.

Most of these deals are not directly comparable anyway.  Webex and Ironport are infrastructure type companies with real business models.  Skype is a phone service.  Ask Jeeves is not a leader in its own space. Myspace and YouTube are traffic sites.

Reference: "Chart of the Day: Web deals", Valleywag, Mar 15 2007.

Mar 17, 2007

Picking up the right file

The Institutional Investor advises its readers:

Going public may just be the most important -- and nerve racking -- decision any company will make.  Managing and pricing an IPO is tricky, so picking the right underwriter is crucial.  Bankers often boast of their league table prowess to win mandates, but quantity does not necessarily mean quality.

By quantity, they meant the amount of underwriting fees (revenues) earned; and by quality, the average stock performance of the newly-public companies, as of Feb 16, 2007.

Ten banks were compared on the two Qs using this chart, which is best described as the "file folder chart".

Iporanks

Amusingly, its creator sized the height of each file according to the quality metric, which is the return % listed at the top right corner of each file.  The files were sorted by decreasing quality.  Since each file is a parallelogram, its area is proportional to quality.

However, the files overlap, preventing us from comparing the areas of the files.  Besides, the point made in the article about the importance of both Qs is lost since this chart stressed quality over quantity.  Quantity showed up as a low dot on the tallest file and a high dot on the shortest file.

Redo_iporanks The junkart version restores the balance.  The blue lines highlighted several banks that scored high on one metric but low on the other.  The construct is a profile chart, with only two variables.

Curious readers may wonder if there were only 10 banks in the IPO underwriting market.  Far from it.  The chart designer introduced a selection bias because banks were first selected on Quantity, and only then rated on Quality.  This means there may well be a boutique firm with small revenues but higher quality than any of the 10 in the plot.
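
A toy example shows how this kind of selection can hide the best performer.  The firms, fees and returns below are entirely hypothetical; none of them come from the Institutional Investor article.

```python
# Hypothetical underwriters: (name, fees in $ millions, average IPO return in %).
# All values are invented for illustration.
banks = [
    ("Bank A", 900, 12.0),
    ("Bank B", 750, 30.0),
    ("Bank C", 600, 18.0),
    ("Boutique D", 40, 45.0),
    ("Boutique E", 25, 5.0),
]

def top_by_fees(rows, n):
    """Select the n firms with the largest fees (the 'Quantity' filter)."""
    return sorted(rows, key=lambda r: r[1], reverse=True)[:n]

shortlist = top_by_fees(banks, 3)
best_on_shortlist = max(shortlist, key=lambda r: r[2])  # best 'Quality' among those shown
best_overall = max(banks, key=lambda r: r[2])           # best 'Quality' in the whole market

print("best on shortlist:", best_on_shortlist[0])
print("best overall:", best_overall[0])
```

In this made-up market, the firm with the highest return never reaches the chart because it was filtered out on fees before quality was measured.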

Furthermore, much useful information is missing, including the dispersion of returns, the number of deals, etc.

Reference: "Grading the IPO Underwriters", Institutional Investor, March 2007.

Feb 15, 2007

The sum and the parts

Over the last few years, Intrade — with headquarters in Dublin, where the gambling laws are loose — has become the biggest success story among a new crop of prediction markets. The world’s largest steel maker, Arcelor Mittal, now runs an internal market allowing its executives to predict the price of steel. Best Buy has started a market for employees to guess which DVDs and video game consoles, among other products, will be popular. Google and Eli Lilly have similar markets. The idea is to let a company’s decision-makers benefit from the collective, if often hidden, knowledge of their employees.


I haven't participated in any "prediction market", but past statistical work tells me that within each such market you'll find, say, half the participants whose individual track records are better than the average.  Thus, you can do better than the market average if you can predict the predictors: figure out which ones would drag down your average.

In other words, averaging opinions is a double-edged sword.  While some will provide "hidden" knowledge, others may provide "bad" information, which gets averaged too.

In substance, prediction markets are no different from so-called ensemble predictors which have been studied extensively in the statistical data mining area in recent years.  I am of the opinion that such things have proven more useful in increasing the stability of error rates than in improving the average error rates themselves.
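
The double-edged nature of averaging can be illustrated with a small sketch.  The three forecasters, their forecasts and the outcomes are invented; mean absolute error is used as the yardstick only because it is easy to read, not because any prediction market scores itself this way.

```python
from statistics import mean

# Hypothetical outcomes and forecasts; all numbers are made up.
truth = [10, 20, 30, 40]
forecasts = {
    "A": [11, 21, 29, 41],   # small errors
    "B": [8, 23, 33, 38],    # moderate errors
    "C": [16, 26, 36, 46],   # consistently too high
}

def mae(pred):
    """Mean absolute error of a list of forecasts against the outcomes."""
    return mean(abs(p - t) for p, t in zip(pred, truth))

def ensemble(names):
    """Average the forecasts of the named participants, event by event."""
    return [mean(forecasts[n][i] for n in names) for i in range(len(truth))]

individual = {n: mae(f) for n, f in forecasts.items()}
all_three = mae(ensemble(["A", "B", "C"]))
without_c = mae(ensemble(["A", "B"]))

print("individual errors:", individual)
print("average of all three:", round(all_three, 2))
print("average without C:", round(without_c, 2))
```

Here the pooled forecast beats the typical participant but not the best one, and dropping the weakest forecaster improves the pool.  A real market would of course not reveal in advance who the weak forecasters are.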

Phil's take can be read here.

Reference: "Odds Are, They'll Know '08 Winner", New York Times, Feb 13 2007.

Nov 10, 2006

Calming the rip tide

Xan Gregg at Forth Go helpfully scraped the auto market share data off the NYT chart discussed here before.  He even created an improved chart based on histograms.

I have created another view of the data, using boxplots.  Tukey's boxplot is one of the most spectacular graphical inventions, as I have said before (see here, for example).  Its power is evident again for this data set.

Redo_autoshares_1 This chart is in fact two boxplots superimposed on the same surface.  I forgot to put on the legend: the green boxes represent U.S. market shares, and the blue boxes Europe shares.
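
For readers who want to build such boxes themselves, the summary behind each one can be sketched as below.  The shares are invented (they are not the NYT data), and the quartiles use Python's "inclusive" interpolation, which may differ slightly from Tukey's original hinges.

```python
from statistics import quantiles

# Hypothetical annual market shares (%) for one automaker; made-up values.
shares = [35, 34, 33, 31, 29, 28, 27, 26, 25]

def box_summary(values):
    """Five-number summary that a boxplot draws: min, Q1, median, Q3, max."""
    q1, q2, q3 = quantiles(values, n=4, method="inclusive")
    return {"min": min(values), "q1": q1, "median": q2, "q3": q3, "max": max(values)}

summary = box_summary(shares)
print(summary)
```

The distance between q1 and q3 sets the length of the box, so an automaker whose share moved a lot over the period gets a long box.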

The automakers are ordered by decreasing U.S. market shares (with apologies to European readers).

Lots of information can be immediately read off this chart:

  • The European market is much more fragmented than the U.S. market.
  • The Big 2 (GM, Ford) have had mixed fortunes over this period (as indicated by the large variance)
  • The Big 2 are competitive in Europe although they are definitely not dominant there
  • Several key players in Europe (Peugeot, Renault, Fiat, BMW) have negligible shares in the U.S.

Most importantly, there is little evidence that the U.S. market is "looking more like Europe".

One weakness of the above chart is the suppression of temporal information: there is no indication whether the recent shares are moving to the left or the right of the medians (center of each box). 

In the next chart, with the Europe data removed, I highlighted the data for the most recent 5 years in red.  I can make the general statement that there is a small movement towards less concentration and more parity in the U.S. market, but one has to conclude that the U.S. market shares in 2000-2006 look more similar to the U.S. market shares in 1990-1999 than to Europe market shares.

Redo_autoshares2000

P.S. I added legends to the charts.


Oct 08, 2006

Higher, higher and higher

Nyt_shiller3

How high can it go?  This chart, sent in by Michael McCracken and attributed to Yale economist Shiller of "Irrational Exuberance" fame, very effectively poses this question.  The "hockey stick" on the right side of the chart really hits us like a gigantic question mark.

When we have good data, or are looking at the data from the right angle, the charting task is that much easier.

Michael especially likes background shading to highlight specific periods.  I'm a bit perplexed by the "World War I" label as that period does not appear remarkable to my eyes; it is also the only shaded region that is not a boom period.

The text explains the need to remove "new construction" in order to study housing as an "investment over time".  As an outsider to the real estate industry, I find this definition arbitrary.  The 2001 data presumably would include the sales price of any house that was constructed from 2000 and back.  Why exclude only current-year construction?  Could a sale of a one-year-old property be considered "investment" and not "speculation"?

Reference: New York Times, Aug 26 2006.

Sep 12, 2006

Working with lines

Here's how a great idea can be made better.

Nyt_pharmacies

The unifying axis on the right hand side, described as "comparable percentage-change scales", is a great concept.  The data being plotted are the cumulative percent return for each stock from the start of 2006 to the day of publication.

Redo1apharmacies_1 If the three lines are superimposed, we can see the relative performance throughout the year.  Within these three stocks, Walgreens has clearly underperformed until recently.  Also, plotting weekly rather than daily returns reduces clutter.  The only grid-line of importance is the 0% line, which is what is left.
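
The quantity plotted here is easy to compute.  The sketch below uses invented closing prices and assumes five trading days per week; it turns prices into cumulative percent returns from the first day and then keeps one point per week.

```python
# Hypothetical daily closing prices; the first entry is the first trading
# day of the year. The values are made up for illustration.
prices = [50.0, 51.0, 49.5, 52.0, 53.0, 52.5, 54.0, 55.0, 53.5, 56.0, 57.5]

def cumulative_return_pct(prices):
    """Percent change of each price relative to the first price."""
    base = prices[0]
    return [(p / base - 1) * 100 for p in prices]

daily = cumulative_return_pct(prices)
weekly = daily[::5]  # keep every fifth trading day (assumes 5-day weeks)

print([round(r, 1) for r in weekly])
```

Because every stock starts at 0% on the same day, the lines can share one axis regardless of their actual prices.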

In addition, the three other axes, depicting actual prices, are redundant; removing them significantly enhances readability. 

Some will insist that actual prices must be shown; the following includes key bits of data in a subtle way.

Redo2_pharmacy





Reference: "Drugstores are Looking More Like a Growth Story", New York Times, Sept 10 2006.

Sep 07, 2006

Rushing to judgement

Charting, since the great John Tukey spoke, has been recognized as a key subject of "exploratory" data analysis.  Starting with a battery of hypotheses, one can use charts to examine them, reject those not viable, and for the viable ones, search for the best perspective.

When the order is subverted, that is, when the conclusion is fixed before the charts are drawn, the result is often embarrassing.  The example cited here is perhaps one such result.

Nyt_knightridder The header confidently announced: "since ... November 2005, most newspaper stocks have done poorly".

Of six stocks shown, McClatchy really did poorly; Gannett and NYT weren't much better; however, Tribune appeared to be on the upswing, Dow Jones was also stable, and Knight Ridder was up.

Moreover, in order to fully appreciate an "industry challenged", one needs to establish comparability by including the performance of an index, say the S&P or the Dow.  When this is done, one realizes that the whole group of stocks has underperformed the general market.  (The Dow Jones average hovered between 0% and 10% during this period.)


Reference: "What-ifs of a Media Eclipse", New York Times, Aug 27, 2006

Aug 28, 2006

The dots don't connect

Nyt_stockowner The New York Times published a bar chart reminiscent of the one discussed here last week.  They added the 50% line and did not cluster the countries into groups of five.

I like this chart for clarity and simplicity.  (Removing the decimal from the data would improve it.)  The U.S. and her special partner stand out as countries with the highest outside ownership of corporate shares. 

So far, so good.

Until I scanned the article itself, which startled and started with:

It turns out that most American investors are not xenophobic... Shareholders in the United States have been criticized as harboring "home bias" -- allocating far less to foreign stocks than they would if they did not let familiarity, patriotism and national loyalties stand in the way.

The dots don't connect, notwithstanding the academic references contained.  The chart shows how much U.S. stocks are owned by outsiders (which includes some foreigners but also many U.S. investors).  What has this to do with how much money U.S. investors spend on foreign stocks?

Even a good chart can't save a poor story.

Reference: "Investors without Borders", New York Times, Aug 27, 2006
