Jan 31, 2008

Jittering lines

A reader alerted me to this NYT chart a few weeks back.  The chart plots daily changes in stock index prices (gray lines) and yearly changes (color blocks). 

Nyt_volatility

The blue blocks represent bad down years, but notice that the daily changes during many of those periods give no such impression.  In fact, the gray lines are quite evenly balanced on both sides of 0, and yet the annual tallies swing from positive to negative quite frequently.  It is by no means true that one exceptional down day predicts a down year.

Nyt_volatility2_2

The problem arises from cramming too much data into too small a space.  We can't judge the density of the lines on paper and so can't judge whether there were more up lines than down lines.

This issue is similar to the problem with jittering when it is applied to large data sets.
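To make the point about daily versus yearly changes concrete, here is a minimal sketch in Python. The daily changes are made up for illustration (they are not the data behind the NYT chart), and the variable names are my own. It shows a year with more up days than down days that still ends lower once the daily changes are compounded.

```python
# Hypothetical daily changes for one trading year (NOT real market data):
# 130 up days of +0.5% each and 122 down days of -0.6% each.
daily_changes = [0.005] * 130 + [-0.006] * 122

up_days = sum(1 for r in daily_changes if r > 0)
down_days = sum(1 for r in daily_changes if r < 0)

# Compound the daily changes to obtain the change over the whole year.
growth = 1.0
for r in daily_changes:
    growth *= 1 + r
annual_change = growth - 1

print(f"up days: {up_days}, down days: {down_days}")
print(f"annual change: {annual_change:.1%}")
```

Counting lines above and below zero tells us the direction of each day but not its size, and the yearly figure depends on both.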

Source: "The Pulse of Uncertainty", New York Times, Jan 4 2008.

 

Dec 02, 2007

Live dynamic graphics

In the second interesting item of the week, I return to the fabulous Google Finance chart, which shows the distribution of stock market returns by sector.  I wrote about it twice (here and here).  In the original post, I saluted the engineers for figuring out the formidable technical issues of turning a live dynamic data stream into a live dynamic graphic but didn't go into details.  (Trust me.)

Goog_oops

The other night, this chart popped up on my browser.

Oops.

If someone kept track of how often such a mishap showed up, the tally would probably come to 1-5% of the time.

The triple challenge of generating this graphic is the volume of data that needs to be processed, the velocity at which it changes, and the flicker of time from input to output, probably not more than a few minutes. The analysis and charting must be maintained continuously during market hours.  For any such project, the thing to manage is the error rate, and one should be totally thrilled if it's in the range Google engineers have achieved.

Nov 27, 2007

The punch line

Mike K submitted this great entry months ago.  It's a map depicting stock market correction across the globe during the summer.  You have to click on the link to the WSJ website in order to see the interactive element.

Wsj_correction

Here are Mike's comments and mine:
Why it's bad:
First, to see the detail you have to click on the countries one by one. Hard to do a comparison of two countries. This makes it close to FlashJunk.

The color scheme is supposed to help but:

Second, the colors are too close together to allow easy comparisons of, say, Canada and Australia.

In addition, the binning of the colors is uneven and oddly chosen.  In the middle of the scale, each color shift represents 1% but at the edge, it is 5%, or more.

Third, area of these countries, or their geographic location, isn't really that relevant. Market cap might be. Then tiny-but-richly-capitalistic Netherlands wouldn't have to be shown in the middle of the Atlantic, as if the dikes had all burst and Amsterdam had floated out to sea.
Indeed, it raises the question: what were the gold dots supposed to signify? (Hint: it's not location.)

Fourth, why the selectivity? There are stock markets in Turkey, and in Russia, and in Ireland and in Thailand. (Oh, wait, they show the one in Thailand -- except they put it in Myanmar instead.)

Finally, the chart lacks a punch line. 

In the junkart version, I want to test the hypothesis of a global contagion so I plot the data in order of closing times of individual stock markets.  (I just guessed the closing times based on the map.)  Not much here though.

Redo_correction

Source: "Global Correction", Wall Street Journal, August 2007.

Oct 17, 2007

Points of comparison

Econ_mortgage

In light of the current housing crisis, arising from mortgage defaults, I pulled this graphic from a Jan 2007 opinion piece that plotted historical default rates of mortgages.  Notice the high degree of stretching on the vertical axis that exaggerates the volatility: essentially, the annual delinquency rate ranged from 1.75% to 2.65% during the last six years or so.  One might be forgiven for thinking that a 2% default rate is quite acceptable.

Nyt_mortgage_2

Compare the above chart to the pair that showed up in the NYT in Oct 2007 (see right).  The default rates here are in the 10-20% range, very alarming indeed.

The two graphics illustrate a key issue of "aggregation" in statistical analysis.  The first graphic is super-aggregated: all types of mortgages of all ages are put together to calculate each year's default rate.  The second graphic homes in on subprime mortgages only.

More importantly, the second graphic presents data in "vintages".  Each line represents loans originated during a particular year (a "vintage").  This establishes comparability.  On the first chart, each point in time represents the default rate of mortgages averaged over all ages (some loans may be only a few months old; others may be 15 years old).  Since the default rate is much higher for very young mortgages than for older mortgages, such averaging hides crucial information.
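A small worked example may help show how averaging over loan ages hides information. The loan counts and defaults below are invented for illustration and do not come from either chart; the layout of the `book` list is my own assumption.

```python
# Hypothetical mortgage book for a single year (NOT data from either chart).
# Each entry: (loan age in years, loans outstanding, defaults during the year)
book = [
    (0, 2000, 240),    # newest loans
    (1, 3000, 150),
    (5, 10000, 100),
    (15, 5000, 10),    # oldest loans
]

total_loans = sum(loans for _, loans, _ in book)
total_defaults = sum(defaults for _, _, defaults in book)

# The "super-aggregated" view: one default rate for the whole book.
aggregate_rate = total_defaults / total_loans

# The disaggregated view: one default rate per loan age.
rate_by_age = {age: defaults / loans for age, loans, defaults in book}

print(f"aggregate default rate: {aggregate_rate:.1%}")
for age, rate in rate_by_age.items():
    print(f"age {age:>2} years: {rate:.1%}")
```

In this made-up book the single aggregate rate looks modest, while the rate for the newest loans is several times higher. Comparing vintages at the same age, as the NYT graphic does, removes the effect of the age mix.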

Overall, the NYT graphic very effectively conveys the alarming trend of new mortgages performing much worse, especially those originated in 2007.

It can benefit from two slight edits: adding a few more years, and using vertical lines (the most critical comparisons are default rates for loans of a given age!)  Something like this...

Redo_mortgage


Sources: "As Defaults Rise, Washington Worries", New York Times, Oct 16 2007; "Mounting Mortgage Credit Problems", economy.com, Jan 23 2007.

May 09, 2007

A picture is worth...

... how much?

Ii_buyoutfirms

Source: "Banking the Buyout Firms", Institutional Investor, April 2007.

May 06, 2007

Visualizing sensitivity

A reader wrote:

I'm a loyal reader who hopes you'll indulge him in just one or two questions.

In finance (valuation, specifically), we often create two-way sensitivity tables. Unfortunately, a three-way sensitivity table is what's most often called for. Of course, we work around this by producing multiple two-way tables.

Now, obviously, it's pretty hard to build a three-way table or chart in two dimensions, and the use-bigger-bubbles method doesn't really make sense in this kind of application, but can you conceive of a good way to present the data in any other form?

3waydata_2

As he indicated, we typically see multiple two-way data tables for such data.  The virtue of this approach is that the data is exceptionally well-organized; it's great for looking up the outcome given the three dimensions (I called them Red, Green and Blue to protect the innocent).

Further, starting from a baseline i.e. a particular cell in the table, it's easy to move our eyes up, down or jump tables to observe the impact of changing dimensions (so-called sensitivity analysis).

These data tables facilitate "local" sensitivity analysis but obscure "global" sensitivity: staring at those numbers, we feel lost in the trees and can't see the forest.  What's the effect of increasing Green on average?  What's the effect of increasing Green while decreasing Blue? etc. etc.
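One way to get at the "global" questions numerically is to average the table over the dimensions we are not asking about. The sketch below is only an illustration: the levels of Red, Green and Blue and the `value` formula are hypothetical stand-ins, not the reader's actual valuation model.

```python
from itertools import product
from statistics import mean

# Hypothetical levels for the three inputs (assumed for illustration).
red_levels = [0.09, 0.10, 0.11]
green_levels = [2, 8, 16]
blue_levels = [0.28, 0.30]


def value(red, green, blue):
    # Hypothetical formula standing in for whatever model fills the tables.
    return 1000 * green * blue / red


# The full three-way table, keyed by (red, green, blue).
table = {
    (r, g, b): value(r, g, b)
    for r, g, b in product(red_levels, green_levels, blue_levels)
}


def average_effect(table, axis):
    """Average value at each level of one dimension, over all the others."""
    groups = {}
    for key, v in table.items():
        groups.setdefault(key[axis], []).append(v)
    return {level: mean(values) for level, values in groups.items()}


green_effect = average_effect(table, axis=1)
for level, avg in green_effect.items():
    print(f"Green = {level:>2}: average value {avg:,.0f}")
```

The same function with `axis=0` or `axis=2` answers the question for Red or Blue. Two-way effects can be obtained the same way by grouping on a pair of coordinates instead of one.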

3waygraph

The junkart construct (right) is made to address these questions.  The black stripes establish the baseline, the overall range of values.  Then, if interested in the effect of Red = 0.11, we can compare those red stripes with the black.  Since the spread is wide, we note that Red = 0.11 is not a strong indicator of value, and to the extent it is, it points to lesser values.

What about Red = 0.11 and Green = 2?  Now, we focus on the first red stripes and the first green stripes.  We note that the overlapping region (which is where both conditions apply) is highly concentrated at the low end of the value range.  Thus, we conclude that under those conditions, value is low (below 10,000) and further, that it is low primarily because Green = 2.

On and on for any one-way, two-way or three-way effects.

Although it's not the purpose of the chart, local sensitivity can also be observed.  For example, the highest value comes from Red = 0.09, Green = 16 and Blue = 0.30.  What if Blue decreases to 0.28?  We start on the Blue = 0.28 layer; going from right to left, as we see a blue stripe, we scan vertically to find the corresponding red and green stripes; at the 3rd stripe from the right, we find the scenario of interest.  Such analysis would benefit from adding an interactive vertical guiding line.

Do you prefer 3-D plots?  Contour plots? Feel free to share your ideas!

Apr 12, 2007

Peripherals 2

In terms of interactive charting, Google Finance did much more than hide the legend.  In their main stock price chart, they used a number of neat features.

Google_ahm1

This chart effectively conveys a huge amount of information in a small space.  The bottom strip which shows relative prices for the past two years provides context to interpret the five-day movement shown in the main chart area.  I prefer to see a scale on the bottom strip as well. 

The sliding scrollbar can be dragged to show historical data.  Besides, the width of the window shown in the main area can be controlled.  For instance:

Google_ahm2

Without any effort, we are now looking at a 3-month chart for Q2 2006.  Notice the summary statistic on the top right corner also morphed.  The axis scale changed, and it never did start from zero to begin with.  (This shortcoming is alleviated by the profile chart in the bottom strip.)

Further, by placing the cursor in the chart area, we can highlight a particular day: a dot appeared on the price curve, the volume on that day was highlighted, and the text on the top right switched.  That text is what we typically place inside the chart area as a "data label".  The effect of moving it to the corner is similar to hiding the legend: it makes the graph more legible and provides space for longer descriptions.  As we move the cursor from left to right, the graph dynamically adapts.  Marvellous!

Google_ahm3

The amount of data processing that has to take place to implement these sorts of features may not be obvious. I don't have space to address the data issue but maybe some of our readers can comment on it.

Apr 08, 2007

Peripherals 1

Like any technology, charts also come with peripherals: I'm talking about legends, data labels, grid-lines and so on.  These things typically give us the most trouble, especially with complex data sets.  The analogy is apt: one may feel inextricably knotted up like bunches of cords and wires.

Interactive graphics is a particularly elegant solution to this problem, and Google Finance has done a fantastic job leading the way.  One trick is to show the legend only when the user asks for it. 
Google_sectorsum_lg

Using bar charts (on the left), Google summarizes neatly the performance of stocks within each industry sector.  The bar chart gives a sense of the dispersion which adds to the average returns printed next to them.  For example, most sectors gained on average but then about 30% of the individual stocks in most sectors actually declined on that day.  So the fact that technology stocks gained 0.48% on average doesn't necessarily mean that the two tech stocks you own gained 0.48% or gained at all.
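The gap between the average return and the experience of individual stocks is easy to reproduce. In the sketch below the ten daily returns are invented (they are not Google Finance data) and were chosen so that the sector averages a gain while three of the ten stocks fall.

```python
# Hypothetical one-day returns, in percent, for ten stocks in one sector
# (NOT real data; chosen for illustration).
returns_pct = [2.5, 1.8, 1.2, 0.9, 0.7, 0.4, 0.3, -0.6, -1.0, -1.4]

# The single number printed next to the bar: the average return.
average_return = sum(returns_pct) / len(returns_pct)

# What the bar itself conveys: how many stocks gained and how many declined.
decliners = [r for r in returns_pct if r < 0]
share_declined = len(decliners) / len(returns_pct)

print(f"average return: {average_return:.2f}%")
print(f"share of stocks that declined: {share_declined:.0%}")
```

The average summarizes the sector in one number, while the share of decliners is the kind of dispersion information that the bar chart adds.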

Typically, we would put a legend on the side or at the bottom of the chart, which, all told, is an ugly duckling next to a well-executed chart.  Here, the legend is hidden behind the "What's this?" link.  The side benefit is that the legend can be as verbose as needed since it doesn't interfere with the chart.

There are a few minor things to consider:

  • "What's this?" is not very informative: Why not call it a "legend" or "key"?
  • The graph designer seems to think that the most important information sought by readers was the extremes, i.e. the percentage of stocks that gained/lost more than 2%.  By darkening the sides of the bar, it draws attention away from the middle which is the boundary between the gainers and the losers.  I'd like to see that boundary delineated.
  • Similar to the above point, I'd sketch out a version which aligns the gainer/loser boundary to the middle so it's easy to see the balance between gainers and losers.  This version, however, would require more space.
  • I'd provide sorting by average return, and by percentage of gainers

Mar 27, 2007

Illusory disparity

The WSJ published a chart with the cheeky title of "Rich Get Richer" (reminiscent of the Economist).  The underlying data concerned one-, three- and ten-year returns for the buyout fund category.  For each return class, the overall mean and the means for the top and bottom 25% funds were depicted.

I won't go into the relevance of the title as I simply could not figure out how it connected with the data.  The following shows the original chart side by side with the junkart version.

Redo_richgetricher

Improvements include:

  • Lines show the comparisons with a minimum of fuss compared with colored bars
  • The overall mean return is placed in the middle of each line segment where it belongs, instead of being the first column
  • The axis label, "annualized return", tells readers what the performance measure is
  • Adding the word "funds" to "top quartile" and "bottom quartile" removes the possible confusion that those represent individual returns of the funds ranked at 25th and 75th percentiles, rather than the average returns of the bottom 25% and top 25% of funds
  • The linear construct paints the correct picture that individual fund returns fall into a continuum

(Thanks to my students for some of these points.)
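The distinction between "the average return of the top 25% of funds" and "the return of the fund at the 75th percentile" can be seen with a few made-up numbers. The returns below are hypothetical, not the WSJ data, and the percentile uses the default method of Python's `statistics.quantiles`, which is one convention among several.

```python
from statistics import mean, quantiles

# Hypothetical annualized returns, in percent, for eight buyout funds
# (NOT the WSJ data).
fund_returns = sorted([-5, 0, 3, 6, 9, 12, 20, 35])

quarter = len(fund_returns) // 4

# Average return of the funds in the bottom and top quarters.
bottom_quartile_mean = mean(fund_returns[:quarter])
top_quartile_mean = mean(fund_returns[-quarter:])

# Return at the 25th and 75th percentiles (cut points between quarters).
q1, q2, q3 = quantiles(fund_returns, n=4)

print(f"mean of bottom-quartile funds: {bottom_quartile_mean}")
print(f"25th percentile return: {q1}")
print(f"75th percentile return: {q3}")
print(f"mean of top-quartile funds: {top_quartile_mean}")
```

The quartile means sit outside the percentile cut points, which is why labeling the lines "top quartile funds" and "bottom quartile funds" matters.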

Reference: Wall Street Journal, Mar 3-4 2007.

Mar 21, 2007

Dot com bubbles

Web_dotcombubbles

Thanks to Dustin J for the pointer as well as the title of this post.  Dotcom bubbles is the most appropriate name for this overblown chart (featured as the "chart of the day" here).

The chart has no title or axis labels so only the diligent reader will figure out that the data consist of acquisition value of several high-profile Internet companies in the past three years.

There is less data than it seems.  Both the heights and the areas of the bubbles indicate the same thing, the deal values.  If we are supposed to see a trend, we are not finding it.

Most of these deals are not directly comparable anyway.  Webex and Ironport are infrastructure type companies with real business models.  Skype is a phone service.  Ask Jeeves is not a leader in its own space. Myspace and YouTube are traffic sites.

Reference: "Chart of the Day: Web deals", Valleywag, Mar 15 2007.
