Apr 12, 2007

Peripherals 2

In terms of interactive charting, Google Finance did much more than hide the legend.  In their main stock price chart, they used a number of neat features.

Google_ahm1

This chart effectively conveys a huge amount of information in a small space.  The bottom strip, which shows relative prices for the past two years, provides context for interpreting the five-day movement shown in the main chart area.  I would prefer to see a scale on the bottom strip as well.

The sliding scrollbar can be dragged to show historical data.  In addition, the width of the window shown in the main area can be adjusted.  For instance:

Google_ahm2

Without any effort, we are now looking at a 3-month chart for Q2 2006.  Notice that the summary statistic in the top right corner has also changed.  The axis scale changed too, and it never did start from zero to begin with.  (This shortcoming is alleviated by the profile chart in the bottom strip.)

Further, by placing the cursor in the chart area, we can highlight a particular day: a dot appears on the price curve, the volume on that day is highlighted, and the text on the top right switches.  That text is what we typically place inside the chart area as a "data label".  The effect of moving it to the corner is similar to hiding the legend: it makes the graph more legible and provides space for longer descriptions.  As we move the cursor from left to right, the graph dynamically adapts.  Marvellous!

Google_ahm3

The amount of data processing needed to implement these sorts of features may not be obvious.  I don't have space to address the data issue, but maybe some of our readers can comment on it.
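To give a flavor of one small piece of that processing, here is a minimal sketch in Python of what has to be recomputed each time the window moves: the summary statistic for the corner and the range for the vertical axis.  The function name, the fields and the prices are all made up for illustration; I have no knowledge of how Google Finance actually implements this.

```python
from dataclasses import dataclass


@dataclass
class WindowSummary:
    start_price: float
    end_price: float
    change: float       # end minus start, in price units
    pct_change: float   # change as a percent of the start price
    axis_min: float     # lowest price in the window
    axis_max: float     # highest price in the window


def summarize_window(prices, start, end):
    """Summarize prices[start:end] (end exclusive) for a sliding chart window."""
    window = prices[start:end]
    if len(window) < 2:
        raise ValueError("window needs at least two prices")
    first, last = window[0], window[-1]
    change = last - first
    return WindowSummary(first, last, change, 100 * change / first,
                         min(window), max(window))


# Made-up daily closing prices, not real data.
prices = [100.0, 102.0, 101.0, 105.0, 110.0, 108.0]
summary = summarize_window(prices, 1, 5)
print(summary)
```

Because the axis range is taken from the window itself, the vertical scale shifts as the window slides, which is why the main chart never starts from zero.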

Mar 27, 2007

Illusory disparity

The WSJ published a chart with the cheeky title of "Rich Get Richer" (reminiscent of the Economist).  The underlying data concerned one-, three- and ten-year returns for the buyout fund category.  For each return class, the overall mean and the means for the top and bottom 25% funds were depicted.

I won't go into the relevance of the title as I simply could not figure out how it connected with the data.  The following shows the original chart side by side with the junkart version.

Redo_richgetricher

Improvements include:

  • Lines show the comparisons with less fuss than colored bars
  • The overall mean return is placed in the middle of each line segment where it belongs, instead of being the first column
  • The axis label, "annualized return", tells readers what the performance measure is
  • Adding the word "funds" to "top quartile" and "bottom quartile" removes the possible confusion that those represent individual returns of the funds ranked at 25th and 75th percentiles, rather than the average returns of the bottom 25% and top 25% of funds
  • The linear construct paints the correct picture that individual fund returns fall into a continuum

(Thanks to my students for some of these points.)
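The distinction between "the average return of the top 25% of funds" and "the return of the fund at the 75th percentile" is easy to blur, so here is a minimal sketch in Python that computes both.  The returns are made up for illustration and are not the WSJ data; the function name and the choice to take a whole number of funds per quartile are my assumptions.

```python
from statistics import mean, quantiles


def quartile_summary(returns):
    """Compare quartile-group means with the percentile cut points."""
    ordered = sorted(returns)
    k = len(ordered) // 4  # number of funds in each outer quartile group
    if k == 0:
        raise ValueError("need at least four funds")
    p25, _, p75 = quantiles(ordered, n=4)  # cut points, not group averages
    return {
        "bottom_quartile_funds_mean": mean(ordered[:k]),
        "p25": p25,
        "overall_mean": mean(ordered),
        "p75": p75,
        "top_quartile_funds_mean": mean(ordered[-k:]),
    }


# Made-up annualized returns (percent) for eight hypothetical funds.
returns = [-12.0, -5.0, 0.0, 3.0, 6.0, 9.0, 15.0, 40.0]
summary = quartile_summary(returns)
for label, value in summary.items():
    print(f"{label}: {value}")
```

The mean of the top-quartile funds sits above the 75th percentile cut point, and the mean of the bottom-quartile funds sits below the 25th, which is why the label needs to say which one is plotted.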

Reference: Wall Street Journal, Mar 3-4 2007.

Dec 26, 2006

End of year effect?

Nyt_babies2

I agree with JF, who suggested that this chart was mind-boggling.  The chart accompanied a somewhat diffuse NYT article postulating that tax breaks, shifting medical practice, less apprehension about tired nurses, or added labor-inducing stress from visiting relatives may have something to do with more babies being born in December, particularly at month's end.

This chart presumably shows the "spike" in December births, or more precisely, the shift of January births into December.  The trouble with it is its lack of comparability.  We need to compare the 2002-3 trend to some prior year to see the shift.

Even then, we would have seen only one data point.  So it would have been better to plot multiple years.

Finally, after reading the article, I cannot discern the importance of Monday and Friday.  The yellow-pink coloring has not improved my comprehension of the data; it leaves me with more questions than before.

Reference: "To-Do Lists: Wrap Gifts, Have Baby", New York Times, Dec 20 2006.



PS. Please now visit Jon's response.  Kudos for digging out the historical data series and a stellar analysis!

Dec 15, 2006

Emergent patterns

It's always a pleasure to read blow-by-blow accounts of how charts were constructed.  The piece on time-travel maps was instructive.  Similarly in the previous post, I quoted the following:

It’s easier to answer this question if you leave out the six states that didn’t elect any Republicans in 2000; after all, they didn’t have any to throw out. If you also remove New Hampshire and South Dakota, where the percentage of Republicans elected dropped to 0 from 100 — New Hampshire only has two seats in the House and South Dakota has one — a pattern starts to appear.

At first sight, this appears as a case of removing outliers, which many statisticians recommend.  Except that the data omitted were not outliers.  Indeed, when both x- and y-variables are bounded (between 0% and 100% share of the House seats; between -100% and +100% change in share), there can be no extreme values.

In effect, when the author eliminated those eight points, he followed the "emergent pattern" theory, by which I mean the notion of removing data until a pattern "emerges".  (By the way, emergence is now a science, as expounded here.)  If enough data is removed, one can produce any pattern one pleases.  One can find subsets of data to support a positive linear, flat linear or quadratic relationship, as shown below.

Redoelectiond

Focusing now on the full data set in the upper left corner, one is hard pressed to conclude that a positive correlation exists between the two variables. In particular, most states experienced no changes in the share of House seats, and in these states, the income growth ranged from under 20% to over 40%, which is pretty much the extent of variability across the full data set.
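The "emergent pattern" trick is easy to reproduce on a toy data set.  The sketch below, in Python, uses a made-up 5-by-5 grid of points (not the election data) in which x and y are unrelated by construction, then picks subsets that look positive linear, flat or quadratic.  The helper name ls_slope is my own.

```python
def ls_slope(points):
    """Least-squares slope of y on x for a list of (x, y) pairs."""
    n = len(points)
    x_mean = sum(x for x, _ in points) / n
    y_mean = sum(y for _, y in points) / n
    sxx = sum((x - x_mean) ** 2 for x, _ in points)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in points)
    return sxy / sxx


# Toy data: every (x, y) combination on a grid, so x tells us nothing about y.
full = [(x, y) for x in range(5) for y in range(5)]

# Keep only the points that happen to fit the pattern we want to "find".
rising = [p for p in full if p[1] == p[0]]             # positive linear
flat = [p for p in full if p[1] == 2]                  # flat linear
curved = [p for p in full if p[1] == (p[0] - 2) ** 2]  # quadratic

print("full slope:  ", ls_slope(full))
print("rising slope:", ls_slope(rising))
print("flat slope:  ", ls_slope(flat))
print("curved points:", curved)
```

Each subset keeps only five of the 25 points, and each tells a different story, none of which is present in the full grid.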

Dec 13, 2006

The trouble with percentages

In the aftermath of the Democratic victory in the 2006 mid-term election, the NYT published a column floating the idea that "it was the economy, stupid".  For statistics buffs, this column provides much food for thought. 

Suffice it to say, if you were my student, you would not want to hand this in as an essay.  To the author's credit, he did backload the article with lots of disclaimers.

The key thesis of the piece is:

if your state wasn’t among the best economic performers in the last six years, judged by the growth of personal income, it appears that you were three times as likely to vote to throw the bums out.

Redo_election06b_1

(We'll just assume he didn't mean "you" but "your state".) To help us understand the author's logic, I created a scatter plot, relating the change in state average personal income (2000-2006) to the change in percent of Republican seats.

He first segmented the states into two groups: the red dots had the top 10 income growth rates; the blue dots were the remaining states.  Then for each group, he computed the average drop in % Republican.  For the reds, it was 2%; for the blues, it was 7%.  (These levels are indicated by the horizontal lines.  My data are slightly different from his.)  Case proven -- with disclaimers.
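The segment-then-average procedure just described can be sketched in a few lines of Python.  This is my reading of the column's method, not the author's actual calculation; the state names and numbers below are made up, and top_n is set to 2 only so that the toy example stays small.

```python
def group_means(states, top_n=10):
    """Split states by income growth rank and average the change in % Republican.

    states: list of (name, income_growth_pct, change_in_pct_republican).
    Returns (mean change for the top_n growers, mean change for the rest).
    """
    ranked = sorted(states, key=lambda s: s[1], reverse=True)
    top, rest = ranked[:top_n], ranked[top_n:]
    if not top or not rest:
        raise ValueError("both groups must be non-empty")

    def average_change(group):
        return sum(s[2] for s in group) / len(group)

    return average_change(top), average_change(rest)


# Hypothetical states with made-up figures, for illustration only.
toy_states = [
    ("A", 40.0, -2.0),
    ("B", 35.0, 0.0),
    ("C", 30.0, -10.0),
    ("D", 25.0, -5.0),
    ("E", 20.0, -6.0),
]
top_mean, rest_mean = group_means(toy_states, top_n=2)
print(top_mean, rest_mean)
```

Note that the result depends on two choices made before any averaging happens: where to cut the ranking, and which states are allowed into the list in the first place.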

Some of you are already counting the dots.  If you only find 42, you'd have counted correctly.  The following explanation provided by the analyst is classic:

It’s easier to answer this question if you leave out the six states that didn’t elect any Republicans in 2000; after all, they didn’t have any to throw out. If you also remove New Hampshire and South Dakota, where the percentage of Republicans elected dropped to 0 from 100 — New Hampshire only has two seats in the House and South Dakota has one — a pattern starts to appear.

I will leave the emergent pattern thesis to a future post.  For this post, I am interested in the trouble with percentages.  He is right to point out that for those 100% Blue states, the change in %Republican is constrained to be non-negative, from 0% up to 100%.  For most other states, the change can be positive or negative.

Good observation but wrong remedy -- those six states with 0% Republicans in 2000 are not special; removing them from the analysis is wrong-headed.  What about those states with 100% Republicans in 2000?  There, the change in %Republican can only be 0% or negative.  In fact, the possible range for the change in seats for each state is different, and it depends on the Republican proportion in 2000!  For example, if in 2000 the Republicans held 30% of the seats, then in 2006, the change must be between -30% and +70%.

The situation is worse: the range of possible values also depends on the number of seats in each state.  The fewer total seats there are, the fewer possible values that can be taken.  As the author notes, with only 1 seat, you either lose it, gain it or retain it, so that the change will be either -100%, +100% or 0%.  No other values are possible!
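Both constraints can be made concrete by listing every change in %Republican that a state could possibly record.  Here is a minimal sketch in Python; the function name is my own, and the 10-seat state is a hypothetical that matches the 30% example above.

```python
from fractions import Fraction


def feasible_changes(total_seats, rep_seats_before):
    """All possible changes in % Republican, given the starting seat count."""
    if not 0 <= rep_seats_before <= total_seats:
        raise ValueError("rep_seats_before must be between 0 and total_seats")
    before = Fraction(rep_seats_before, total_seats)
    # k is the number of Republican seats after the election.
    return [float(100 * (Fraction(k, total_seats) - before))
            for k in range(total_seats + 1)]


# One seat, held by a Republican: lose it or keep it.
print(feasible_changes(1, 1))
# One seat, held by a Democrat: keep it or gain it.
print(feasible_changes(1, 0))
# Hypothetical 10-seat state with 3 Republican seats (30%) beforehand.
print(feasible_changes(10, 3))
```

The 10-seat state can move only in steps of 10 points, anywhere from -30 to +70, while a single-seat state has just two possible outcomes once its starting point is known.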

Both the above troubles arise because we use percentages to describe something discrete (number of seats).  This is a difficult problem and I don't know of a general solution.  However, in this example, because the change in seats is small across all states, regardless of the total number involved, I recommend that we avoid percentages and stick with positive, zero and negative changes.

Redo_election06c

The boxplot shows that there is little correlation between income growth and whether Republicans would win or lose House seats in 2006.  Here, the states are divided into three groups depending on whether the Republicans gained, lost or retained seats in the 2006 mid-term election.  Median income growth is similar in all three groups and the boxes overlap heavily.
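The grouping behind the boxplot is simple enough to sketch in Python: classify each state by the sign of its seat change, then compare median income growth across the groups.  The numbers below are made up for illustration and are not the data behind my chart; the function name is likewise my own.

```python
from statistics import median


def median_growth_by_outcome(states):
    """Median income growth for states where Republicans lost, retained or gained seats.

    states: iterable of (income_growth_pct, change_in_republican_seats).
    """
    groups = {"lost": [], "retained": [], "gained": []}
    for growth, seat_change in states:
        if seat_change < 0:
            groups["lost"].append(growth)
        elif seat_change > 0:
            groups["gained"].append(growth)
        else:
            groups["retained"].append(growth)
    # Skip empty groups, since a median of nothing is undefined.
    return {name: median(values) for name, values in groups.items() if values}


# Hypothetical (income growth %, seat change) pairs, not real data.
toy_states = [(25, -1), (30, -2), (35, -1), (22, 0), (31, 0), (40, 0), (28, 1), (33, 1)]
medians = median_growth_by_outcome(toy_states)
print(medians)
```

Using only the sign of the seat change sidesteps both percentage problems, since every state can in principle lose, retain or gain, whatever its size or starting point.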

Reference: "Maybe You Did Vote Your Pocketbook", New York Times, Nov 12 2006.

PS. If you like this post, consider sending me a holiday gift.

 


Oct 20, 2006

The elusive catchup

Commodities

Thanks to Michael S. for sending in this chart from the economists at IMF (via this blog).

At its heart, this is a scatter plot that displays the correlation between a country's development stage (indicated by its PPP GDP) and the importance of the industrial sector to its economy.

On top of that, the chart adds a third dimension of time by linking the dots together with lines.  The lines trace the evolution in each country or set of countries.  Some countries (mostly developed nations) have a clear trend; others exhibit choppy curves which imply fluctuating economic conditions.

We have created this type of chart when discussing the fabulous Gapminder site.

The shading in the chart is supposed to draw attention to an inflection point around $15,000 per capita GDP, beyond which the industrial sector starts to decline in importance.

In my view, that conclusion is forced because Korea is the only curve displayed on the chart that bridged the $15,000 divide.  Thus, one can say there exists only one data point supporting this hypothesis.

However, one aspect of this chart jumps out at us, which is the chasm between developed and developing countries, right at the $15,000 divide.  On the right side, the rich get richer in a relatively steady fashion.  On the left side, the poor remain poor.  These nascent economies suffer from a great deal of volatility.  What's worse, the slopes are much sharper on the left than on the right, meaning that the gains in GDP are much smaller on the left of the divide.  Even more troubling are the cases of Brazil and Mexico, which seem to have endured a decline in the industrial sector without much gain in GDP.

The only bright spot is Korea.  (And China is the outlier.)


 

Oct 11, 2006

Arming the competition

At the TCS blog, Tim Worstall attacked a chart comparing global levels of income inequality, originally published by the Economic Policy Institute.  His post is here.  Tim claimed that this chart proved precisely the opposite of what the EPI intended: in his reading, it showed that "the poor in America have exactly the same standard of living as the poor in Finland (and Sweden)", two countries which he derided as "redistributionist paradises".  From this, Tim concluded that the U.S. is doing enough for the poor.

Tcs_income

Stephen C., who sent in this chart, was very confused by the length of the bars: left of the divider, the larger the income index, the shorter the bar; right of the divider, the larger the income index, the longer the bar.

For the EPI, this is a case of arming the competition.  Echoing Robert's comment from yesterday, this is one chart that opines but should have murmured.

The chart is a very convoluted way to study the idea of income inequality.  The first bar states that the 90th percentile income in Finland is 1.11 times the median U.S. income, after adjusting for PPP.  Notice the simultaneous change in percentile and country, which complicates our understanding of the difference.

The median income is perhaps the simplest (not most informative) measure of income equality.  In the EPI chart, the edges of each bar describe the 10th and 90th percentile income in a country.  We know only that 80% of the population lies within each bar, but nothing about how those incomes are distributed.

Redo_income_1

In the revised chart, I plotted another popular measure of income equality, the ratio of 90th percentile to 10th percentile (since the data is readily available from the EPI chart).  It's clear that inequality is highest in the English-speaking Western world, where the top earners get 4 to 6 times as much as the bottom earners.

This income ratio is computed for each country, and can be used to compare across countries without resorting to another index. 
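To see why no outside index is needed, here is a minimal sketch in Python.  It assumes, as in the EPI chart, that each percentile income is expressed as a multiple of some reference income (there, the U.S. median); the specific values below are made up and are not read off the chart.

```python
def p90_p10_ratio(p10_index, p90_index):
    """Ratio of 90th to 10th percentile income, both on the same index scale."""
    if p10_index <= 0:
        raise ValueError("p10_index must be positive")
    return p90_index / p10_index


# Hypothetical country: incomes indexed to a reference income of 100.
ratio = p90_p10_ratio(40.0, 200.0)

# Re-express the same incomes against a different reference (scale by 1.5).
rescaled_ratio = p90_p10_ratio(40.0 * 1.5, 200.0 * 1.5)

print(ratio, rescaled_ratio)
```

The reference income cancels in the division, so the ratio describes each country on its own terms and the simultaneous change of percentile and country disappears.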

Reference: "America: More Like Sweden Than You Think", TCS Daily, Aug 26 2006.
