Baseball ROI: tables or graphs?

David Leonhardt re-opened the debate about whether high-spending baseball teams (like the Yankees) are winners or losers.  According to his application of an idea from Doug Pappas, George surely fools his investors!  Accompanying his article was a table of numbers, of which I clipped the top third:
[Image: top third of the table]

As tables go, this one is fundamentally sound: the teams are sorted by "cost per victory", which was the point David wanted to make.

If some readers find this table hard to swallow, they probably have wandered off, trying to make sense of the payroll and winning percentage columns; or perhaps they got dizzy trying to get their heads around 1,133,807 versus 1,225,575.  Precision is a great scientific virtue but rarely makes a good graphic guideline.

This set of data, essentially a bi-variate series, gives me yet another opportunity to discuss the versatile scatter plot.  Here is the basic design, with winning % on the y-axis and payroll on the x-axis.  Contrary to the article's conclusion, there appears to be a general association between payroll and winningness.  The dotted lines are median payroll (US$ 63 million) and median winning % (0.500) respectively, so that half the teams fall on either side of each line.  I have removed the Yankees since their spending far outstripped every other team's (I will return to them later).
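For readers who want to try the median split themselves, here is a rough sketch in Python. The team labels and figures are made up for illustration (they are not the 2005 data), and `quadrant` is a hypothetical helper name, not anything taken from the article.

```python
from collections import Counter
from statistics import median

# Hypothetical (payroll in US$ millions, winning %) pairs, NOT the 2005 data.
teams = {
    "A": (40, 0.450),
    "B": (55, 0.540),
    "C": (63, 0.500),
    "D": (80, 0.480),
    "E": (95, 0.580),
}

def quadrant(payroll, win_pct, med_payroll, med_win):
    """Label a team by which side of each median line it falls on.

    A team sitting exactly on a median line is counted on the low/losing side.
    """
    spend = "high payroll" if payroll > med_payroll else "low payroll"
    result = "winning" if win_pct > med_win else "losing"
    return f"{spend}, {result}"

# The two dotted lines of the scatter plot.
med_payroll = median(p for p, _ in teams.values())
med_win = median(w for _, w in teams.values())

# Assign every team to a quadrant, then tally the quadrants.
labels = {
    name: quadrant(p, w, med_payroll, med_win)
    for name, (p, w) in teams.items()
}
counts = Counter(labels.values())
```

Tallying the quadrants this way is the numerical counterpart of eyeballing how many dots land in each corner of the plot.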


We can take this design a step further by standardizing both variables: in the new graph, the scales are in units of standard deviations (s.d.), so that 0 is the mean payroll, +1 is a payroll one s.d. above the mean, and so on.  Observe that the Yankees' payroll of US$ 206 million is four s.d. above the mean payroll.
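Standardizing is simple to do by hand. The sketch below uses made-up payrolls (not the real figures) and assumes the population standard deviation; the post does not say which flavor of s.d. was used, and `standardize` is just an illustrative name.

```python
from statistics import mean, pstdev

def standardize(values):
    """Express each value as a number of standard deviations from the mean."""
    mu = mean(values)
    sigma = pstdev(values)  # assumption: population s.d.; sample s.d. would differ slightly
    if sigma == 0:
        raise ValueError("all values are identical, so the s.d. is zero")
    return [(v - mu) / sigma for v in values]

# Hypothetical payrolls in US$ millions, NOT the 2005 data.
payrolls = [30, 50, 60, 70, 90]
z_scores = standardize(payrolls)
```

After this transformation the mean payroll sits at 0 and a value of +1 marks a payroll one s.d. above it, which is how the axes of the standardized plot are read.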


Notice the rectangle above.  The teams inside it are what I call "middle market teams": their payrolls are within 1 s.d. of the mean, ranging from US$ 39 to 107 million.  Plotting them separately from the Big/Small Spenders gives us a much richer picture of what is occurring in baseball today.
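The figures quoted so far allow a rough consistency check. Assuming the US$ 39 to 107 million range is exactly the mean minus and plus one s.d. (after rounding), the implied mean and s.d. can be backed out, and the Yankees' US$ 206 million payroll converted into s.d. units. The name `is_middle_market` is only an illustrative label.

```python
# Figures quoted in the post, in US$ millions.
low, high = 39, 107   # payroll range of the "middle market" teams
yankees = 206

# Assumption: low = mean - 1 s.d. and high = mean + 1 s.d. (rounded values).
implied_mean = (low + high) / 2
implied_sd = (high - low) / 2

# The Yankees' payroll expressed in s.d. units above the implied mean.
yankees_z = (yankees - implied_mean) / implied_sd

def is_middle_market(payroll):
    """True when a payroll lies within 1 s.d. of the implied mean."""
    return abs(payroll - implied_mean) <= implied_sd
```

Under that assumption the implied mean is about US$ 73 million and the s.d. about US$ 34 million, which puts the Yankees a little under four s.d. above the mean, in line with the earlier statement.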


On the left, the 25 middle market teams are almost equally distributed among the four quadrants (about 6-7 teams in each), suggesting that, within this group, payroll may have nothing to do with winning.  However, extravagant teams (Yankees, Red Sox) are always winners and miserly teams (Pittsburgh, Kansas City, Tampa Bay) are always losers, the inevitability starkly revealed on the right.  (Admittedly, these sample sizes are small.)

Scatter plots reveal many more insights than tables of numbers.  Any table must be sorted in one given dimension, and such ordering causes difficulty in understanding other variables listed in the same table.  In a scatter plot, both variables are accorded equal status and the reader decides where to place her attention.

Further, a third variable can be layered on top of a scatter plot.  In the next post, I will address the question of whether East Coast or West Coast management have done better with their money.  What do you think the data will show?

Reference: "Passing on Blue-Chip Players can Pay Off", New York Times, Aug 28, 2005.


Mike Anderson

VERY nice. I'm going to assign this as reading to my freshmen.

I especially liked the "big picture" graphic with the middle market teams boxed*; this very quickly draws attention to Chicago and St Louis, who have apparently figured out how to maximize ROI.

*What graphics package are you using? Either it's pretty versatile, or you're doing some very clever tricks to get the calibration lines and boxes.


Looks to me like those graphics were generated with R (the dollar signs in the variable names give it away).

You are doing great work on this blog!


John is right. R is an amazing tool and not just for graphing. It requires a basic understanding of programming. Almost anything you can visualize in your mind, you can create using R. I will likely write a post on R in the future.

The link where the software is freely downloadable is listed under my "Sites of Interest".
