We might as well squeeze more juice out of the baseball data. So far, I have only indirectly touched upon Leonhardt's use of Pappas' efficiency metric, described thus:

Pappas noted that all teams must spend a minimum - now almost $9 million - on their payroll, because of the league's minimum player salary. A team full of players making the minimum would probably win about 30 percent of its games, roughly what the worst teams in baseball history have, he reasoned. Dividing any payroll above $9 million by any victories above the 30 percent threshold produces the cost-per-victory figure.
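Here is a minimal sketch of the metric as the quote describes it. The constants come from the text: the roughly $9 million payroll floor, a 162-game season, and the 30% threshold. The function name and the example team are hypothetical, for illustration only.

```python
# Pappas cost-per-win metric, following the quoted description.
# Constants are taken from the text: the ~$9 million payroll floor,
# a 162-game season, and a 30% winning percentage for an all-minimum roster.
SEASON_GAMES = 162
MIN_PAYROLL = 9_000_000        # league-minimum payroll, in dollars
BASELINE_WIN_PCT = 0.30        # assumed floor for a team of minimum-salary players


def pappas_cost_per_win(payroll: float, wins: int) -> float:
    """Dollars spent above the payroll floor per win above the 30% threshold."""
    baseline_wins = BASELINE_WIN_PCT * SEASON_GAMES  # about 48.6 wins
    extra_wins = wins - baseline_wins
    if extra_wins <= 0:
        # At or below the threshold the ratio is undefined (or negative).
        raise ValueError("metric is undefined at or below the 30% threshold")
    return (payroll - MIN_PAYROLL) / extra_wins


# Hypothetical team: $60 million payroll, 81 wins.
print(f"${pappas_cost_per_win(60_000_000, 81):,.0f} per win above the threshold")
```

Lower is "better" under this metric: fewer dollars above the floor for each win above the threshold.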

This calculation assumes that every team spends an average of $185K per win ($9 million / 48.6) on its first 48 or so wins (30% of 162 games). However, this figure is far from reality: a quick glance at the NYT table shows that each win beyond the first 48 costs teams anywhere from $800K to $4 million.

The Pappas calculation severely over-estimates how many wins $185K apiece can buy. Instead of the league-mandated minimum of $9 million, one can use the actual payroll of the lowest-spending team, which is about $30 million (Tampa Bay). By this logic, *in order to get those 48 wins, no team has spent less than $30 million*. This implies an average of $617K per win for the first 48 wins ($30 million / 48.6).
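The two per-win figures above come from dividing a baseline payroll by the 48.6 wins that make up 30% of a 162-game season. A quick sketch of that arithmetic, with the function name chosen for illustration:

```python
# Average spending per win implied for the first 30% of wins (48.6 of 162),
# under two assumptions about what those baseline wins cost.
BASELINE_WINS = 0.30 * 162


def implied_cost_per_baseline_win(base_payroll: float) -> float:
    """Spread a baseline payroll evenly over the first 48.6 wins."""
    return base_payroll / BASELINE_WINS


floor_per_win = implied_cost_per_baseline_win(9_000_000)    # league-minimum payroll
lowest_per_win = implied_cost_per_baseline_win(30_000_000)  # lowest real payroll (Tampa Bay)
print(f"Pappas floor: ${floor_per_win / 1e3:,.0f}K per win")
print(f"Lowest real payroll: ${lowest_per_win / 1e3:,.0f}K per win")
```

The ratio between the two baselines is simply 30/9, so the Pappas floor makes the first 48 wins look more than three times cheaper than any real team has managed.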

In the last post, we saw that West Coast teams have poor ROI. Is this reflected in our efficiency metric? I return to the scatter plot and now connect each dot to the origin. The slope of each line reflects efficiency: the steeper the line, the more efficient the team. The long dotted line represents our baseline efficiency, i.e. 48 wins for $30 million. No West Coast team beats the baseline, confirming our previous observation.
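One way to read the plot numerically: if payroll is on the x-axis and wins on the y-axis (an assumption about the chart's layout), the slope of the line from the origin to a team's dot is wins per dollar of payroll, and "beating the baseline" means having a steeper slope than the line through (30, 48.6). The teams below are made up for illustration; they are not from the NYT table.

```python
# Efficiency as the slope of the line from the origin to a team's dot,
# assuming payroll ($ millions) on the x-axis and wins on the y-axis.
BASELINE_WINS = 0.30 * 162      # about 48.6 wins
BASELINE_PAYROLL_M = 30.0       # lowest real payroll, in $ millions


def slope_from_origin(payroll_m: float, wins: float) -> float:
    """Wins per $1 million of payroll: the steeper, the more efficient."""
    return wins / payroll_m


def beats_baseline(payroll_m: float, wins: float) -> bool:
    """True if the team's line is steeper than the 48.6-wins-for-$30M line."""
    return slope_from_origin(payroll_m, wins) > slope_from_origin(
        BASELINE_PAYROLL_M, BASELINE_WINS
    )


# Hypothetical teams (made-up numbers, for illustration only).
teams = {"Team A": (35.0, 70), "Team B": (95.0, 88)}
for name, (payroll_m, wins) in teams.items():
    print(name, round(slope_from_origin(payroll_m, wins), 2), beats_baseline(payroll_m, wins))
```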

The short dotted line is Pappas's baseline of 48 wins for $9 million. Its slope is much steeper than that of any real team, showing that he severely over-estimated team efficiency. In reality, one doubts that a team spending only $9 million (in today's terms) could win 48 games. (Because of the Yankees' extravagance, their efficiency line is by far the flattest and runs off the chart.)

If efficiency were the only criterion I cared about, this kind of plot would be less than ideal. It helps the reader rank the teams visually, but the efficiency values themselves cannot be read off the chart. Also, it is the slope (angle) of each line, not its length, that reflects efficiency.

The Pappas efficiency metric heavily rewards the first few wins above .300, i.e., every win just after 48 improves efficiency a lot. Using the Pappas metric, KC looks worse than the Yankees, but even one additional win gives a big efficiency jump. The Pappas metric isn't very robust to small differences in the number of wins:

http://anonymous.coward.free.fr/temp/pappas.png

There are lots of alternatives. Here's one of them:

http://anonymous.coward.free.fr/temp/notpappas.png

Of course, this alternative also suffers a flaw: in reality, the Royals really do suck.

Posted by: Robert | Sep 04, 2005 at 06:45 AM
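Robert's robustness point can be checked with a little arithmetic. With the Pappas formula, adding one win to a team with W wins lowers its cost per win by the fraction 1/(W + 1 - 48.6), whatever the payroll. The sketch below uses a hypothetical $40 million payroll; the helper names are illustrative.

```python
# How much does one extra win move the Pappas cost-per-win figure?
# Same formula as in the post; the payroll and win totals are hypothetical.
MIN_PAYROLL = 9_000_000
BASELINE_WINS = 0.30 * 162   # about 48.6


def pappas_cost_per_win(payroll: float, wins: float) -> float:
    return (payroll - MIN_PAYROLL) / (wins - BASELINE_WINS)


def one_win_change(payroll: float, wins: float) -> float:
    """Fractional drop in cost per win from adding a single win."""
    before = pappas_cost_per_win(payroll, wins)
    after = pappas_cost_per_win(payroll, wins + 1)
    return (before - after) / before


payroll = 40_000_000
for wins in (50, 60, 90):
    print(wins, f"{one_win_change(payroll, wins):.1%}")
```

Near the threshold a single win cuts the figure by roughly 40%, while far above it the same win barely registers, which is why small differences in wins swing the ranking of low-win teams.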

Robert, your first graph would be great if we wanted to use scatter plots to divide the teams into segments using the Pappas metric.

How do you define your 2nd metric? I notice the lines are curved.

Posted by: Kaiser | Sep 13, 2005 at 12:00 AM

If you use payroll/median payroll and the win/loss ratio for the axes (instead of team payroll and projected wins), the lines are straight. However, comparing two different metrics is easier when the plot axes stay the same and the contours differ than the other way round; try it the other way if you'd like to see why.

Using scatterplots to show X, Y, and f(X,Y) is a pretty handy technique that I think is underutilized. Here are a couple of other (sports-related) examples:

http://anonymous.coward.free.fr/rbr/tdf05-1.png

or

http://anonymous.coward.free.fr/rbr/tdf04-bmi.png

BTW, the baseball plots let you see payroll, wins, team name, league, and efficiency (for a particular metric).

Posted by: Robert | Sep 13, 2005 at 03:39 AM
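For the Pappas metric, the contour idea is easy to sketch: every (payroll, wins) point with the same cost per win c satisfies W = 48.6 + (P - $9M)/c, a straight line through ($9M, 48.6) on payroll-versus-wins axes. The cost levels and payroll grid below are hypothetical choices; the resulting point lists could be handed to any plotting library to overlay iso-cost lines on the scatter plot.

```python
# Contours of f(X, Y) for the Pappas metric: all (payroll, wins) points that
# share the same cost per win c satisfy  W = 48.6 + (P - 9M) / c,
# i.e. a straight line through (9M, 48.6). Cost levels are hypothetical.
MIN_PAYROLL = 9_000_000
BASELINE_WINS = 0.30 * 162


def pappas_contour(cost_per_win: float, payrolls: list[float]) -> list[tuple[float, float]]:
    """Return (payroll, wins) points on the iso-cost line for cost_per_win."""
    return [(p, BASELINE_WINS + (p - MIN_PAYROLL) / cost_per_win) for p in payrolls]


payrolls = [9_000_000, 40_000_000, 80_000_000, 120_000_000]
for c in (1_000_000, 2_000_000, 4_000_000):
    pts = pappas_contour(c, payrolls)
    print(f"${c / 1e6:.0f}M per win:", [(p / 1e6, round(w, 1)) for p, w in pts])
```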

Should've clarified, for those with a parochial view of sport, that those are from the Tour de France. Here's yet another damn example of using a scatterplot to show X, Y, and f(X,Y):

http://anonymous.coward.free.fr/rbr/cols05-tdf.png

Posted by: Robert | Sep 13, 2005 at 03:59 AM