Baseball ROI 2: scatter plots
Aug 30, 2005
What more can one do with scatter plots? Much more, it turns out. In the last post, I compared middle-market baseball teams to big- and small-spending teams. There are many other ways to group the 30 teams: for example, by league (American, National), by region (West, Central, East) and by division (NL West, AL East, etc.).
When presented in a table, this information is hidden. On a scatter plot, such comparisons are easily visualized by judicious use of colors and/or labels.
The same pattern appears in both leagues, although the payroll extremes occur in the American League (top right). The payroll disparity is widest among East coast teams and smallest among West coast teams (bottom left).
While the overall pattern (top left) is one of higher payroll, more victories, the bottom left graph shows that this pattern is observed only among Central teams. The winning percentages of East and West coast teams appear rather flat across a wide range of payrolls.
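One way to put a number on what the regional panel shows is to compute the payroll/winning correlation separately for each group. The following is only a minimal sketch, not the method used for the charts: the field names (`region`, `payroll`, `win_pct`) are assumptions, and the six teams are made-up numbers chosen to mimic a strong pattern in one region and a flat one in another.

```python
from collections import defaultdict
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences, or None if undefined."""
    n = len(xs)
    if n < 2:
        return None
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return None  # no variation in one variable, correlation undefined
    return sxy / sqrt(sxx * syy)


def correlation_by_group(teams, key):
    """Split teams by teams[i][key] and correlate payroll with win_pct per group."""
    groups = defaultdict(list)
    for t in teams:
        groups[t[key]].append(t)
    return {
        g: pearson([t["payroll"] for t in members],
                   [t["win_pct"] for t in members])
        for g, members in groups.items()
    }


# Made-up illustrative numbers, NOT the actual 2005 payrolls or records.
example_teams = [
    {"team": "A", "region": "Central", "payroll": 40, "win_pct": 0.40},
    {"team": "B", "region": "Central", "payroll": 60, "win_pct": 0.50},
    {"team": "C", "region": "Central", "payroll": 80, "win_pct": 0.60},
    {"team": "D", "region": "East", "payroll": 40, "win_pct": 0.52},
    {"team": "E", "region": "East", "payroll": 90, "win_pct": 0.50},
    {"team": "F", "region": "East", "payroll": 140, "win_pct": 0.52},
]

by_region = correlation_by_group(example_teams, "region")
```

Passing `"league"` or `"division"` as the key (if those fields exist in the data) would give the other groupings. With only a handful of teams per group, any such coefficient should be read as a rough summary of the picture, not as firm evidence.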
The graph by division (bottom right) further muddies the picture. NL East teams all have above .500 records regardless of whether their payroll is above or below median. The opposite is true for NL West teams. AL West has one team in each of the four quadrants and, to a lesser extent, the same is true of AL Central and AL East. Thus, the strongest evidence of a link between payroll and winning is among NL Central teams.
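The quadrant reading above can be sketched in code: split payroll at the median, split records at .500, and list which quadrants each division occupies. This is a hypothetical illustration; the field names, the tie rule, and the sample numbers are all assumptions and do not come from the actual data.

```python
from collections import defaultdict
from statistics import median


def quadrant(payroll, win_pct, median_payroll):
    """Label a team by payroll (vs. median) and record (vs. .500).

    Assumption of this sketch: ties (payroll exactly at the median, or a
    record of exactly .500) fall on the "low payroll" / "losing" side.
    """
    spend = "high payroll" if payroll > median_payroll else "low payroll"
    record = "winning" if win_pct > 0.5 else "losing"
    return spend + ", " + record


def quadrants_by_division(teams):
    """Map each division to the set of quadrants its teams fall into."""
    med = median([t["payroll"] for t in teams])  # median over all teams
    result = defaultdict(set)
    for t in teams:
        result[t["division"]].add(quadrant(t["payroll"], t["win_pct"], med))
    return dict(result)


# Made-up illustrative numbers, NOT the actual 2005 payrolls or records.
sample = [
    {"team": "A", "division": "Div 1", "payroll": 50, "win_pct": 0.45},
    {"team": "B", "division": "Div 1", "payroll": 70, "win_pct": 0.55},
    {"team": "C", "division": "Div 1", "payroll": 90, "win_pct": 0.45},
    {"team": "D", "division": "Div 1", "payroll": 110, "win_pct": 0.55},
    {"team": "E", "division": "Div 2", "payroll": 60, "win_pct": 0.52},
    {"team": "F", "division": "Div 2", "payroll": 100, "win_pct": 0.56},
]

occupied = quadrants_by_division(sample)
```

In this toy data, "Div 1" has a team in every quadrant (like AL West in the chart), while "Div 2" wins on both sides of the median payroll (like NL East).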
In the following set of graphs, I extracted the middle-market teams, and then plotted each region's teams on a separate graph, facilitating comparison by region. The scales are standardized as in the last post.
With few samples in each group, it is hard to make general statements but overall, the link between payroll and winning percentage is weak among the middle-market teams, regardless of where they are located.
If we remove Colorado and LA Angels from the West coast teams chart (top right), we will uncover disturbing news! The other seven teams together paint a bleak picture of West coast management: the more they spend, the more they lose.
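To see how much a conclusion like this depends on which teams are left out, one can compare the least-squares slope of winning percentage on payroll with and without the excluded teams. The sketch below is only an illustration under stated assumptions: the field names are hypothetical, and the five teams are made-up numbers arranged so that two extreme points flip the sign of the trend.

```python
def ols_slope(points):
    """Least-squares slope of y on x for a list of (x, y) pairs."""
    n = len(points)
    if n < 2:
        raise ValueError("need at least two points")
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points)
    if sxx == 0:
        raise ValueError("all x values are identical")
    sxy = sum((x - mx) * (y - my) for x, y in points)
    return sxy / sxx


def slope_without(teams, excluded):
    """Slope of win_pct on payroll after dropping the named teams."""
    kept = [(t["payroll"], t["win_pct"]) for t in teams
            if t["team"] not in excluded]
    return ols_slope(kept)


# Made-up illustrative numbers, NOT the actual West coast teams.
toy_region = [
    {"team": "P", "payroll": 50, "win_pct": 0.55},
    {"team": "Q", "payroll": 60, "win_pct": 0.50},
    {"team": "R", "payroll": 70, "win_pct": 0.45},
    {"team": "S", "payroll": 40, "win_pct": 0.40},  # low payroll, poor record
    {"team": "T", "payroll": 90, "win_pct": 0.60},  # high payroll, good record
]

slope_all = slope_without(toy_region, set())
slope_trimmed = slope_without(toy_region, {"S", "T"})
```

Here the slope is positive with all five teams and negative once "S" and "T" are dropped. A sign flip from removing two points is a reminder of how fragile trends are with so few teams per group.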
With just a quick read, I think you also point out an interesting analysis technique: breaking your data into different groupings. This is one of the reasons I love the trellis plots in S+ and R. Sometimes using all your data hides patterns in subsets.
Posted by: Steve Citron-Pousty | Aug 31, 2005 at 12:33 PM