Revisiting the home run data
Jul 21, 2014
Note to New York metro readers: I'm an invited speaker at NYU's "Art and Science of Brand Storytelling" summer course which starts tomorrow. I will be speaking on Thursday, 12-1 pm. You can still register here.
***
The home run data set, compiled by ESPN and visualized by Mode Analytics, is pretty rich. I took a quick look at one aspect of the data: how do the 10 hitters highlighted in the previous visualization differ from one another? (I am not quite sure how those 10 were picked because they are not the Top 10 home run hitters in the dataset for the current season.)
The following chart focuses on two metrics: the total number of home runs by this point in the season; and the "true" distances of those home runs. I split the data by whether the home run was hit on a home field or an away stadium, on the hunch that we'd need to correct for such differences.
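As a rough sketch of the kind of summary behind such a chart (not the actual code used to produce it), the snippet below groups home runs by hitter and by home/away, then reports the count and the average true distance for each group. The record layout `(hitter, venue, distance)`, the function name `summarize`, and the sample rows are all assumptions made up for illustration; they are not taken from the ESPN data.

```python
from collections import defaultdict
from statistics import mean


def summarize(records):
    """Return {(hitter, venue): (home_run_count, mean_true_distance)}.

    `records` is an iterable of (hitter, venue, distance) tuples, where
    venue is "home" or "away". This layout is a hypothetical one.
    """
    groups = defaultdict(list)
    for hitter, venue, distance in records:
        groups[(hitter, venue)].append(distance)
    return {key: (len(dists), mean(dists)) for key, dists in groups.items()}


# Made-up rows for illustration only; not real home run data.
sample = [
    ("Hitter A", "home", 410),
    ("Hitter A", "home", 390),
    ("Hitter A", "away", 420),
    ("Hitter B", "away", 400),
]
summary = summarize(sample)
```

Keying on the (hitter, venue) pair keeps the home and away figures side by side for each hitter, which is the comparison the chart is built around.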
The hitters are sorted by total number of home runs. Because I am using a single season, my chart doesn't suffer from a cohort bias. If you go back to the original visualization, it is clear that some of these hitters are veterans with many seasons of baseball in them while others are newbies. This cohort bias explains the difference in dot densities of those plots.
Not having followed baseball recently, I don't know many of the names on this list. I had to look up Todd Frazier - does he play in a hitter-friendly ballpark? His home-to-away ratio is massive. Frazier plays for Cincinnati, at Great American Ball Park. That ballpark has the third-highest number of home runs hit of all ballparks this season, although so far opponents have hit more home runs there than the home players. For reference, Troy Tulowitzki's home field is Colorado's Coors Field, which is a hitter's paradise. Giancarlo Stanton, who also hits quite a few more home runs at home, plays for Miami at Marlins Park, which is below the median in terms of home run production; thus his achievement is probably the most impressive among those three.
Josh Donaldson is the odd man out, as he has hit more away home runs than home runs at home. His O.co Coliseum is middle-of-the-road in terms of home runs.
In terms of how far the home runs travel (bottom part of the chart), there are some interesting tidbits. Brian Dozier's home runs are generally the shortest, regardless of home or away. Yasiel Puig and Giancarlo Stanton generate deep home runs. Adam Jones, Josh Donaldson, and Yoenis Cespedes have hit the ball quite a bit deeper away from home. Giancarlo Stanton is one of the few who has hit the home-run ball deeper at his home stadium.
The baseball season is still young, and the sample sizes at the individual hitter's level are small (~15-30 total), so the observed differences at the home/away level are mostly not statistically significant.
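To get a feel for how little a home/away split tells us at these sample sizes, here is a minimal sketch of an exact two-sided binomial test. This is my own illustration, not necessarily the test behind the statement above. It assumes that, absent any real effect, each home run is equally likely to come at home or away (p = 0.5), which ignores park effects and unequal numbers of home and away games. The function name and the example counts are hypothetical.

```python
from math import comb


def binomial_two_sided_p(home, away):
    """Exact two-sided p-value for a home/away split under p = 0.5.

    Sums the probabilities of all outcomes that are no more likely
    than the observed one (a common two-sided convention).
    """
    n = home + away
    probs = [comb(n, k) * 0.5 ** n for k in range(n + 1)]
    observed = probs[home]
    # Small tolerance so symmetric outcomes are counted despite rounding.
    total = sum(p for p in probs if p <= observed * (1 + 1e-9))
    return min(1.0, total)


# Hypothetical counts for illustration only; not real hitter data.
p_mild = binomial_two_sided_p(12, 8)
p_lopsided = binomial_two_sided_p(15, 5)
```

Under these assumptions, a 12-8 split out of 20 home runs gives a p-value of about 0.50, and it takes something as lopsided as 15-5 to dip just below 0.05.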
The prior post on the original graphic can be found here.
FYI: The 10 hitters were the participants in this year's Home Run Derby ahead of the All-Star Game.
Posted by: Vince | Jul 21, 2014 at 09:07 AM