Ben Alamar reflects on the rise of data analytics in the NBA (link).
I like this passage very much; it really drives home the point that good analytics requires intuition:
The hours of waiting [during draft meetings] were often filled with watching film of prospects. It helped me refine my analysis, as I soaked up details from scouts that I never would have seen on my own. ("Rewind that. ... Did you see his foot placement there, getting ready for the rebound? That's NBA ready.") During one of these sessions, we were watching film of Syracuse point guard Jonny Flynn. I mentioned that, based on the rate at which he collected steals, he was likely a good defender. But one of the scouts explained that Flynn's steal total was likely higher than other point guards' because Syracuse played mostly zone defense, which allowed guards to attack the ball more. I checked that insight against the data and it seemed true, so I adjusted my defensive statistics to account for the dominant style of defense used by a player's team.
I'm glad to hear that the style of play is included in the models. I cringe every time I hear a (usually English) football (i.e. soccer) commentator claim that a team "deserves" to be in the lead because it is dominating possession when, in fact, the other team is playing a counter-attacking strategy. When the other team ekes out a 1-0 victory on a sneak attack, the commentator loses his wits.
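Alamar does not say exactly how he adjusted his defensive statistics, so the following is only a minimal sketch of one simple possibility: compare each guard's steal rate with the average for guards whose teams play the same dominant defense. The player names, the `steals_per_40` field, and all the numbers are invented for illustration.

```python
from statistics import mean

# Invented toy numbers, for illustration only: steals per 40 minutes
# for hypothetical guards, tagged by their team's dominant defense.
players = [
    {"name": "Guard A", "defense": "zone", "steals_per_40": 2.4},
    {"name": "Guard B", "defense": "zone", "steals_per_40": 2.0},
    {"name": "Guard C", "defense": "man", "steals_per_40": 1.6},
    {"name": "Guard D", "defense": "man", "steals_per_40": 1.2},
]


def style_adjusted(players):
    """Return each player's steal rate divided by the average rate
    of players whose teams use the same dominant defense."""
    by_style = {}
    for p in players:
        by_style.setdefault(p["defense"], []).append(p["steals_per_40"])
    baseline = {style: mean(rates) for style, rates in by_style.items()}
    return {
        p["name"]: p["steals_per_40"] / baseline[p["defense"]]
        for p in players
    }


adjusted = style_adjusted(players)
for name, ratio in sorted(adjusted.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {ratio:.2f} x the average for his defensive style")
```

In this toy data, Guard A has more raw steals than Guard C, yet Guard C ranks higher once each is measured against his own style's baseline, which is the kind of reversal the scout's comment warns about.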
Alamar sees the next big challenge in NBA analytics as deriving value from the SportVU data. What is SportVU? Alamar tells us they installed cameras everywhere that "capture the coordinates of 10 players plus the ball 25 times every second." This is the typical "Big Data" scenario--data is collected without any design or any research question in mind. It raises a few intriguing questions:
- The granularity of such data (here, 25 times a second, that is to say, readings four hundredths of a second apart) can be made arbitrarily fine. When have we reached the point of picking up just background noise?
- The very act of relating such data as "predictors" to outcomes such as scoring statistics presupposes a model in which the precise movements of the players or the ball are correlated with those outcomes. Whether we like it or not, any resulting analysis will take on a causal interpretation--this is what separates trivia from an actionable insight. Is this type of predictor the most relevant to explaining outcomes? If we are not careful, we may end up believing this story simply because it is the one we started with.
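To make the granularity question concrete, here is a rough sketch of how measurement noise behaves as the sampling interval shrinks. It assumes each position reading carries independent noise with a fixed standard deviation; the 5 cm figure is a hypothetical value chosen for illustration, not a specification of the tracking system. Under that assumption, a speed estimated from two consecutive readings has noise of sqrt(2) × sigma / dt, which grows as the interval dt gets smaller.

```python
import math


def speed_noise_sd(position_noise_sd_m, sample_rate_hz):
    """Standard deviation (m/s) of a speed estimated from two
    consecutive position readings.

    Assumes independent noise on each reading, so the difference of
    two readings has standard deviation sqrt(2) * sigma; dividing by
    the time step dt converts that into speed noise.
    """
    dt = 1.0 / sample_rate_hz
    return math.sqrt(2) * position_noise_sd_m / dt


# Hypothetical measurement noise, not a property of any real system.
ASSUMED_POSITION_NOISE_M = 0.05

for rate in (1, 5, 25, 100):
    dt = 1.0 / rate
    noise = speed_noise_sd(ASSUMED_POSITION_NOISE_M, rate)
    print(f"{rate:>3} Hz  dt = {dt:.3f} s  speed noise ~ {noise:.2f} m/s")
```

With the assumed 5 cm of noise, readings 0.04 seconds apart already give a frame-to-frame speed estimate that is off by roughly 1.8 m/s, so finer sampling by itself adds noise rather than information unless the readings are smoothed or aggregated.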