Yesterday, I had the honor of being one of the judges at the NBA Hackathon. This is the second edition of the Hackathon, organized by the NBA League Office's analytics department in New York. Here is Director of Basketball Analytics, Jason Rosenfeld, speaking to the crowd:
The event was a huge draw - lots of mostly young basketball enthusiasts testing their hands at manipulating and analyzing data to solve interesting problems. I heard there were over 50 teams who showed up on "game day." Hundreds more applicants did not get "drafted." Many competitors came from out of town - amongst the finalists, there was a team from Toronto and one from Palo Alto.
The competition was divided into two tracks: basketball analytics, and business analytics. Those in the basketball track were challenged with problems of interest to coaches and managers. For example, they are asked to suggest a rule change that might increase excitement in the game, and support that recommendation using the voluminous spatial data. Some of these problems are hard: one involves projecting shot selection ten years out - surely fans want to know if the craze over 3-pointers will last. Nate Silver was one of the judges for the basketball analytics competition.
I was part of the business analytics judging panel, along with the fine folks shown below:
The business problems are challenging as well, and really tested the competitors' judgment, as the problems are open-ended and subjective. Technical skills are also required, as very wide-ranging datasets are made available. One problem asks contestants to combine a wide number of datasets to derive a holistic way to measure "entertainment value" of a game. The other problem is even more open: do something useful and interesting with our customer files.
I visited the venue the night before, when the teams were busy digging into the data. See the energy in the room here:
The competitors are given 24 hours to work on the datasets. This time includes making a presentation to showcase what they have found. They are not allowed to utilize old code. I overheard several conversations between contestants and the coaches - it appeared that the datasets are in a relatively raw state, meaning quite a bit of time would have been spent organizing, exploring, cleaning and processing the data.
One of the finalists in the business competition started their presentation, telling the judges they spent 12 hours processing their datasets. It does often seem like as analysts, we are fighting with our data.
This team from Toronto wrestled with the various sets of customer-indiced data, and came up with a customer segmentation scheme. They utilized a variety of advanced modeling techniques.
The other two finalists in the business competition tackled the same problem: how to measure entertainment value of a game. Their approaches were broadly similar, with each team deploying a hierarchy of regression models. Each model measures a particular contributor to entertainment value, and contains a number of indicators to predict the contribution.
Pictured below is one of the finalists, who deployed Lasso regression, a modern technique to select a subset of important factors from a large number of possibilities. This team has a nice handle on the methods, and notably, was the only team that presented error bars, showing the degree of uncertainty in their results.
The winning team in the business competition went a couple of steps beyond. First, they turned in a visual interface to a decision-making tool that scores every game according to their definition of entertainment value. I surmise that they also expressed these scores in a relative way, because some of their charts show positive and negative values. Second, this team from Princeton realized the importance of tying all their regression models together into a composite score. They even allow the decision makers to shift the component weights around. Congratulations to Data Buckets! Here is the pair presenting their decision-making tool:
Mark Tatum, deputy commissioner of the NBA League Office, presented the award to Team Data Buckets:
These two are also bloggers. Look here.
After much deliberation, the basketball analytics judges liked the team representing the Stanford Sports Analytics Club.
These guys tackled the very complicated problem of forecasting future trends in shot selection, using historical data.
For many, maybe most, of the participants, this was their first exposure to real-world datasets, and a short time window to deliver an end-product. Also, they must have learned quite a bit about collaboration.
The organizers should be congratulated for putting together a smoothly-run event. When you host a hackathon, you have to be around throughout the night as well. Also, the analytics department staff kindly simplified the lives of us judges by performing the first round of selection overnight.
***
Last but not least, I like to present the unofficial Best Data Graphics Award to the team known as Quire Sultans. They were a finalist in the basketball analytics contest. I am impressed with this display:
This team presented a new metric using data on passing. The three charts are linked. The first one shows passer-passee data within a specific game; the second shows locations on the court for which passes have more favorable outcomes; the third chart measures players' over/under performance against a model.
There were quite a few graphics presented at the competition. This is one of the few in which the labels were carefully chosen and easily understood, without requiring in-depth knowledge about their analysis.