After the NBA Hackathon (see report here), I caught up with the winning team in the business analytics competition, DataBucket, composed of Barbara Zhan and Harold Li.
Junkcharts: Congratulations on winning the business analytics competition at the NBA Hackathon. As a judge, I was very impressed by how much work you were able to do in 24 hours. Did you sleep or did you work all the way through?
DataBucket: We slept for around 5 hours in the early mornings, but also took breaks every few hours just to relax our minds and recalibrate.
JC: The problem you chose to tackle is to define "entertainment value" for any NBA game. That's a huge problem to tackle in 24 hours. How did you allocate your time?
DB: We spent the first few hours planning our course of action, and really debating how to evaluate "entertainment value." Without a good metric, any sort of analysis would be fruitless. We also decided on our methodology (a time-series regression approach) and the features we wanted for our model.
Afterwards, we divided and conquered, cleaning / scraping the various datasets to get the variables that we wanted. Once we had a clean dataset, we ran regressions and played with features to get the most accurate and most intuitive results.
Once we were confident with our model, we spent time building out the Tableau dashboard that visualized those entertainment values. It was important for us to come up with a tool that was engaging, interactive, flexible and informative, so we spent significant time designing our visualization.
JC: How did you allocate work between the two team members?
DB: It was a joint effort! We came up with the initial plan after an hour or so of joint brainstorming. Barbara took the lead on the feature engineering / data modeling sections, coming from a quantitative hedge fund background where she knew a ton about regressions and the assumptions behind them, but both of us were highly involved in data cleaning and modeling. Harold took the lead on the visualization / presentation component, since he comes from an analytics background where storytelling and communicating results in a business context is vital. He created a Tableau dashboard that showcased our resulting entertainment metric, which updated over time, and lent a crucial "cool" factor to our presentation.
JC: Tell me about your backgrounds.
DB: We both majored in Operations Research and Financial Engineering at Princeton. Harold is a data scientist at Blue Apron, and previously worked at Goldman Sachs as a quantitative strategist. Barbara is a quantitative researcher at Two Sigma.
JC: I heard you guys have a blog called DataBucket. What's the origin of the name?
DB: When we were both at Princeton, we thought it would be fun to use our data science skills to answer questions we were interested in. Our first article sought to quantify the clutchness of NBA players, so we called it DataBucket to honor the basketball-related heritage of the blog!
JC: Your team chose to work on the problem of defining entertainment value of an NBA game. You incorporated data from Instagram into your solution. Can you explain what data you pulled from Instagram and how you used them?
DB: The prompt suggested that we incorporate creativity into our project, so we decided to use alternative data. Barbara was familiar with the Instagram API, having used it before for the DataBucket blog, and scraped counts of hashtags related to each player's name as a proxy for player popularity. Harold was familiar with Google Trends, which he used to scrape timely data on search terms that would be most relevant to a blockbuster game (e.g. NBA on TNT).
JC: One of the highlights of your presentation is the decision making tool you created for the manager. What tools did you use to build it?
DB: We used Tableau to visualize our dataset. Given the time constraints, Tableau was the easiest tool to create something interactive and visually appealing without much effort.
Running the regression and cleaning the data were done in Python and R - we used whatever we were most comfortable with and went with it!
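For readers curious what a regression of this kind could look like, here is a minimal sketch in plain Python. It is not the team's code. The feature names (a hashtag count as a popularity proxy, a search-interest index, and the previous game's score as a lagged term standing in for the time-series aspect), the target called "entertainment score", and all of the numbers are assumptions invented for illustration.

```python
# Illustrative sketch only: features, coefficients and data are invented,
# not taken from the DataBucket project.

def fit_ols(rows, targets):
    """Fit ordinary least squares with an intercept via the normal equations."""
    X = [[1.0] + list(r) for r in rows]  # prepend an intercept column
    n = len(X[0])
    # Augmented matrix [X'X | X'y]
    A = [[sum(x[i] * x[j] for x in X) for j in range(n)]
         + [sum(x[i] * y for x, y in zip(X, targets))]
         for i in range(n)]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("features are collinear; cannot fit")
        A[col], A[pivot] = A[pivot], A[col]
        p = A[col][col]
        A[col] = [v / p for v in A[col]]
        for r in range(n):
            if r != col:
                factor = A[r][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][n] for i in range(n)]  # [intercept, coef_1, ..., coef_k]


def predict(coefs, row):
    """Apply fitted coefficients to one feature row."""
    return coefs[0] + sum(c * v for c, v in zip(coefs[1:], row))


# Hypothetical per-game features:
# (hashtag count in thousands, search-interest index, previous game's score)
rows = [(10, 40, 5), (25, 55, 6), (8, 30, 4),
        (40, 90, 9), (15, 35, 7), (30, 60, 5)]

# Synthetic targets built from known coefficients, so the fit can be checked
true_coefs = [2.0, 0.5, 1.5, 0.3]
targets = [predict(true_coefs, r) for r in rows]

coefs = fit_ols(rows, targets)
new_game_score = predict(coefs, (20, 50, 6))
```

Because the targets here are generated without noise, the fitted coefficients match the ones used to build them. With real data, the fit would only approximate the relationship, and a library such as statsmodels or R's `lm` would be the more usual choice.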
JC: If you had a chance to do one thing differently in the Hackathon, what would it be?
DB: We definitely wanted to explore Twitter data a bit more. While Instagram is a good indicator of player popularity, Twitter is more of a real-time platform that captures more accurate sentiment of a particular game, but we couldn't hack the API in time.
On another note, we would have loved to forecast game-level entertainment value for this upcoming season instead of validating our model for this past season.
JC: Thank you so much for speaking with me.