Stranger things found on scatter plots
Nov 29, 2023
Washington Post published a nice scatter plot which deconstructs scores from the recent World Championships in Gymnastics. (link)
The chart presents the main message clearly - the winner Simone Biles scored the highest on both components of the score (difficulty and execution), by quite some margin.
What else can we learn from this chart?
***
Every athlete who qualified for the final scored at or above average on both components.
Scoring below average on either component is a death knell: no athlete scored enough on the other component to compensate. (The top left and bottom right quadrants would have had some yellow dots otherwise.)
Several athletes in the top right quadrant presumably scored enough to qualify but didn't. The footnote likely explains it: each country can send at most two athletes to the final. It may be useful to mark out these "unlucky" athletes using a third color.
Curiously, it's not easy to figure out who these unlucky athletes were from this chart alone. We need two pieces of data: the minimum qualifying score, and the total score for each athlete. The scatter plot isn't the best chart form to show totals, but qualification to the final is based on the sum of the difficulty and execution scores. (Note also, neither axis starts at zero, compounding the challenge.)
***
This scatter plot is most memorable for shattering one of my expectations about risk and reward in sports.
I expect risk-seeking athletes to suffer from higher variance in performance. The tennis player who goes for big serves tend to also commit more double faults. The sluggers who hit home runs tend to strike out more often. Similarly, I expect gymnasts who attempt more difficult skills to receive lower execution scores.
Indeed, the headline writer seemed to agree, suggesting that Biles is special because she's both high in difficulty and strong in execution.
The scatter plot, however, sends the opposite message - this should not surprise. The entire field shows a curiously strong positive correlation between difficulty and execution scores. The more difficult is the routine, the higher the excution score!
It's hard to explain such a pattern. My guesses are:
a) judges reward difficult routines, and subconsciously confound execution and difficulty scores. They use separate judges for excecution and difficulty. Paradoxically, this arrangement may have caused separation anxiety - the judges for execution might just feel the urge to reward high difficulty.
b) those athletes who are skilled enough to attempt more difficult routines are also those who are more consistent in execution. This is a type of self-selection bias frequently found in observational data.
Regardless of the reasons for the strong correlation, the chart shows that these two components of the total score are not independent, i.e. the metrics have significant overlap in what they measure. Thus, one cannot really talk about a difficult routine without also noting that it's a well-executed routine, and vice versa. In an ideal scoring design, we'd like to have independent components.