A data visualization that is invariant to the data

Bewildering baseball math

On Twitter, someone asked me about this chart:


It's called the MLB pipeline. The text at the top helpfully tells us what the chart is about: how the playoff teams in baseball are built. That's the good part.

It then took me half a day to understand what is going on below. There are four ways for a player to join a team: being drafted, signing as an international player, arriving by trade, or signing as a free agent. The first two together count as homegrown.

Each row is a type of player. You can look up which teams have exactly X players of a specific type. It gets harder if you want to know how many players team Y has of a given type. It is even harder if you don't know the logos of every team (e.g. Toronto Blue Jays).

Some fishy business is going on with the threesomes and foursomes. Here is the red threesome:


Didn't know baseball employs half a player. The green section has a different way to play threesomes:


The blue section takes inspiration from both and shows us a foursome:


I was stuck literally in the middle for quite a while:


Eventually, I realized that this is a summary of the first two sections on the page. I still don't understand why there is no gap between 11 and 14, while the 14 and 15 arrows are twice as large as the 9, 10 and 11 arrows, even though every arrow contains exactly one team.


The biggest problem in the above chart is the hidden base: each team's roster has a total of 25 players.
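To make the 25-player base explicit, each team's counts can be expressed as shares of its roster. A minimal sketch, assuming every rostered player falls into exactly one of the four acquisition categories; the function name and the sample counts are hypothetical, not taken from the chart.

```python
ROSTER_SIZE = 25  # the chart's hidden base: every roster has 25 players


def roster_shares(counts: dict[str, int], roster_size: int = ROSTER_SIZE) -> dict[str, float]:
    """Express one team's player counts per acquisition type as shares of the roster.

    Assumes (hypothetically) that every rostered player falls into exactly one
    category, so the counts must add up to the roster size.
    """
    total = sum(counts.values())
    if total != roster_size:
        raise ValueError(f"counts add up to {total}, expected {roster_size}")
    return {kind: n / roster_size for kind, n in counts.items()}


# Hypothetical counts for an illustrative team (not real MLB data).
example = {"draft": 8, "international": 3, "trade": 9, "free agent": 5}
shares = roster_shares(example)
homegrown = shares["draft"] + shares["international"]  # homegrown = draft + international
print(shares)
print(f"homegrown share: {homegrown:.2f}")
```

Once the base is stated, a count of 9 traded players reads as just over a third of the roster rather than as a free-floating number.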

Here is a different view of the data:


With this chart, I want to do two things: first, address the most interesting question, namely which team(s) emphasize which player acquisition tactic; second, provide the proper reference level for interpreting the data.

Regarding the vertical reference lines: take the top-left chart, about players acquired through trades. If every team emphasized this tactic equally, each team would have the same number of traded players on its 25-man roster, namely the total number of traded players divided by the number of teams. That works out to approximately 11 traded players per team. This is clearly not the case: several teams, especially the Cubs and Blue Jays, used trades more often than teams like the Mets and Royals.
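The reference level is a one-line calculation: under equal emphasis, each team holds the total spread evenly across teams, which is simply the mean count. A minimal sketch with hypothetical team names and counts, chosen so the mean happens to be 11. These figures are not read off the chart.

```python
def reference_level(counts_by_team: dict[str, int]) -> float:
    """Players per team if every team used the tactic equally:
    the total spread evenly across teams, i.e. the mean count."""
    return sum(counts_by_team.values()) / len(counts_by_team)


def deviations_from_reference(counts_by_team: dict[str, int]) -> list[tuple[str, float]]:
    """Each team's count minus the reference level, most above-average first."""
    ref = reference_level(counts_by_team)
    gaps = [(team, n - ref) for team, n in counts_by_team.items()]
    return sorted(gaps, key=lambda pair: pair[1], reverse=True)


# Hypothetical traded-player counts (not the actual playoff rosters).
traded = {"Team A": 15, "Team B": 12, "Team C": 10, "Team D": 7}
print(reference_level(traded))
for team, gap in deviations_from_reference(traded):
    print(f"{team}: {gap:+.1f}")
```

Plotting each team against this line, rather than only against the other teams, is what lets a reader see which teams lean on a tactic more or less than an even split would predict.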





Fabio Machado

Why not just use a stacked bar chart?


Fabio: Because what interests me isn't the mix of acquisition strategies within each team but the relative use of each strategy by different teams. In addition, I want to establish a reference level so that readers can interpret the data, not just read them.

The stacked bar chart constrains the designer in how the data are ordered. I use it infrequently.

Alex Lea

Kaiser: Great article. I understand your concerns with using a stacked bar, but I think this is one of the times when it really does make sense (especially compared to the abomination of the original graphic!).

Even more so considering there are only '3.5' categories (home-grown draft/international could be coloured slightly different shades of the same colour to show their connection). A 100% stacked bar would mean that only the central segment would be difficult to judge.

I'd also play devil's advocate and suggest that you don't need to know the exact figures, just the general pattern. For instance, the two World Series teams tend to use more home-grown players and the fewest trades. In other words, successful teams are not built, they are grown!
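The point about the central segment can be checked by computing where each segment starts and ends in a 100% stacked bar. A minimal sketch, assuming three merged categories (homegrown, trade, free agent) and hypothetical counts for two teams; the function name is illustrative.

```python
def stacked_segments(counts: list[int]) -> list[tuple[float, float]]:
    """(start, end) of each segment of a 100% stacked bar, as fractions of the bar."""
    total = sum(counts)
    segments = []
    cumulative = 0  # running integer total keeps the last end exactly at 1.0
    for n in counts:
        segments.append((cumulative / total, (cumulative + n) / total))
        cumulative += n
    return segments


# Hypothetical [homegrown, trade, free agent] counts for two 25-man rosters.
team_a = stacked_segments([11, 9, 5])
team_b = stacked_segments([7, 12, 6])
for name, segs in (("A", team_a), ("B", team_b)):
    print(name, [(round(s, 2), round(e, 2)) for s, e in segs])
```

The first segment always starts at 0 and the last always ends at 1, so both are judged against a fixed edge; only the middle segment has two ends that shift from bar to bar.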
