The simulation is for the maximum hitting streak over the whole history of baseball. So 50% of simulated histories have at least one season with a hitting streak longer than 53 games.

It is actually a bit silly, because the higher streaks belong either to DiMaggio or to the couple of guys from an earlier era. So the question becomes: with a bit of luck, could these guys have done better? The answer, of course, is yes.

Ken's got it right. This is claiming that if professional baseball were played over its ~1871 to 2007 history 10,000 times, 53 would be the median record per "alternate history," not per season.

This also answers your last question. It's not .75% compared to 50%, because the 50% applies to all seasons in an "alternate history" taken together, not to any individual season.
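To make the per-history versus per-season distinction concrete, here is a minimal Python sketch. It is a toy model, not the study's method: every player is assumed to have the same per-game chance of a hit (`p_game`), and the numbers of histories, seasons, players, and games are made-up illustrative values.

```python
import random
import statistics


def longest_streak_in_season(p_game, games, rng):
    """Longest run of consecutive games with a hit for one player-season."""
    best = run = 0
    for _ in range(games):
        if rng.random() < p_game:
            run += 1
            best = max(best, run)
        else:
            run = 0
    return best


def simulate_histories(histories=100, seasons=20, players=5,
                       games=154, p_game=0.75, seed=1):
    """Return, for each simulated history, the best streak of each season.

    All parameters are hypothetical; real players differ in ability.
    """
    rng = random.Random(seed)
    return [
        [
            max(longest_streak_in_season(p_game, games, rng)
                for _ in range(players))
            for _ in range(seasons)
        ]
        for _ in range(histories)
    ]


def summarize(season_bests):
    """Median all-time record, plus the share of histories and of
    individual seasons that exceed that median record."""
    records = [max(history) for history in season_bests]
    median_record = statistics.median(records)
    share_histories = sum(r > median_record for r in records) / len(records)
    all_seasons = [s for history in season_bests for s in history]
    share_seasons = sum(s > median_record for s in all_seasons) / len(all_seasons)
    return median_record, share_histories, share_seasons


if __name__ == "__main__":
    median_record, share_histories, share_seasons = summarize(simulate_histories())
    print("median record per history:", median_record)
    print("share of histories above it:", share_histories)
    print("share of single seasons above it:", share_seasons)
```

By construction, at most half of the simulated histories have a record above the median record, while the share of individual seasons that beat it can never be larger and is typically much smaller.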

So the record is still impressive, but yes, in some sense it could have been more impressive in an alternate universe :)

But "unbeatable", when claimed in 2008, might be a prediction that 53 is unlikely to be exceeded in the space of all future universes 2008-infinity, not all alternative universes 1871-2007.

See Stephen Jay Gould's Full House for why outstanding excellence in baseball gets harder to achieve as time goes on and the average player improves.

I think the above comments are correct; however, there's also a methodological problem with the study, in that it assumes that the batter's batting average is the result of a string of ABs against equal pitchers. In fact, this is not the case (some pitchers are better than others), and the equal-pitcher assumption is also the one that maximizes the expected length of a hitting streak.

For simplicity, imagine two scenarios:
(A) A batter gets 4 ABs per game, each game having a different pitcher, with each pitcher he has a 1/3 chance of getting a hit.
(B) A batter gets 4 ABs per game, each game having a different pitcher, with 1/3 of pitchers he has a 100% chance of getting a hit, and with 2/3 of pitchers a 0% chance.

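In both cases, the batter will hit .333. However, lengthy hitting streaks are realistic only in scenario A: there he gets at least one hit in a game with probability 1 - (2/3)^4, about 80%, versus only 1/3 in scenario B.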
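The gap between the two scenarios can be checked with a short Python sketch. It computes the per-game chance of at least one hit in each scenario exactly, then simulates the average longest streak per season. The 162-game season, the number of simulated seasons, and the function names are assumptions made for illustration.

```python
import random
from fractions import Fraction


def game_hit_prob_a(at_bats=4, p_hit=Fraction(1, 3)):
    """Scenario A: each AB is an independent 1/3 chance of a hit."""
    return 1 - (1 - p_hit) ** at_bats


def game_hit_prob_b(share_hittable=Fraction(1, 3)):
    """Scenario B: the batter hits only when he draws a 'hittable' pitcher."""
    return share_hittable


def longest_streak(game_results):
    """Longest run of True values (games with at least one hit)."""
    best = run = 0
    for got_hit in game_results:
        run = run + 1 if got_hit else 0
        best = max(best, run)
    return best


def mean_longest_streak(p_game, games=162, seasons=2000, seed=0):
    """Average longest streak per season, by simulation (assumed season length)."""
    rng = random.Random(seed)
    p = float(p_game)
    total = 0
    for _ in range(seasons):
        total += longest_streak(rng.random() < p for _ in range(games))
    return total / seasons


if __name__ == "__main__":
    p_a, p_b = game_hit_prob_a(), game_hit_prob_b()
    print("P(hit in a game), A:", p_a, " B:", p_b)
    print("P(10-game streak from a given start), A:", float(p_a ** 10),
          " B:", float(p_b ** 10))
    print("mean longest streak, A:", mean_longest_streak(p_a),
          " B:", mean_longest_streak(p_b))
```

Both scenarios give the same .333 average, but the chance of a hit in any given game is 65/81 in A and 1/3 in B, so streaks die much sooner in B.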

-- Eric

Hear, hear, Derek: this is a classic case of rigor (the 10k iterations) masking the overall immaturity of the variable definitions in the simulation. What of the variance in pitcher performance due to pitcher skill level, coaching tactics (not pitching to the streaking hitter), or the overwhelming pressure of intensified public scrutiny (as well evidenced by the first-person narratives of DiMaggio and would-be inheritors of the record)? Assuming a flatline of probability from ABs 1->56 is the inherent flaw in the Times' analysis, be it 10k or 10 million k simulations.

This is a fascinating post. I'd never thought that one would need so many variables to make this simulation "at least plausible".
I'm very much interested in statistics, even though my knowledge is very limited. Any idea where I could find some resources to finally understand what you smart guys are talking about? :-P

-- Tim

I highly recommend "Curve Ball," the book by Albert and Bennett, for a statistician's perspective. It's featured in the Core Collection; click on the link above and then click on popular statistics.

Ordered on Amazon. Thanks for the tip, Kaiser :-)

-- Tim

A Monte Carlo simulation built from historical data is always going to closely match that data. Garbage in, garbage out. But its virtue is that it captures features that a model may overlook.

If you wanted to predict the likelihood of a streak in a given future season, what you would want to do is take all the data (probably weighting the recent data more heavily due to changes in the game over time), find a probability distribution that is a good fit (probably something like the Poisson distribution), and estimate the parameters of the probability distribution from the data.
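A minimal Python sketch of that recipe follows, under loudly stated assumptions: the quantity modeled is the number of long streaks per season, the recency weighting is an exponential decay, and the counts in the example are invented placeholders rather than real baseball data. For a Poisson model, the weighted maximum-likelihood estimate of the rate is simply the weighted mean of the counts.

```python
import math


def recency_weights(n, decay=0.95):
    """Exponential-decay weights; the most recent season (last) gets weight 1."""
    return [decay ** (n - 1 - i) for i in range(n)]


def weighted_poisson_rate(counts, weights):
    """Weighted MLE of a Poisson rate: the weighted mean of the counts."""
    return sum(w * c for w, c in zip(weights, counts)) / sum(weights)


def poisson_tail(rate, k):
    """P(X >= k) for X ~ Poisson(rate)."""
    below = sum(math.exp(-rate) * rate ** i / math.factorial(i) for i in range(k))
    return 1 - below


if __name__ == "__main__":
    # Hypothetical counts of "long" streaks per season, oldest first.
    # These numbers are placeholders for illustration only.
    counts = [2, 1, 0, 1, 0, 0, 1, 0]
    rate = weighted_poisson_rate(counts, recency_weights(len(counts)))
    print("estimated rate per season:", rate)
    print("P(at least one long streak next season):", poisson_tail(rate, 1))
```

Whether a Poisson distribution actually fits would need checking against the real data; the sketch only shows the mechanics of weighting and estimating.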
