Nate Silver first attracted attention two election cycles
ago with the launch of his fivethirtyeight.com website (538 is the number of electoral
votes in the United States). He makes clean charts, which I like a lot. Since that time, he has earned a platform on
the New York Times website, which goes some way to explaining the vitriol hurled
at him during the just-concluded election season by the right wing. Being in
the national conversation is convenient when one has a book coming off the
press – and I write this admiringly, as I believe Silver’s popularity and
influence further the cause of anyone in favor of the data-driven mindset. Surely,
the predictive success of his model, as well as those of a number of copycats,
has resoundingly humbled the pundits and tea-leaf readers, who talked themselves
into ignoring the polling data.
The book is titled The Signal and the Noise (link). As explained by
Silver, these terms originated in the electrical engineering realm, and have
long served as a metaphor for the statistician’s vocation, that is, separating
the signal from the noise. Imagine making a long-distance call from California
to Tokyo. Your voice, the signal, is encoded and sent along miles of cables and
wires from one handset to the other, picking up interference, the noise, along
the way. The job of electrical engineers is to decipher the garbled audio at
the other end, by sizing and removing the noise. When the technology fails, you
have a “bad connection”, and you can literally hear the noise.
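To make the metaphor slightly more concrete, here is a toy sketch of my own (not from the book): a smooth signal is corrupted by random noise, and a simple moving-average filter recovers much of it.

```python
import numpy as np

# Toy illustration (mine, not Silver's): a slow signal corrupted by random noise,
# then partially recovered with a simple moving-average filter.
rng = np.random.default_rng(0)
t = np.linspace(0, 4 * np.pi, 500)
signal = np.sin(t)                          # the "voice" being transmitted
noise = rng.normal(scale=0.8, size=t.size)  # interference picked up along the way
received = signal + noise

window = 25
smoothed = np.convolve(received, np.ones(window) / window, mode="same")

# The filtered series sits much closer to the original signal than the raw one.
print("RMS error, raw received :", np.sqrt(np.mean((received - signal) ** 2)))
print("RMS error, after filter :", np.sqrt(np.mean((smoothed - signal) ** 2)))
```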
***
It is in the subtitle—“why so many predictions fail – but
some don’t”—that one learns the core philosophy of Silver: he is most concerned
with the honest evaluation of the performance of predictive models. The failure
to look in the mirror is what I often describe as the elephant in the data
analyst’s room. Science reporters and authors keep bombarding us with stories
of success in data mining, when in fact most statistical models in the social
sciences have high rates of error. As Silver’s many case studies demonstrate, these
models are still useful but they are far from infallible; or, as Silver would
prefer to put it, the models have a quantifiable chance of failing.
In 450 briskly moving pages, Silver takes readers through
case studies on polling, baseball, the weather, earthquakes, GDP, pandemic flu,
chess, poker, the stock market, global warming, and terrorism. I appreciate the
refreshing modesty with which he discusses the limitations of various successful prediction
systems. For example, one of the subheads in the chapter about a baseball player
performance forecasting system he developed prior to entering the world of
political polls reads: “PECOTA Versus Scouts: Scouts Win” (p. 88). Unlike many
popular science authors, Silver does not portray his protagonists as uncomplicated
heroes, does not draw overly general conclusions, and does not flip from
one anecdote to another; instead he provides enough detail for readers to gain a fuller
understanding of each case study. In other words, we can trust his conclusions,
even if his book contains little Freakonomics-style counter-intuition.
***
Performance measurement is a complex undertaking. To
illustrate this point, I list the evaluation methods deployed in the key case
studies of the book:
1. McLaughlin Group panel predictions (p. 49): proportion of predictions that become “completely true” or “mostly true,” ignoring predictions that cannot be or are not yet verifiable
2. Election forecasts (p. 70): proportion of Republican wins among those districts predicted to be “leaning Republican” (underlying this type of evaluation is some criterion for calling a race “leaning”)
3. Baseball prospect forecasting (p. 90): number of major-league wins generated by players on the Top 100 prospect list in a specified window of time; the wins attributed to individual players are computed via a formula known as “wins above replacement player”
4. Daily high-temperature forecasts (p. 132): average difference between the predicted temperature (x days in advance) and the actual temperature, compared against “naïve” methods of prediction such as always predicting the average temperature or predicting tomorrow’s temperature to equal today’s
5. Rainfall forecast (p. 135): how close to, say, 20% is the proportion of days on which it actually rained, among those days when the weather service forecast a 20% chance of rain
6. Earthquake forecast (p. 160): whether or not an earthquake in the predicted range of magnitude occurred in the predicted range of time in the predicted region of the world (a binary outcome)
7. GDP growth forecast (p. 182): the proportion of times the economist’s prediction intervals contain the actual GDP growth
8. Chess (Ch. 9): winning games
9. Poker (p. 311): amount of earnings
10. Long-range global temperature forecast (pp. 398, 402): actual trend against predicted trend (note that this is the same method as #7, but with only one prediction interval)
If you are thinking the evaluation methods listed above seem numerous and arbitrary,
you’d be right. After reading Silver’s book, you should be thinking critically
about how predictions are evaluated (and in some cases, how they may be
impossible to verify). The probabilistic forecasts that Silver advocates are even
harder to validate. Silver tells it like it is: this is difficult but crucial
work; one must look out for forecasters who don’t report their errors, as
well as those who hide their errors by using inappropriate measurements.
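To make the rainfall example (#5 above) concrete, here is a minimal sketch of a calibration check for probability-of-rain forecasts; this is my own illustration, not taken from the book, and the data are made up. The idea is to group days by the stated probability and compare each group with the fraction of those days on which it actually rained.

```python
from collections import defaultdict

def calibration_table(forecast_probs, rained):
    """Group days by the stated probability of rain and report, for each group,
    the fraction of those days on which it actually rained."""
    groups = defaultdict(list)
    for p, outcome in zip(forecast_probs, rained):
        groups[p].append(outcome)
    return {p: sum(v) / len(v) for p, v in sorted(groups.items())}

# Made-up data: stated probabilities and whether it actually rained (1 = yes).
forecast_probs = [0.2, 0.2, 0.2, 0.2, 0.2, 0.6, 0.6, 0.6, 0.9, 0.9]
rained         = [0,   0,   1,   0,   0,   1,   0,   1,   1,   1]

for p, observed in calibration_table(forecast_probs, rained).items():
    print(f"forecast {p:.0%}: rained on {observed:.0%} of those days")
```

A well-calibrated forecaster’s 20% days should see rain about 20% of the time, which is exactly the comparison the weather-service evaluation makes.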
***
Throughout the book, Silver makes many practical recommendations
that reveal his practitioner’s perspective on forecasting. As an applied
statistician, I endorse without hesitation specific pieces of advice such as: use
probability models; recognize that more data can make predictions worse; mix art and
science; try hard to find the right data rather than settling for whatever is readily
available; and avoid too much precision.
The only exaggeration in the book is his elevation of
“Bayesian” statistics as the solution to predictive inaccuracy. What he
packages as Bayesian has been part of statistical science even before the
recent rise of modern Bayesian statistics. (The disagreement between Bayesians
and non-Bayesians is over how these concepts are utilized.) Silver’s exposition
focuses on probability updating in sequential decision-making, which is
understandable given his expertise in sequential settings with a rich tradition
of data collection, such as baseball and polling. (At one point, he makes an
astute comment about data analysts selecting more promising settings in which
to work.) The modern Bayesian movement is much broader than probability
updating, and I’d point you to Professor Andrew Gelman’s blog and/or books as a place to
explore what I mean by that. It must be said, though, that the technicalities
of Bayesian statistics are tough to convey in a mass-market book.
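For readers curious about the mechanics, here is a bare-bones sketch of the kind of probability updating Silver describes; the hypothesis, likelihoods, and poll results below are made up purely for illustration.

```python
def update(prior, p_evidence_given_true, p_evidence_given_false):
    """One step of Bayes' rule: revise the probability that a hypothesis is true
    after observing a new piece of evidence."""
    numerator = p_evidence_given_true * prior
    denominator = numerator + p_evidence_given_false * (1 - prior)
    return numerator / denominator

# Made-up numbers: belief that a candidate wins a district, updated poll by poll.
belief = 0.50                # prior probability before seeing any polls
polls = [True, True, False]  # does each successive poll show the candidate leading?
for leads in polls:
    if leads:
        belief = update(belief, p_evidence_given_true=0.7, p_evidence_given_false=0.4)
    else:
        belief = update(belief, p_evidence_given_true=0.3, p_evidence_given_false=0.6)
    print(f"updated probability of winning: {belief:.2f}")
```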
***
In spite of this minor semantic issue, I am confident my
readers will enjoy reading Silver’s book (link). It is one of the more balanced,
practical books on statistical thinking on the market today by a prominent public
advocate of the data-driven mindset.