Nate Silver first attracted attention two election cycles ago with the launch of his fivethirtyeight.com website (538 is the number of electoral votes in the United States). He makes clean charts, which I like a lot. Since that time, he has earned a platform on the New York Times website, which goes some way to explaining the vitriol hurled at him by the right wing during the just-concluded election season. Being in the national conversation is convenient when one has a book coming off the press – and I write this admiringly, as I believe Silver’s popularity and influence further the cause of anyone in favor of the data-driven mindset. Surely, the predictive success of his model, as well as those of a number of copycats, has resoundingly humbled the pundits and tea-leaf readers, who talked themselves into ignoring the polling data.
The book is titled The Signal and the Noise (link). As explained by Silver, these terms originated in the electrical engineering realm, and have long served as a metaphor for the statistician’s vocation, that is, separating the signal from the noise. Imagine making a long-distance call from California to Tokyo. Your voice, the signal, is encoded and sent along miles of cables and wires from one handset to the other, picking up interference, the noise, along the way. The job of electrical engineers is to decipher the garbled audio at the other end, by sizing and removing the noise. When the technology fails, you have a “bad connection”, and you can literally hear the noise.
It is in the subtitle—“why so many predictions fail – but some don’t”—that one learns the core philosophy of Silver: he is most concerned with the honest evaluation of the performance of predictive models. The failure to look into one’s mirror is what I often describe as the elephant in the data analyst’s room. Science reporters and authors keep bombarding us with stories of success in data mining, when in fact most statistical models in the social sciences have high rates of error. As Silver’s many case studies demonstrate, these models are still useful but they are far from infallible; or, as Silver would prefer to put it, the models have a quantifiable chance of failing.
In 450 briskly moving pages, Silver takes readers through case studies on polling, baseball, the weather, earthquakes, GDP, pandemic flu, chess, poker, the stock market, global warming, and terrorism. I appreciate the refreshing modesty in discussing the limitations of various successful prediction systems. For example, one of the subheads in the chapter about a baseball player performance forecasting system he developed prior to entering the world of political polls reads: “PECOTA Versus Scouts: Scouts Win” (p. 88). Unlike many popular science authors, Silver does not portray his protagonists as uncomplicated heroes, he does not draw overly general conclusions, and he does not flip from one anecdote to another but instead provides details for readers to gain a fuller understanding of each case study. In other words, we can trust his conclusions, even if his book contains little Freakonomics-style counter-intuition.
Performance measurement is a complex undertaking. To illustrate this point, I list the evaluation methods deployed in the key case studies of the book:
- McLaughlin Group panel predictions (p. 49): proportion of predictions that become “completely true” or “mostly true,” ignoring predictions that cannot be or are not yet verifiable
- Election forecasts (p. 70): proportion of Republican wins among those districts predicted to be “leaning Republican” (underlying this type of evaluation is some criterion for calling a race to be “leaning”)
- Baseball prospect forecasting (p. 90): number of major-league wins generated by players on the Top 100 prospect list in a specified window of time; the wins attributed to individual players are computed via a formula known as “wins above replacement player”
- Daily high-temperature forecasts (p. 132): average difference between predicted temperature (x days in advance) and actual temperature relative to “naïve” methods of prediction, such as always predicting the average temperature, or predicting tomorrow’s temperature to equal today’s
- Rainfall forecast (p. 135): how close to, say, 20%, is the proportion of days on which it actually rained, among those days when the weather service forecasts 20% chance of rain
- Earthquake forecast (p. 160): whether or not an earthquake in the predicted range of magnitude occurred in the predicted range of time in a predicted region of the world (this is a binary outcome)
- GDP growth forecast (p. 182): the proportion of times in which the economist’s prediction intervals contain the actual GDP growth
- Chess (Ch. 9): winning games
- Poker (p. 311): amount of earnings
- Long-range global temperature forecast (pp. 398, 402): actual trend against predicted trend. (Note that this is the same method as the one used for the GDP growth forecast above, but with only one prediction interval.)
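To make three of these methods concrete, here is a minimal Python sketch of the rainfall calibration check, the comparison of temperature forecasts against a naïve "tomorrow equals today" baseline, and the prediction-interval coverage used for GDP growth. The function names and all of the numbers are my own made-up illustrations, not Silver's code or data, and the sketch assumes forecasts and outcomes are simple aligned lists.

```python
import math

def observed_frequency(forecast_probs, outcomes, stated_prob):
    """Calibration check: among days with a given stated chance of rain,
    what fraction actually saw rain? (outcomes are 1 = rain, 0 = no rain)"""
    matched = [o for p, o in zip(forecast_probs, outcomes)
               if math.isclose(p, stated_prob)]
    if not matched:
        return None  # no days carried that stated probability
    return sum(matched) / len(matched)

def mean_abs_error(predicted, actual):
    """Average absolute difference between predicted and actual values."""
    errors = [abs(p - a) for p, a in zip(predicted, actual)]
    return sum(errors) / len(errors)

def interval_coverage(intervals, actual):
    """Fraction of (low, high) prediction intervals that contain the outcome."""
    covered = [low <= a <= high for (low, high), a in zip(intervals, actual)]
    return sum(covered) / len(covered)

# --- Hypothetical data, invented purely for illustration ---

# Rainfall: stated probabilities and whether it rained.
probs = [0.2, 0.2, 0.2, 0.2, 0.2, 0.7, 0.7]
rained = [0, 1, 0, 0, 0, 1, 1]
print(observed_frequency(probs, rained, 0.2))  # 0.2, i.e. well calibrated at 20%

# Temperature: five days of actual highs, and forecasts for days 2 to 5.
actual_highs = [70, 72, 71, 75, 74]
forecast_highs = [71, 72, 74, 75]
targets = actual_highs[1:]          # the days being predicted
persistence = actual_highs[:-1]     # naive: tomorrow equals today
print(mean_abs_error(forecast_highs, targets))  # 1.0
print(mean_abs_error(persistence, targets))     # 2.0, so the forecast beats the baseline

# GDP growth: prediction intervals versus actual growth (in percent).
gdp_intervals = [(1.0, 3.0), (2.0, 4.0), (0.5, 2.5), (1.5, 3.5)]
gdp_actual = [2.1, 4.5, 1.0, 3.6]
print(interval_coverage(gdp_intervals, gdp_actual))  # 0.5
```

In the last case, a coverage of 50% would only be acceptable if the intervals were advertised as 50% intervals; if they were sold as 90% intervals, the forecaster is overconfident. Each metric answers a different question, which is one reason the evaluation methods cannot simply be swapped for one another.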
If you are thinking the evaluation methods listed above seem numerous and arbitrary, you’d be right. After reading Silver’s book, you should be thinking critically about how predictions are evaluated (and in some cases, how they may be impossible to verify). The probabilistic forecasts that Silver advocates are even harder to validate. Silver tells it like it is: this is difficult but crucial work, and one must look out for forecasters who don’t report their errors, as well as those who hide their errors by using inappropriate measurement.
Throughout the book, Silver makes many practical recommendations that reveal his practitioner’s perspective on forecasting. As an applied statistician, I endorse without hesitation specific pieces of advice such as these: use probability models; recognize that more data could make predictions worse; mix art and science; try hard to find the right data rather than just using readily available data; and avoid too much precision.
The only exaggeration in the book is his elevation of “Bayesian” statistics as the solution to predictive inaccuracy. What he packages as Bayesian has been part of statistical science even before the recent rise of modern Bayesian statistics. (The disagreement between Bayesians and non-Bayesians is over how these concepts are utilized.) Silver’s exposition focuses on probability updating in sequential decision-making, which is understandable given his expertise in sequential settings with a rich tradition of data collection, such as baseball and polling. (At one point, he makes an astute comment about data analysts selecting more promising settings in which to work.) The modern Bayesian movement is much broader than probability updating, and I’d point you to Professor Andrew Gelman’s blog and/or books as a place to explore what I mean by that. It must be said, though, that the technicalities of Bayesian statistics are tough to convey in a mass-market book.
In spite of the minor semantic issue, I am confident my readers will enjoy reading Silver’s book (link). It is one of the more balanced, practical books on statistical thinking on the market today by a prominent public advocate of the data-driven mindset.