At the heart of both successful clinical diagnosis and medical research is the ability to distinguish the useful from the extraneous: the signal from the noise. Medicine has long labored with the difficulties this distinction demands and has continuously searched for better tools for the task.
Currently the focus is on the use of powerful computers and Watson-like technology to improve success rates. Nathaniel "Nate" Silver, in his book The Signal and the Noise, brings a very different and refreshing perspective to the subject of prediction. Indeed, he is skeptical of "big data" and notes that large volumes of data don't necessarily make predictions easier. Large volumes of information sometimes just allow us to find preconceived patterns where none exist.
Silver comes from a background far afield from medicine. He is a statistician, professional online poker player, baseball aficionado, and political blogger, with a degree in economics. He achieved prominence with his popular New York Times blog by correctly predicting Barack Obama's victory in the 2008 presidential election along with all 35 Senate races, and, in 2012, the correct winner in 50 out of 50 states (http://fivethirtyeight.blogs.nytimes.com). He subsequently left the Times to start his own site, http://FiveThirtyEight.com, covering politics, economics, science, "life," and sports.
The book takes us on a whirlwind tour of a broad spectrum of areas where so-called experts make predictions: weather, the stock market, baseball, and elections, as well as more medically oriented subjects like breast cancer risk and swine flu epidemics. In every instance his focus is "the difference between what we know and what we think we know."
Fox vs. Hedgehog
In Silver's view, the accuracy of predictions is heavily dependent upon how willing the observer is to modify initial beliefs based upon newly emerging information.
He uses the analogy of the fox and the hedgehog: The fox is willing to consider many ideas and contrarian data to modify and improve his prediction. The hedgehog has one big idea, one preconceived notion, and selectively filters conflicting information to fit that conclusion. Thus political pundits on the right or the left will predict their preferred candidate to be the winner regardless of data to the contrary.
In contrast, weather forecasters use multiple models, constantly update their data, and express their predictions as ranges of probability. They are thus far more likely to make accurate predictions.
Predictions are more likely to succeed when there is ample data on which to base them. Weather, poker, and political campaigns fall into this category. Earthquakes, by contrast, while their frequency is reasonably predictable over a very long time frame, are extremely difficult to predict at a specific location and within a time frame relevant to preventing disaster.
Swine Flu Scare of '76
What about other situations in which the data is sparse but the risks are great? That was the situation with the swine flu scare of 1976. The sudden death of a 19-year-old soldier from H1N1 influenza at Fort Dix caused the government and the media to circle back to the Spanish flu epidemic of 1918, when 50 million people worldwide had perished from an apparently similar H1N1 flu strain.
In 1976, the Secretary of Health, Education, and Welfare, F. David Mathews, predicted that possibly a million Americans would die. President Ford authorized the manufacture of 200 million doses of swine flu vaccine, and widespread vaccination began that fall. In the interim, no other fatality from swine flu was seen in the United States.
In other countries where flu season was already under way, cases of H1N1 were rare. Criticism of the program began to emerge in medical journals like Lancet and from the [then-named] Center for Disease Control. Against the advice of its medical experts, who put the probability of an outbreak at no higher than 35 percent and perhaps as low as 2 percent, the administration (the hedgehog in this story) developed public service announcements to persuade the public of the risk.
The aftermath of the fiasco is well known: No outbreak occurred, but over 500 vaccinated individuals developed Guillain-Barré syndrome. It is likely that part of the public's present skepticism about vaccinations stems from this episode.
In 2009 another H1N1 pandemic occurred. Again, somewhat dire predictions were made. Fifty-five million Americans were infected, but the outbreak was mild, with a fatality rate of just 0.02 percent, lower than that of a typical flu season.
In Silver's view, the predictions failed because of the difficulty of extrapolating from a small number of data points. Reliable estimates of the probability of spread can be made only after a substantial number of infections have taken place. Unfortunately, epidemiologists and others sometimes do not have the luxury of waiting.
Role Models
In the chapter on role models, Silver discusses other epidemics of interest to physicians, including AIDS, MRSA, and SARS, exploring the strengths and weaknesses of the existing epidemiologic models.
One of the centerpieces of Silver's approach to prediction is Bayesian reasoning, developed in the 18th century by the mathematician and minister Thomas Bayes. Bayes argued that any probability can be interpreted only in the context of a baseline that precedes the measurement. One establishes a set of "priors" based on knowledge gleaned from experience, and this knowledge base is continuously updated as new information becomes available.
The "prior" is used to interpret the data presented. An excellent example of the application of Bayesian reasoning is seen in Silver's analysis of the risk of breast cancer in 40-year-old women. The baseline risk (the "prior") is 1.4 percent. Now one adds new information-a percentage is heavily populated by women under age 40. The true incidence of breast cancer in a 40-year-old woman with a positive mammogram is 10 times the baseline risk but still only about 10 percent. For this reason, Silver sides with the recommendation that screening should not begin until age 50, where the base rate (the "prior') is significantly higher.
'Airy & Unconventional Style'
These medical examples and many others from other fields are presented in The Signal and the Noise in an airy and unconventional style that makes the statistics and probabilities comprehensible and readable. That said, the work is serious, scholarly, and extensively researched, with 56 pages of footnotes.
The book is an excellent lesson in how to reason using sound probability theory, and Silver has flattering things to say about the way those in medical fields reason: "Much of the most thoughtful work I have found on the use and abuse of statistical models, and on the proper role of prediction, comes from people in the medical profession," he writes.
Before you become too overconfident, though, dig into The Signal and the Noise. You will be entertained and educated, and you will finish humbled by the realization that prediction has no magic formula and that even the most prudent predictions cannot be foolproof.
2015 (PAPERBACK), PENGUIN, ISBN 0143125087, ALSO AVAILABLE IN HARDCOVER, KINDLE, AND AUDIO EDITIONS
More OT Book Reviews!
Find the full collection of Bob Young's OT book reviews online: bit.ly/OTCollections-Books