Philip E. Tetlock and Dan Gardner
Superforecasting:
The Art and Science of Prediction
Crown Books, September 2015
Quotes and comments on the book
Chapter 1: An optimistic skeptic
p. 3: regarding expert
opinions, there is usually no accurate measurement of how good they are, there
are “just endless opinions - and
opinions on opinions. And that is business as usual.”; the media routinely
delivers, or corporations routinely pay for, opinions that may be accurate,
worthless or in between and everyone makes decisions on that basis
p. 5: talking head talent is
skill in telling a compelling story, which is sufficient for success; their
track record is irrelevant to that success - most of them are about as good as random
guessing; predictions are time-sensitive - 1-year predictions tend to beat
guessing more than 5- or 10-year projections
p. 8-10: there are limits on
what is predictable → in nonlinear systems, e.g., weather patterns, a small
initial condition change can lead to huge effects (chaos theory); we cannot see
very far into the future
p. 13-14: predictability and
unpredictability coexist; a false dichotomy is saying the weather is unpredictable
- it is usually relatively predictable 1-3 days out, but at days 4-7 accuracy
usually declines to near-random; weather forecasters are slowly getting better
because they are in an endless forecast-measure-revise loop; prediction consumers, e.g., governments,
businesses and regular people, don’t demand evidence of accuracy, so it isn’t
available, and that means no revision, which means no improvement
p. 15: Bill Gates’s
observation: surprisingly often a clear goal isn’t specified so it is impossible
to drive progress toward the goal; that is true in forecasting; some forecasts
are meant to (1) entertain, (2) advance a political agenda, or (3) reassure the
audience their beliefs are correct and the future will unfold as expected (this
kind is popular with political partisans)
p. 16: the lack of rigor in
forecasting is a huge opportunity; to seize it (i) set the goal of accuracy and
(ii) measure success and failure
p. 18: the Good Judgment
Project found two things, (1) foresight is real and some people have it and (2)
it isn’t strictly a talent from birth - (i) it boils down to how people think, gather
information and update beliefs and (ii) it can be learned and improved
p. 21: a 1954 book analyzed
20 studies and showed that algorithms based on objective
indicators were better predictors than well-informed experts; more than 200
later studies have confirmed that and the conclusion is simple - if you have a well-validated statistical
algorithm, use it
p. 22: machines may never be
able to beat talented humans, so dismissing human judgment as just subjective
goes too far; maybe the best that can be done will come from human-machine
teams, e.g., Garry Kasparov and Deep Blue together against a machine or a human
p. 23: David Ferrucci,
IBM’s Watson’s chief engineer, is optimistic: “‘I think it’s going to get stranger and stranger’ for people to listen
to the advice of experts whose views are informed only by their subjective
judgment.”; Tetlock: “. . . . we
will need to blend computer-based forecasting and subjective judgment in the
future. So it’s time to get serious about both.”
Chapter 2: Illusions of knowledge
p. 25: regarding a medical
diagnosis error: “We have all been too quick
to make up our minds and too slow to change them. And if we don’t examine how
we make these mistakes, we will keep making them. This stagnation can go on for
years. Or a lifetime. It can even last centuries, as the long and wretched
history of medicine illustrates.”
p. 30: “It was the absence of doubt - and scientific rigor - that made medicine
unscientific and caused it to stagnate for so long.”; it was an illusion
of knowledge - if the patient died, he was too sick to be saved, but if he
got better, the treatment worked - there was no controlled data to support
those beliefs; for decades physicians resisted the idea of randomized, controlled trials,
proposed in 1921, because they believed their subjective judgments
revealed the truth
p. 35: on Kahneman’s fast
system 1: “A defining feature of intuitive
judgment is its insensitivity to the quality of the evidence on which the
judgment is based. It has to be that way. System 1 can only do its job of
delivering strong conclusions at lightning speed if it never pauses to wonder
whether the evidence at hand is flawed or inadequate, or if there is better
evidence elsewhere.” - context - instantly running away from a Paleolithic
shadow that might be a lion; Kahneman calls these tacit assumptions WYSIATI (what you see is all there is); System
1 judgments take less than 1 sec. - there’s no time to think about things; regarding
coherence: “. . . . we are creative
confabulators hardwired to invent stories that impose coherence on the world.”
p. 38-39: confirmation bias: (i) seeking evidence
to support the 1st plausible explanation, (ii) rarely seeking
contradictory evidence and (iii) being a motivated skeptic in the face of
contrary evidence and finding even weak or no reasons to denigrate contradictory evidence or reject it
entirely, e.g., a doctor’s belief that a quack medical treatment works for all but
the incurable is taken as proof that it works for everyone except the incurably ill; that arrogant, self-deluded mindset kept medicine in the dark ages for millennia and people suffered accordingly
p. 40: attribute substitution, availability heuristic or bait and switch:
one question may be difficult or unanswerable w/o more info, so the unconscious
System 1 substitutes another, easier, question and the easy question’s answer
is the same as the hard question’s answer, even when it is wrong; CLIMATE CHANGE EXAMPLE: people who
cannot figure out climate change on their own substitute what most climate
scientists believe for their own belief - it can be wrong
p. 41-42: “The instant we wake up and look past the
tip of our nose, sights and sounds flow into the brain and System 1 is engaged.
This system is subjective, unique to each of us.”; cognition is a matter of
blending inputs from System 1 and 2 - in some people, System 1 has more dominance
than in others; it is a false dichotomy to see it as System 1 or System 2
operating alone; pattern recognition:
System 1 alone can make very good or bad snap judgments and the person may not
know why - bad snap judgment or false positive = seeing the Virgin Mary in
burnt toast (therefore, slowing down to double check intuitions is a good idea)
p. 44: tip of the nose
perspective is why doctors did not doubt their own beliefs for thousands of
years
Chapter 3: Keeping Score
p. 48: it is not unusual that
a forecast that may seem dead right or wrong really cannot be “conclusively judged right or wrong”;
details of a forecast may be absent and the forecast can’t be scored, e.g., no time
frames, geographic locations, reference points, definition of success or
failure, definition of terms, a specified probability of events (e.g., 68%
chance of X) or lack thereof or many comparison forecasts to assess the
predictability of what is being forecasted; p. 53: “. . . . vague verbiage is more the rule than the exception.”; p.
55: security experts asked what the term “serious possibility” meant in a 1951
National Intelligence Estimate → one said it meant 80 to 20 (4 times more
likely than not), another said it meant 20 to 80 and others said it was in
between those two extremes
p. 50-52: national security
experts had views split along liberal and conservative lines about the Soviet
Union and future relations; they were all wrong and Gorbachev came to power and
de-escalated nuclear and war tensions; after the fact, all the experts claimed
they could see it coming all along; “But
the train of history hit a curve, and as Karl Marx once quipped, the
intellectuals fall off.”; the experts were smart and well-informed, but
they were just misled by System 1’s subjectivity (tip of the nose perspective)
p. 58-59: the U.S.
intelligence community resisted putting definitions and specified probabilities
in their forecasts until finally, 10 years after the WMD fiasco with Saddam
Hussein, the case for precision was so overwhelming that they changed; “But
hopelessly vague language is still so common, particularly in the media, that
we rarely notice how vacuous it is. It just slips by.”
p. 60-62: calibration: perfect calibration = events
forecast at an X% chance actually happen X% of the time, e.g., it rains on 70% of the
days that carried a “70% chance of rain” forecast; calibration requires many forecasts for the
assessment and is thus impractical for rare events, e.g., presidential
elections; underconfidence = prediction is X% chance, but reality is a larger
X+Y% chance; overconfidence = prediction is X% chance, but reality is a smaller
X-Y% chance
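A minimal sketch of a calibration check, assuming forecasts are stated as probabilities and outcomes are recorded as happened or not. The function name `calibration_table` and the sample data are hypothetical, made up only for illustration:

```python
from collections import defaultdict

def calibration_table(forecasts, outcomes):
    """Map each stated probability (rounded to 0.1) to the observed frequency."""
    buckets = defaultdict(list)
    for p, happened in zip(forecasts, outcomes):
        buckets[round(p, 1)].append(1 if happened else 0)
    return {p: sum(hits) / len(hits) for p, hits in sorted(buckets.items())}

# Hypothetical record: ten "70%" forecasts (5 came true) and
# ten "30%" forecasts (3 came true).
forecasts = [0.7] * 10 + [0.3] * 10
outcomes = [1] * 5 + [0] * 5 + [1] * 3 + [0] * 7

table = calibration_table(forecasts, outcomes)
for stated, observed in table.items():
    print(f"stated {stated:.0%} -> observed {observed:.0%}")
```

In this made-up record the 30% forecasts are well calibrated, while the 70% forecasts came true only half the time, which matches the notes' definition of overconfidence.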
p. 62-66: the two facets of
good judgment are captured by calibration and resolution; resolution: high resolution occurs when predictions of low (< ~20%)
or high (> ~80%) probability events are accurately predicted; accurately
predicting rare events gets more weight than accurately predicting more common
events; a low Brier score is best, 0.0 is perfect, 0.5 is random guessing and
2.0 is getting all or none, or yes or no, predictions wrong 100% of the time;
however a score of 0.2 in one circumstance, e.g., weather prediction in Phoenix,
AZ looks bad, while a score of 0.2 in Springfield MO is great because the
weather there is far less predictable than in Phoenix; apples-to-apples
comparisons are necessary, but it is very hard to find that kind of data -
it usually doesn’t exist
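The scale described above (0.0 perfect, 0.5 for always saying 50%, 2.0 for being confidently wrong every time) corresponds to the original two-category form of the Brier score. A small sketch, assuming yes/no questions; the function name `brier_score` is illustrative:

```python
def brier_score(forecasts, outcomes):
    """Two-category Brier score: 0.0 is perfect, 2.0 is maximally wrong."""
    total = 0.0
    for p, happened in zip(forecasts, outcomes):
        o = 1.0 if happened else 0.0
        # Squared error on the "yes" category plus squared error on the "no" category.
        total += (p - o) ** 2 + ((1 - p) - (1 - o)) ** 2
    return total / len(forecasts)

perfect = brier_score([1.0, 0.0], [True, False])    # always right with full confidence
hedged = brier_score([0.5, 0.5], [True, False])     # always says 50%
all_wrong = brier_score([0.0, 1.0], [True, False])  # always wrong with full confidence
print(perfect, hedged, all_wrong)
```

The score alone does not say how hard the questions were, which is why the Phoenix vs. Springfield comparison matters.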
p. 68: In EPJ (Expert Political Judgment, Tetlock’s earlier study), the bottom
line was that some experts were marginally better than random guessing - the
common characteristic was how they thought, not their ideology, Ph.D. or not,
or access to classified information; the typical expert was about as good as
random guessing and their thinking was ideological; “They sought to squeeze complex problems into the preferred cause-effect
templates and treated what did not fit as irrelevant distractions. Allergic to
wishy-washy answers, they kept pushing their analyses to the limit (and then
some), using terms like “furthermore” and “moreover” when piling up reasons why
they were right and others were wrong. As a result, they were confident to
declare things “impossible” or “certain.” Committed to their conclusions, they were
reluctant to change their minds even when their predictions clearly failed.
They would tell us, “Just wait.””
p. 69: “The other group consisted of more pragmatic experts who drew on many
analytical tools, with the choice of tool hinging on the particular problem
they faced. . . . . They talked about possibilities and probabilities, not
certainties.”
p. 69: “The fox knows many things but the hedgehog knows one big thing. . . . .
Foxes beat hedgehogs on both calibration and resolution. Foxes had real foresight.
Hedgehogs didn’t. . . . . How did hedgehogs manage to do slightly worse than
random guessing?”; hedgehog example is CNBC’s Larry Kudlow and his supply
side economics Big Idea in the face of the 2007 recession
p. 70-72: on Kudlow: “Think of that Big Idea as a pair of glasses
that the hedgehog never takes off. . . . And, they aren’t ordinary glasses.
They are green-tinted glasses . . . . Everywhere you look, you see green,
whether it’s there or not. . . . . So the hedgehog’s one Big Idea doesn’t improve
his foresight. It distorts it.”; more information helps increase hedgehog
confidence, not accuracy; “Not that
being wrong hurt Kudlow’s career. In January 2009, with the American economy in
a crisis worse than any since the Great Depression, Kudlow’s new show, The Kudlow Report, premiered on CNBC.
That too is consistent with the EPJ data, which revealed an inverse correlation
between fame and accuracy: the more famous an expert was, the less accurate he
was.”; “As anyone who has done media
training knows, the first rule is keep it simple, stupid. . . . . People tend
to find uncertainty disturbing and “maybe” underscores uncertainty with a
bright red crayon. . . . . The simplicity and confidence of the hedgehog
impairs foresight, but it calms nerves - which is good for the careers of
hedgehogs. . . . Foxes don’t fare so well in the media. . . . This aggregation
of many perspectives is bad TV.”
p. 73: an individual who does
a one-off accurate guess is different from people who do it consistently; consistency
is based on aggregation, which is the recognition that useful info is widely
dispersed and each bit needs a separate weighting for importance and relevance
p. 74: on information
aggregation: “Aggregating the judgments
of people who know nothing produces a lot of nothing.”; the bigger the
collective pool of accurate information, the better the prediction or
assessment; Foxes aggregate, but Hedgehogs don’t
p. 76-77: aggregation: looking at a problem from
one perspective, e.g., pure logic can lead to an incorrect answer; multiple
perspectives are needed; using both logic and psycho-logic (psychology or human
cognition) helps; some people are lazy and don’t think, some apply logic to
some degree and then stop, while others pursue logic to its final conclusion →
aggregate all of those inputs to arrive at the best answer; “Foxes aggregate perspectives.”
p. 77-78: on human cognition -
we don’t aggregate perspectives naturally: “The tip-of-your nose perspective insists that it sees reality
objectively and correctly, so there is no need to consult other perspectives.”
p. 79-80: on perspective
aggregation: “Stepping outside ourselves
and really getting a different view of reality is a struggle. But Foxes are
likelier to give it a try.”; people’s temperaments fall along a spectrum
from the rare pure Foxes to the rare pure Hedgehogs; “And our thinking habits are not immutable. Sometimes they evolve
without our awareness of the change. But we can also, with effort, choose to
shift gears from one mode to another.”
Chapter 4: Superforecasters
p. 84-85: the U.S.
intelligence community (IC) is, like every huge bureaucracy (about 100,000
people, about $50 billion budget), very change-resistant - they saw and
acknowledged their colossal failure to predict the Iranian revolution, but did little
or nothing to address their dismal capacity to predict situations and future
events; the WMD-Saddam Hussein disaster 22 years later finally inflicted a big
enough shock to get the IC to seriously introspect
p. 88 (book review comment?):
my IARPA work isn’t as exotic as DARPA, but it can be just as important: that
understates the case → it is more important
p. 89: humans “will never be able to forecast turning
points in the lives of individuals or nations several years into the future -
and heroic searches for superforecasters won’t change that.”; the approach:
“Quit pretending you know things you
don’t and start running experiments.”
p. 90-93: the shocker:
although the detailed result is classified, Good Judgment Project (GJP)
volunteers who passed screening and used simple algorithms but without access
to classified information beat government intelligence analysts with access to
classified information; one contestant (a retired computer programmer) had a
Brier score of 0.22, 5th best among 2,800 GJP participants, and
then in a later competition among the best forecasters, his score improved to
0.14, top among the initial group of 2,800 → he beat the commodities futures
markets by 40% and the “wisdom of the crowd” control group by 60%
p. 94-95: the best
forecasters got things right at 300 days out more than regular forecasters
looking out 100 days and that improved over the 4-year GJP experiment: “. . . . these superforecasters are amateurs
forecasting global events in their spare time with whatever information they
can dig up. Yet they somehow managed to set the performance bar high enough
that even the professionals have struggled to get over it, let alone clear it
with enough room to justify their offices, salaries and pensions.”
p. 96 (book review comment?):
“And yet, IARPA did just that: it put
the intelligence community’s mission ahead of the people inside the
intelligence community - at least ahead of those insiders who didn’t want to
rock the bureaucratic boat.”
p. 97-98: “But it’s easy to misinterpret randomness.
We don’t have an intuitive feel for it. Randomness is invisible from the
tip-of-your-nose perspective. We can see it only if we step outside of
ourselves.”; people can be easily tricked into believing that they can
predict entirely random outcomes, e.g., guessing coin tosses; “. . . . delusions of this sort are routine.
Watch business news on television, where talking heads are often introduced
with a reference to one of their forecasting references . . . . And yet many
people take these hollow claims seriously.”
p. 99: “Most things in life involve skill and luck, in varying proportions.”
p. 99-101: regression to the mean cannot be
overlooked and is a necessary tool for testing the role of luck in performance
→ regression is slow for activities dominated by skill, e.g., forecasting, and
fast for activities dominated by chance/randomness, e.g., coin tossing
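A toy simulation of that point, assuming each person's yearly score is a weighted mix of a fixed skill and fresh luck. The model, the weights and the function names are illustrative assumptions, not the GJP's method:

```python
import random

def pearson(xs, ys):
    """Plain Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

def year_to_year_correlation(skill_weight, n_people=2000, seed=0):
    """Correlate two simulated years of scores for the same people."""
    rng = random.Random(seed)
    skills = [rng.gauss(0, 1) for _ in range(n_people)]

    def season():
        # Skill persists from year to year; luck is redrawn every year.
        return [skill_weight * s + (1 - skill_weight) * rng.gauss(0, 1)
                for s in skills]

    return pearson(season(), season())

skill_corr = year_to_year_correlation(skill_weight=0.9)  # skill-dominated activity
luck_corr = year_to_year_correlation(skill_weight=0.1)   # luck-dominated activity
print(round(skill_corr, 2), round(luck_corr, 2))
```

When skill dominates, last year's ranking predicts this year's (slow regression); when luck dominates, the correlation is near zero and top performers fall back quickly.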
p. 102-103: the key question
is how did superforecasters hold up across the years? → in years 2 and 3,
superforecasters were the opposite of regressing and they got better; sometimes
causal connections are nonlinear and thus not predictable and some of that had
to be present among the variables that affected what the forecasters were
facing → there should be some regression unless an offsetting process is
increasing forecasters’ performance; there is some regression - about 30% of superforecasters
(roughly 1 in 3) fall out of the top 2% each year but 70% stay in - and the individual year-to-year
correlation is about 0.65, which is pretty high → Q: Why
are these people so good?
Chapter 5: Supersmart?
p. 114: Fermi-izing
questions, breaking a question into relevant parts, allows better guesses,
e.g., how many piano tuners are there in Chicago → guess total pop, total #
pianos, time to do one piano, hours/year a tuner works → that technique usually
helps increase accuracy a lot, even when none of the numbers are known; Fermi-izing
tends to defuse the unconscious System 1’s tendency to bait & switch the
question; EXAMPLE: would testing of Arafat’s body 6 years after his death
reveal the presence of Polonium, which is allegedly what killed him? → Q1 - can
you even detect Po 6 years later? Q2: if Po is still detectable, how could it
have happened, e.g., Israel, Palestinian enemies before or after his death →
for this question the outside view, what % of exhumed bodies are found to be
poisoned is hard to (i) identify and (ii) find the answer to, but identifying
it is most important, i.e., it’s not certain (< 100%, say 80%), but it has
to be more than trivial evidence otherwise authorities would not allow his body
to be exhumed (> 20%) → use the 20-80% halfway point of 50% as the outside
view, then adjust probability up or down based on research and the inside or
intuitive System 1 view
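The piano-tuner breakdown can be sketched as simple arithmetic. Every number below is a placeholder guess chosen only to show the mechanics, and the tunings-per-year factor is an added assumption not listed in the notes:

```python
def fermi_piano_tuners(population, people_per_piano, tunings_per_piano_per_year,
                       hours_per_tuning, tuner_hours_per_year):
    """Estimate the number of tuners as yearly tuning demand / one tuner's capacity."""
    pianos = population / people_per_piano
    tuning_hours_needed = pianos * tunings_per_piano_per_year * hours_per_tuning
    return tuning_hours_needed / tuner_hours_per_year

# All inputs are rough guesses (assumptions), not researched figures.
estimate = fermi_piano_tuners(
    population=2_500_000,          # guessed city population
    people_per_piano=100,          # guessed: one piano per 100 people
    tunings_per_piano_per_year=1,  # guessed: each piano tuned once a year
    hours_per_tuning=2,            # guessed, including travel
    tuner_hours_per_year=1_600,    # guessed working hours per tuner
)
print(round(estimate))
```

The value of the exercise is that each guess is explicit and can be challenged or refined separately.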
p. 118: superforecasters look
at questions 1st from Kahneman’s “outside view”, i.e., the statistical or historical base rate or
norm (the anchor) and then 2nd use the inside view to adjust
probabilities up or down → System 1 generally goes straight to the comfortable
but often wrong inside view and ignores the outside view; e.g., “will there be a
Vietnam-China border clash in the next year?” starts with the 1st (outside)
view that asks how many clashes there have been over time, e.g., once every 5
years, and then merges in the 2nd view of current Vietnam-China
politics to adjust the baseline probability up or down
p. 120: the outside view has
to come first; “And it’s astonishingly
easy to settle on a bad anchor.”; good anchors are easier to find from the
outside view than from the inside
p. 123-124: some
superforecasters kept explaining in the GJP online forum how they approached
problems, what their thinking was and asking for criticisms, i.e., they were
looking for other perspectives; simply asking if a judgment is wrong tends to
lead to improvement in the first judgment; “The sophisticated forecaster knows about confirmation bias and will
seek out evidence that cuts both ways.”
p. 126: “A brilliant puzzle solver may have the raw material for forecasting,
but if he also doesn’t have an appetite for questioning basic,
emotionally-charged beliefs he will often be at a disadvantage relative to a
less intelligent person who has a greater capacity for self-critical thinking.”
p. 127: “For superforecasters, beliefs are hypotheses to be tested, not
treasures to be guarded.”
Chapter 6: Superquants?
p. 128-129: most
superforecasters are good at math, but mostly they rely on subjective judgment:
one super said this: “It’s all, you
know, balancing, finding relevant information and deciding how relevant is this
really?”; it’s not math skill that counts most - it’s nuanced subjective
judgment
p. 138-140: we crave certainty
and that’s why Hedgehogs and their confident yes or no answers on TV are far
more popular and comforting than Foxes with their discomforting “on the one
hand . . . but on the other” style; people equate confidence with competence; “This sort of thinking goes a long way to
explaining why so many people have a poor grasp of probability. . . . The
deeply counterintuitive nature of statistics explains why even very
sophisticated people often make elementary mistakes.” A forecast of a 70%
chance of X happening means that there is a 30% chance it won’t - that fact is
lost on most people → most people translate an 80% chance of X to mean X will happen
and that just ain’t so; only when probabilities are closer to even, maybe about
65:35 to 35:65 (p. 144), does the translation for most people become “maybe” X
will happen, which is the intuitively uncomfortable translation of uncertainty
associated with most everything
p. 143: superforecasters tend
to be probabilistic thinkers, e.g., Treasury Secretary Robert Rubin; epistemic uncertainty describes
something unknown but theoretically knowable, while aleatory uncertainty is both unknown and unknowable
p. 145-146: superforecasters
who use more granularity, a 20, 21 or 22% chance of X tended to be more
accurate than those who used 5% increments and they tended to be more accurate
than those who used 10% increments, e.g., 20%, 30% or 40%; when estimates were
rounded to the nearest 5% or 10%, the granular best superforecasters fell into
line with all the rest, i.e., there was real precision in those more granular
1% increment predictions
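A sketch of why rounding can cost accuracy, under the strong assumption that the fine-grained probabilities are exactly right. It computes the expected two-category Brier score rather than scoring real forecasts; all names and the 1%-99% grid are illustrative:

```python
def expected_brier(forecast, true_prob):
    """Expected two-category Brier score when the event occurs with true_prob."""
    return 2 * (true_prob * (1 - forecast) ** 2
                + (1 - true_prob) * forecast ** 2)

def round_to(p, step):
    """Round a probability to the nearest multiple of step."""
    return round(p / step) * step

# Assumption: questions whose true chances are 1%, 2%, ..., 99%.
true_probs = [i / 100 for i in range(1, 100)]

def mean_score(step):
    return sum(expected_brier(round_to(p, step), p)
               for p in true_probs) / len(true_probs)

fine, five, ten = mean_score(0.01), mean_score(0.05), mean_score(0.10)
print(fine < five < ten)
```

If the extra digits were only noise, rounding would not hurt; the finding in the notes is that for the best superforecasters it did.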
p. 148-149: “Science doesn’t tackle “why” questions
about the purpose of life. It sticks to “how” questions that focus on causation
and probabilities.”; “Thus,
probabilistic thinking and divine-order thinking are in tension. Like oil and
water, chance and fate do not mix. And to the extent we allow our thoughts to
move in the direction of fate, we undermine our ability to think
probabilistically. Most people tend to prefer fate.”
p. 150: the sheer
improbability of something that does happen, you meet and marry your spouse, is
often attributed to fate or God’s will, not the understanding that sooner or
later many/most people get married to someone at some point in their lives; the
following psycho-logic is “incoherent”, i.e., not logic: (1) the chance of
meeting the love of my life was tiny, (2) it happened anyway, (3) therefore it
was meant to be and (4) therefore, the probability it would happen was 100%
p. 152: scoring for tendency
to accept or reject fate and accept probabilities instead, average Americans
are mixed or about 50:50, undergrads somewhat more biased toward probabilities
and superforecasters are the most grounded in probabilities, while rejecting
fate as an explanation; the more inclined a forecaster is to believe things are
destined or fate, the less accurate their forecasts were, while
probability-oriented forecasters tended to have the highest accuracy → the
correlation was significant
Chapter 7: Supernewsjunkies?
p. 154-155: based on news
flowing in, superforecasters tended to update their predictions and that tended
to improve accuracy; it isn’t just a matter of following the news and changing
output from sufficient new input - their initial forecasts were 50% more accurate
than those of regular forecasters
p. 160: belief perseverance = people
“rationalizing like crazy to avoid acknowledging new information that upsets
their settled beliefs.” → extreme obstinacy, e.g., the fact that something
someone predicted didn’t happen is taken as evidence that it will happen
p. 161-163: on underreacting
to new information: “Social
psychologists have long known that getting people to publicly commit to a
belief is a great way to freeze it in place, making it resistant to change. The
stronger the commitment, the greater the resistance.”; perceptions are a
matter of our “identity”; “. . . .
people’s views on gun control often
correlate with their views on climate change, even though the two issues have
no logical connection to each other. Psycho-logic trumps logic.”; “. . . . superforecasters may have a
surprising advantage: they’re not experts or professionals, so they have little
ego invested in each forecast.”; consider “career CIA analysts or acclaimed pundits with their reputations on the
line.”
p. 164: on overreacting to
new information: dilution effect = irrelevant
or noise information can and often does change perceptions of probability and
that leads to mistakes; frequent forecast updates based on small “units of
doubt” (small increments) seem to minimize overreacting and
underreacting; balancing new information with the info that drove the original
forecast or earlier updates captures the value of all the information
p. 170: Bayes’ theorem: new/updated belief/forecast = prior belief ×
diagnostic value of the new information; most superforecasters intuitively
understand Bayes’ theorem, but can’t write the equation down nor do they
actually use it; instead they use the concept and weigh updates based on the
value of new information
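One common way to write that update is the odds form of Bayes' theorem: posterior odds = prior odds × likelihood ratio, where the likelihood ratio plays the role of the "diagnostic value". A minimal sketch; the 20% prior and the ratio of 3 are made-up numbers:

```python
def bayes_update(prior, likelihood_ratio):
    """Update a probability using the odds form of Bayes' theorem.

    likelihood_ratio = P(evidence | event) / P(evidence | no event).
    """
    prior_odds = prior / (1 - prior)
    posterior_odds = prior_odds * likelihood_ratio
    return posterior_odds / (1 + posterior_odds)

# Hypothetical: a 20% base rate, then news that is three times more likely
# to appear if the event is coming than if it is not.
updated = bayes_update(prior=0.20, likelihood_ratio=3.0)
unchanged = bayes_update(prior=0.20, likelihood_ratio=1.0)  # uninformative news
print(round(updated, 3), round(unchanged, 3))
```

News with a likelihood ratio near 1 should barely move the forecast, which is the guard against overreacting to noise.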
Chapter 8: Perpetual Beta
p. 174-175: two basic
mindsets - the growth mindset is
that you can learn and grow through hard work; the fixed mindset holds that you have what you were born with and that
innate talents can be revealed but not created or developed, e.g., fixed
mindsetters say things like “I’m bad at math”, and it becomes a
self-fulfilling prophecy; fixed mindset children given harder puzzles give up
and lose interest, while growth mindset kids loved the challenge because for
them, learning was a priority
p. 178: consistently inconsistent - John Maynard Keynes: engaged in an
endless cycle of try, fail, analyze, adjust, try again; he retired wealthy from
his investing, despite massive losses from the Great Depression and other
personal blunders; skills improve with practice
p. 181-183: prompt
feedback is necessary for improvement, but it is usually lacking - experience
alone doesn’t compensate - experienced police gain confidence that they are
good at spotting liars, but it isn’t true because they don’t improve with time;
most forecasters get little or no feedback because (1) their language is
ambiguous and their forecasts are thus not precise enough to evaluate -
self-delusion is a real concern and (2) a long time lag between forecast and time
to get feedback on success or failure - with time a person forgets the details
of their own forecasts and hindsight
bias distorts memory, which makes it worse; vague language is elastic and
people read into it what they want; hindsight
bias = knowing the outcome of an event and that distorts our perception of
what we thought we knew before the outcome; experts succumb to it all the time,
e.g., prediction of loss of communist power monopoly in the Soviet Union before
it disintegrated in 1991 and after it happened → recall was 31% higher than
their original estimate
p. 190: “Superforecasters are perpetual beta.” - they have the growth
mindset
p. 191: list of
superforecaster traits
Chapter 9: Superteams
p. 201: success can lead to
mental habits that undermine the mental habits that led to success in the first
place; on the other hand, properly functioning teams can generate dragonfly eye
perspectives, which can improve forecasting
p. 208-209: givers on teams
are not chumps - they tend to make the whole team perform better; it is complex
and it will take time to work out the psychology of groups - replicating this
won’t be easy in the real world; “diversity trumps ability” may be true due to
the different perspectives a team can
generate or, maybe it’s a false dichotomy and a shrewd mix of ability and
diversity is the key to optimum performance
Chapter 10: The Leader’s Dilemma
p. 229-230: Tetlock uses the
Wehrmacht as an example of how leadership and judgment can be effectively
combined, even though it served an evil end → the points being that (i) even
evil can operate intelligently and creatively so therefore don’t underestimate
your opponent and (ii) seeing something as evil and wanting to learn from it
presents no logical contradiction but only a psycho-logical tension that
superforecasters overcome because they will learn from anyone or anything that
has information or lessons of value
Chapter 11: Are They Really So Super?
p. 232-233: in a 2014
interview Gen. Michael Flynn, Head of DIA (DoD’s equivalent of the CIA; 17,000
employees) said “I think we’re in a period of prolonged societal conflict that
is pretty unprecedented.” but googling the phrase “global conflict trends” says
otherwise; Flynn, like Peggy Noonan and her partisan reading of political
events, suffered from the mother of all cognitive illusions, WYSIATI → every
day for three hours, Flynn saw nothing but reports of conflicts and bad news;
what is important is the fact that Flynn, a highly accomplished and intelligent
operative fell for the most obvious illusion there is → even when we know
something is a System 1 cognitive illusion, we sometimes cannot shut it off and
see unbiased reality, e.g., Müller-Lyer optical illusion (two equal lines, one
with arrow ends pointing out and one with ends pointing in - the in-pointing
arrow line always looks longer, even when you know it isn’t)
p. 234-237: “. . . . dedicated people can inoculate
themselves to some degree against certain cognitive illusions.”; scope insensitivity is a major illusion
of particular importance to forecasters - it is another bait & switch bias
or illusion where a hard question is unconsciously substituted with a simpler
question, e.g., the average amount groups of people would be willing to pay to
avoid 2,000, 20,000 or 200,000 birds drowning in oil ponds was the same for
each group, $80 → the problem’s scope recedes into the background so much
that it becomes irrelevant; the scope insensitivity bias or illusion
(Tetlock seems to use the terms interchangeably) is directly relevant to
geopolitical problems; surprisingly,
superforecasters were less influenced by scope insensitivity than average
forecasters - scope sensitivity wasn’t perfect, but it was good (better than
Kahneman guessed it would be); Tetlock’s guess → superforecasters were
skilled and persistent in making System 2 corrections of System 1 judgments,
e.g., by stepping into the outside view, which dampens System 1 bias and/or
ingrains the technique to the point that it is “second nature” for System 1
p. 237-238: CRITICISM: how long can
superforecasters defy psychological gravity?; maybe a long time - one developed
software designed to correct System 1 bias in favor of the like-minded and that
helped lighten the heavy cognitive load of forecasting; Nassim Taleb’s Black Swan criticism of all of this is
that (i) rare events, and only rare events, change the course of history and
(ii) there just aren’t enough occurrences to judge calibration because so few
events are both rare and impactful on history; maybe superforecasters can
spot a Black Swan and maybe they can’t - the GJP wasn’t designed to ask that
question
p. 240-241, 244: REBUTTAL OF CRITICISM: history
flows from both Black Swan events and incremental changes; if
only Black Swans counted, the GJP would be useful only for short-term
projections and with limited impact on the flow of events over long time
frames; and, if time frames are drawn out to encompass a Black Swan, e.g., the
one-day storming of the Bastille on July 14, 1789 vs. that day plus the ensuing
10 years of the French revolution, then such events are not so unpredictable -
what’s the definition of a Black Swan?; other than the obvious, e.g., there
will be conflicts, predictions 10 years out are impossible because the system
is nonlinear
p. 245: “Knowing what we don’t know is better than thinking we know what we
don’t.”; “Kahneman and other
pioneers of modern psychology have revealed that our minds crave certainty and
when they don’t find it, they impose it.”; referring to experts’ revisionist
response to the unpredicted rise of Gorbachev: “In forecasting, hindsight bias
is the cardinal sin.” - hindsight bias not only makes past surprises seem
less surprising, it also fosters belief that the future is more predictable
than it is
Chapter 12: What’s Next?
p. 251: “On the one hand, the hindsight-tainted
analyses that dominate commentary after major events are a dead end. . . . . On the
other hand, our expectations of the future are derived from our mental models
of how the world works, and every event is an opportunity to learn and improve
those models.”; the problem is that “effective
learning from experience can’t happen without clear feedback, and you can’t
have clear feedback unless your forecasts are unambiguous and scoreable.”
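A minimal sketch of what "scoreable" can mean in practice, using the Brier score, a standard accuracy measure for probability forecasts. It assumes the binary form (mean squared error between the stated probability and the 0/1 outcome, ranging from 0 to 1, lower is better); other formulations range from 0 to 2. The forecasts and outcomes are made-up illustrative numbers.

```python
def brier_score(forecasts, outcomes):
    """Mean squared error between probabilities and 0/1 outcomes (binary form)."""
    if not forecasts or len(forecasts) != len(outcomes):
        raise ValueError("need equally long, non-empty lists")
    return sum((p - o) ** 2 for p, o in zip(forecasts, outcomes)) / len(forecasts)


# Hypothetical data: four yes/no questions and how they resolved (1 = happened).
outcomes = [1, 1, 0, 0]
confident = [0.9, 0.8, 0.1, 0.7]  # committed forecasts, one of them badly wrong
hedged = [0.5, 0.5, 0.5, 0.5]     # always "maybe"

print(brier_score(confident, outcomes))  # about 0.1375
print(brier_score(hedged, outcomes))     # 0.25
```

A vague forecast ("there is a real possibility of ...") cannot be entered into a function like this at all, which is the point of the quote: no number and no deadline means no score and no feedback.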
p. 252: “Vague expressions about indefinite futures are not helpful. Fuzzy
thinking can never be proven wrong. . . . . Forecast, measure, revise: it is
the surest path to seeing better.” - if people see that, serious change
will begin; “Consumers of forecasting
will stop being gulled by pundits with good stories and start asking pundits
how their past predictions fared - and reject answers that consist of nothing
but anecdotes and credentials. And forecasters will realize . . . . that these
higher expectations will ultimately benefit them, because it is only with the
clear feedback that comes with rigorous testing that they can improve their
foresight.”
p. 252-253: “It could be huge - an “evidence-based
forecasting” revolution similar to the “evidence-based medicine” revolution,
with consequences every bit as significant.”
p. 253: nothing is certain: “Or nothing may change. . . . . things may
go either way.”; whether the future will be the “stagnant status quo” or
change “will be decided by the people whom political scientists call the ‘attentive
public.’ I’m modestly optimistic.”
p. 254-256: one can argue
that the only goal of forecasts is to be accurate but in practice, there are
multiple goals - in politics the key question is - Who does what to whom? -
people lie because self and tribe matter, and in the mind of a partisan (Dick
Morris predicting a Romney landslide victory just before Romney lost is the
example Tetlock used - maybe Morris lied about lying), lying to defend self or
tribe is justified because partisans want to be the who doing whatever to the
whom; “If forecasting can be co-opted to advance
their interests, it will be.” - on the other hand, the medical
community resisted efforts to make medicine scientific, but over time
persistence and effort paid off - entrenched interests simply have to be
overcome
p. 257: “Evidence-based policy is a movement modeled on evidence-based medicine,
with the goal of subjecting government policies to rigorous analysis so that
legislators will actually know - not merely think they know - whether policies
do what they are supposed to do.”; “. . . . there
is plenty of evidence that rigorous analysis has made a real
difference in government policy.”; analogies exist in philanthropy (Gates
Foundation) and sports - evidence is used to feed success and curtail failure
p. 262-263: “What matters is the big question, but the
big question can’t be scored.”, so ask a bunch of relevant small questions
- it’s like pointillism painting - each dot means little but thousands of dots
create a picture; clusters of little questions will be tested to see if that
technique can shed light on big questions
p. 264-265: elements of good
judgment include foresight and moral judgment, which can’t be run through an
algorithm; asking the right questions may not be the province of
superforecasters - Hedgehogs often seem to come up with the right questions -
the two mindsets needed for excellence may be different
p. 266: the Holy Grail of Tetlock’s
research: “. . . . using forecasting tournaments to depolarize unnecessarily
polarized policy debates and make us collectively smarter.”
p. 269: adversarial but
constructive collaboration requires good faith; “Sadly, in noisy public arenas, strident voices dominate debates, and
they have zero interest in adversarial collaboration. . . . But there are less
voluble and more reasonable voices. . . . . let them design clear tests of
their beliefs. . . . . When the results run against their beliefs, some will
try to rationalize away the facts, but they will pay a reputational price. . .
. . All we have to do is get serious about keeping score.”