Monday, December 28, 2015

Superforecasting: Book comments and quotes

Philip E. Tetlock and Dan Gardner 
Superforecasting: The Art and Science of Prediction
Crown Books, September 2015
Quotes and comments on the book

Chapter 1: An optimistic skeptic
p. 3: regarding expert opinions, there is usually no accurate measurement of how good they are, there are “just endless opinions - and opinions on opinions. And that is business as usual.”; the media routinely delivers, or corporations routinely pay for, opinions that may be accurate, worthless or in between and everyone makes decisions on that basis

p. 5: talking head talent is skill in telling a compelling story, which is sufficient for success; their track record is irrelevant to that success - most of them are about as good as random guessing; predictions are time-sensitive - 1-year predictions tend to beat guessing more than 5- or 10-year projections

p. 8-10: there are limits on what is predictable → in nonlinear systems, e.g., weather patterns, a small initial condition change can lead to huge effects (chaos theory); we cannot see very far into the future

p. 13-14: predictability and unpredictability coexist; a false dichotomy is saying the weather is unpredictable - it is usually relatively predictable 1-3 days out, but at days 4-7 accuracy usually declines to near-random; weather forecasters are slowly getting better because they are in an endless forecast-measure-revise loop; prediction consumers, e.g., governments, businesses and regular people, don’t demand evidence of accuracy, so it isn’t available, and that means no revision, which means no improvement

p. 15: Bill Gates’s observation: surprisingly often a clear goal isn’t specified so it is impossible to drive progress toward the goal; that is true in forecasting; some forecasts are meant to (1) entertain, (2) advance a political agenda, or (3) reassure the audience their beliefs are correct and the future will unfold as expected (this kind is popular with political partisans)

p. 16: the lack of rigor in forecasting is a huge opportunity; to seize it (i) set the goal of accuracy and (ii) measure success and failure

p. 18: the Good Judgment Project found two things, (1) foresight is real and some people have it and (2) it isn’t strictly a talent from birth - (i) it boils down to how people think, gather information and update beliefs and (ii) it can be learned and improved

p. 21: a 1954 book analyzed 20 studies and showed that algorithms based on objective indicators were better predictors than well-informed experts; more than 200 later studies have confirmed that and the conclusion is simple - if you have a well-validated statistical algorithm, use it

p. 22: machines may never be able to beat talented humans, so dismissing human judgment as just subjective goes too far; maybe the best that can be done will come from human-machine teams, e.g., Garry Kasparov and Deep Blue together against a machine or a human

p. 23: David Ferrucci, IBM’s Watson’s chief engineer, is optimistic: “‘I think it’s going to get stranger and stranger’ for people to listen to the advice of experts whose views are informed only by their subjective judgment.”; Tetlock: “. . . . we will need to blend computer-based forecasting and subjective judgment in the future. So it’s time to get serious about both.”

Chapter 2: Illusions of knowledge
p. 25: regarding a medical diagnosis error: “We have all been too quick to make up our minds and too slow to change them. And if we don’t examine how we make these mistakes, we will keep making them. This stagnation can go on for years. Or a lifetime. It can even last centuries, as the long and wretched history of medicine illustrates.”

p. 30: “It was the absence of doubt - and scientific rigor - that made medicine unscientific and caused it to stagnate for so long.”; it was an illusion of knowledge - if the patient died, he was too sick to be saved, but if he got better, the treatment worked - there was no controlled data to support those beliefs; for decades, physicians resisted the idea of randomized, controlled trials, as proposed in 1921, because they “knew” their subjective judgments revealed the truth

p. 35: on Kahneman’s fast system 1: “A defining feature of intuitive judgment is its insensitivity to the quality of the evidence on which the judgment is based. It has to be that way. System 1 can only do its job of delivering strong conclusions at lightning speed if it never pauses to wonder whether the evidence at hand is flawed or inadequate, or if there is better evidence elsewhere.” - context - instantly running away from a Paleolithic shadow that might be a lion; Kahneman calls these tacit assumptions WYSIATI (What You See Is All There Is); system 1 judgments take less than 1 sec. - there’s no time to think about things; regarding coherence: “. . . . we are creative confabulators hardwired to invent stories that impose coherence on the world.”

p. 38-39: confirmation bias: (i) seeking evidence to support the 1st plausible explanation, (ii) rarely seeking contradictory evidence and (iii) being a motivated skeptic in the face of contrary evidence and finding even weak or no reasons to denigrate contradictory evidence or reject it entirely, e.g., a doctor’s belief that a quack medical treatment works for all but the incurable is taken as proof that it works for everyone except the incurably ill; that arrogant, self-deluded mind set kept medicine in the dark ages for millennia and people suffered accordingly

p. 40: attribute substitution, availability heuristic or bait and switch: one question may be difficult or unanswerable w/o more info, so the unconscious System 1 substitutes another, easier, question and the easy question’s answer is the same as the hard question’s answer, even when it is wrong; CLIMATE CHANGE EXAMPLE: people who cannot figure out climate change on their own substitute what most climate scientists believe for their own belief - it can be wrong

p. 41-42: “The instant we wake up and look past the tip of our nose, sights and sounds flow into the brain and System 1 is engaged. This system is subjective, unique to each of us.”; cognition is a matter of blending inputs from System 1 and 2 - in some people, System 1 has more dominance than in others; it is a false dichotomy to see it as System 1 or System 2 operating alone; pattern recognition: System 1 alone can make very good or bad snap judgments and the person may not know why - bad snap judgment or false positive = seeing the Virgin Mary in burnt toast (therefore, slowing down to double check intuitions is a good idea)

p. 44: tip of the nose perspective is why doctors did not doubt their own beliefs for thousands of years

Chapter 3: Keeping Score
p. 48: it is not unusual that a forecast that may seem dead right or wrong really cannot be “conclusively judged right or wrong”; details of a forecast may be absent and the forecast can’t be scored, e.g., no time frames, geographic locations, reference points, definition of success or failure, definition of terms, a specified probability of events (e.g., 68% chance of X) or lack thereof or many comparison forecasts to assess the predictability of what is being forecasted; p. 53: “. . . . vague verbiage is more the rule than the exception.”; p. 55: security experts asked what the term “serious possibility” meant in a 1951 National Intelligence Estimate → one said it meant 80 to 20 (4 times more likely than not), another said it meant 20 to 80 and others said it was in between those two extremes

p. 50-52: national security experts had views split along liberal and conservative lines about the Soviet Union and future relations; they were all wrong and Gorbachev came to power and de-escalated nuclear and war tensions; after the fact, all the experts claimed they could see it coming all along; “But the train of history hit a curve, and as Karl Marx once quipped, the intellectuals fall off.”; the experts were smart and well-informed, but they were just misled by System 1’s subjectivity (tip of the nose perspective)

p. 58-59: the U.S. intelligence community resisted putting definitions and specified probabilities in their forecasts until finally, 10 years after the WMD fiasco with Saddam Hussein, the case for precision was so overwhelming that they changed; “But hopelessly vague language is still so common, particularly in the media, that we rarely notice how vacuous it is. It just slips by.”

p. 60-62: calibration: perfect calibration = events that are forecast with “there is an X% chance” actually happen X% of the time, e.g., rainfall; calibration requires many forecasts for the assessment and is thus impractical for rare events, e.g., presidential elections; underconfidence = prediction is X% chance, but reality is a larger X+Y% chance; overconfidence = prediction is X% chance, but reality is a smaller X-Y% chance
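
A minimal sketch of the calibration check described above: group forecasts by their stated probability and compare each group with how often the event actually happened. The function name `calibration_table` and the 20 forecasts are made up for illustration; they are not from the book or the GJP.

```python
from collections import defaultdict


def calibration_table(forecasts, outcomes):
    """Map each stated probability to the observed frequency of the event."""
    buckets = defaultdict(list)
    for stated, happened in zip(forecasts, outcomes):
        # Group forecasts by stated probability, rounded to the nearest 10%.
        buckets[round(stated, 1)].append(1 if happened else 0)
    return {stated: sum(hits) / len(hits) for stated, hits in sorted(buckets.items())}


# Made-up data: ten "70% chance" forecasts and ten "30% chance" forecasts.
forecasts = [0.7] * 10 + [0.3] * 10
outcomes = [True] * 7 + [False] * 3 + [True] * 5 + [False] * 5

table = calibration_table(forecasts, outcomes)
for stated, observed in table.items():
    print(f"stated {stated:.0%} -> observed {observed:.0%}")
```

In this toy data the 70% forecasts are well calibrated (the event happened 7 times out of 10), while the 30% forecasts are underconfident in the sense used above (the event happened 5 times out of 10). With only ten forecasts per group the comparison is noisy, which is why calibration needs many forecasts.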

p. 62-66: the two facets of good judgment are captured by calibration and resolution; resolution: high resolution occurs when predictions of low < ~ 20% or high > ~80% probability events are accurately predicted; accurately predicting rare events gets more weight than accurately predicting more common events; a low Brier score is best, 0.0 is perfect, 0.5 is random guessing and 2.0 is getting all or none, or yes or no, predictions wrong 100% of the time; however a score of 0.2 in one circumstance, e.g., weather prediction in Phoenix, AZ looks bad, while a score of 0.2 in Springfield MO is great because the weather there is far less predictable than in Phoenix; apples-to-apples comparisons are necessary, but it is very hard to find that kind of data - it usually doesn’t exist
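
The three benchmark scores above (0.0, 0.5 and 2.0) can be checked with a short sketch. It assumes the original two-category form of the Brier score, which is the form that runs from 0.0 to 2.0; the function name `brier_score` and the three-event examples are illustrative only.

```python
def brier_score(forecasts, outcomes):
    """Mean Brier score, two-category form: 0.0 is perfect, 2.0 is worst."""
    if not forecasts or len(forecasts) != len(outcomes):
        raise ValueError("need equally many forecasts and outcomes")
    total = 0.0
    for p, happened in zip(forecasts, outcomes):
        o = 1.0 if happened else 0.0
        # Squared error on "it happens" plus squared error on "it does not".
        total += (p - o) ** 2 + ((1.0 - p) - (1.0 - o)) ** 2
    return total / len(forecasts)


outcomes = [True, False, True]
perfect = brier_score([1.0, 0.0, 1.0], outcomes)       # certain and right
hedged = brier_score([0.5, 0.5, 0.5], outcomes)        # always says 50%
always_wrong = brier_score([0.0, 1.0, 0.0], outcomes)  # certain and wrong

print(perfect, hedged, always_wrong)  # 0.0 0.5 2.0
```

Because the score only measures distance from what happened, it says nothing about how hard the question was, which is the Phoenix vs. Springfield point: the same 0.2 can be poor or excellent depending on how predictable the thing being forecast is.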

p. 68: In EPJ (Expert Political Judgment, Tetlock’s earlier study), the bottom line was that some experts were marginally better than random guessing - the common characteristic was how they thought, not their ideology, Ph.D. or not, or access to classified information; the typical expert was about as good as random guessing and their thinking was ideological; “They sought to squeeze complex problems into the preferred cause-effect templates and treated what did not fit as irrelevant distractions. Allergic to wishy-washy answers, they kept pushing their analyses to the limit (and then some), using terms like “furthermore” and “moreover” when piling up reasons why they were right and others were wrong. As a result, they were confident to declare things “impossible” or “certain.” Committed to their conclusions, they were reluctant to change their minds even when their predictions clearly failed. They would tell us, “Just wait.””

p. 69: “The other group consisted of more pragmatic experts who drew on many analytical tools, with the choice of tool hinging on the particular problem they faced. . . . . They talked about possibilities and probabilities, not certainties.”

p. 69: “The fox knows many things but the hedgehog knows one big thing. . . . . Foxes beat hedgehogs on both calibration and resolution. Foxes had real foresight. Hedgehogs didn’t. . . . . How did hedgehogs manage to do slightly worse than random guessing?”; hedgehog example is CNBC’s Larry Kudlow and his supply side economics Big Idea in the face of the 2007 recession

p. 70-72: on Kudlow: “Think of that Big Idea as a pair of glasses that the hedgehog never takes off. . . . And, they aren’t ordinary glasses. They are green-tinted glasses . . . . Everywhere you look, you see green, whether it’s there or not. . . . . So the hedgehog’s one Big Idea doesn’t improve his foresight. It distorts it.”; more information helps increase hedgehog confidence, not accuracy; “Not that being wrong hurt Kudlow’s career. In January 2009, with the American economy in a crisis worse than any since the Great Depression, Kudlow’s new show, The Kudlow Report, premiered on CNBC. That too is consistent with the EPJ data, which revealed an inverse correlation between fame and accuracy: the more famous an expert was, the less accurate he was.”; “As anyone who has done media training knows, the first rule is keep it simple, stupid. . . . . People tend to find uncertainty disturbing and “maybe” underscores uncertainty with a bright red crayon. . . . . The simplicity and confidence of the hedgehog impairs foresight, but it calms nerves - which is good for the careers of hedgehogs. . . . Foxes don’t fare so well in the media. . . . This aggregation of many perspectives is bad TV.”

p. 73: an individual who does a one-off accurate guess is different from people who do it consistently; consistency is based on aggregation, which is the recognition that useful info is widely dispersed and each bit needs a separate weighting for importance and relevance

p. 74: on information aggregation: “Aggregating the judgments of people who know nothing produces a lot of nothing.”; the bigger the collective pool of accurate information, the better the prediction or assessment; Foxes aggregate, but Hedgehogs don’t

p. 76-77: aggregation: looking at a problem from one perspective, e.g., pure logic can lead to an incorrect answer; multiple perspectives are needed; using both logic and psycho-logic (psychology or human cognition) helps; some people are lazy and don’t think, some apply logic to some degree and then stop, while others pursue logic to its final conclusion → aggregate all of those inputs to arrive at the best answer; “Foxes aggregate perspectives.”

p. 77-78: on human cognition - we don’t aggregate perspectives naturally: “The tip-of-your nose perspective insists that it sees reality objectively and correctly, so there is no need to consult other perspectives.”

p. 79-80: on perspective aggregation: “Stepping outside ourselves and really getting a different view of reality is a struggle. But Foxes are likelier to give it a try.”; people’s temperaments fall along a spectrum from the rare pure Foxes to the rare pure Hedgehogs; “And our thinking habits are not immutable. Sometimes they evolve without our awareness of the change. But we can also, with effort, choose to shift gears from one mode to another.”

Chapter 4: Superforecasters
p. 84-85: the U.S. intelligence community (IC) is, like every huge bureaucracy (about 100,000 people, about $50 billion budget), very change-resistant - they saw and acknowledged their colossal failure to predict the Iranian revolution, but did little or nothing to address their dismal capacity to predict situations and future events; the WMD-Saddam Hussein disaster 22 years later finally inflicted a big enough shock to get the IC to seriously introspect

p. 88 (book review comment?): my IARPA work isn’t as exotic as DARPA, but it can be just as important: that understates the case → it is more important

p. 89: humans “will never be able to forecast turning points in the lives of individuals or nations several years into the future - and heroic searches for superforecasters won’t change that.”; the approach: “Quit pretending you know things you don’t and start running experiments.”

p. 90-93: the shocker: although the detailed result is classified, Good Judgment Project (GJP) volunteers who passed screening and used simple algorithms but without access to classified information beat government intelligence analysts with access to classified information; one contestant (a retired computer programmer) had a Brier score of 0.22 (lower is better), 5th best among 2,800 GJP participants and then in a later competition among the best forecasters, his score improved to 0.14, the best among the initial group of 2,800 → he beat the commodities futures markets by 40% and the “wisdom of the crowd” control group by 60%

p. 94-95: the best forecasters got things right at 300 days out more than regular forecasters looking out 100 days and that improved over the 4-year GJP experiment: “. . . . these superforecasters are amateurs forecasting global events in their spare time with whatever information they can dig up. Yet they somehow managed to set the performance bar high enough that even the professionals have struggled to get over it, let alone clear it with enough room to justify their offices, salaries and pensions.”

p. 96 (book review comment?): “And yet, IARPA did just that: it put the intelligence community’s mission ahead of the people inside the intelligence community - at least ahead of those insiders who didn’t want to rock the bureaucratic boat.”

p. 97-98: “But it’s easy to misinterpret randomness. We don’t have an intuitive feel for it. Randomness is invisible from the tip-of-your-nose perspective. We can see it only if we step outside of ourselves.”; people can be easily tricked into believing that they can predict entirely random outcomes, e.g., guessing coin tosses; “. . . . delusions of this sort are routine. Watch business news on television, where talking heads are often introduced with a reference to one of their forecasting references . . . . And yet many people take these hollow claims seriously.”

p. 99: “Most things in life involve skill and luck, in varying proportions.”

p. 99-101: regression to the mean cannot be overlooked and is a necessary tool for testing the role of luck in performance → regression is slow for activities dominated by skill, e.g., forecasting, and fast for activities dominated by chance/randomness, e.g., coin tossing

p. 102-103: the key question is how did superforecasters hold up across the years? → in years 2 and 3, superforecasters were the opposite of regressing and they got better; sometimes causal connections are nonlinear and thus not predictable and some of that had to be present among the variables that affected what the forecasters were facing → there should be some regression unless an offsetting process is increasing forecasters’ performance; there is some regression - about 30% of superforecasters (about 1 in 3) fall out of the top 2% each year but 70% stay in - individual year-to-year correlation is about 0.65, which is pretty high → Q: Why are these people so good?

Chapter 5: Supersmart?
p. 114: Fermi-izing questions, breaking a question into relevant parts, allows better guesses, e.g., how many piano tuners are there in Chicago → guess total pop, total # pianos, time to do one piano, hours/year a tuner works → that technique usually helps increase accuracy a lot, even when none of the numbers are known; Fermi-izing tends to defuse the unconscious System 1’s tendency to bait & switch the question; EXAMPLE: would testing of Arafat’s body 6 years after his death reveal the presence of polonium, which is allegedly what killed him? → Q1 - can you even detect Po 6 years later? Q2: if Po is still detectable, how could it have happened, e.g., Israel, Palestinian enemies before or after his death → for this question the outside view, what % of exhumed bodies are found to be poisoned is hard to (i) identify and (ii) find the answer to, but identifying it is most important, i.e., it’s not certain (< 100%, say 80%), but it has to be more than trivial evidence otherwise authorities would not allow his body to be exhumed (> 20%) → use the 20-80% halfway point of 50% as the outside view, then adjust probability up or down based on research and the inside or intuitive System 1 view
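
The piano tuner question can be sketched as a chain of separate guesses that are multiplied and divided out. Every input below is a placeholder guess chosen for illustration, not a figure taken from the notes above, and the "tunings per piano per year" factor is an extra assumption added so that the units work out. The function name `fermi_piano_tuners` is hypothetical.

```python
def fermi_piano_tuners(population, people_per_piano, tunings_per_piano_per_year,
                       hours_per_tuning, tuner_hours_per_year):
    """Estimate the number of tuners from a chain of rough guesses."""
    pianos = population / people_per_piano
    tuning_hours_needed = pianos * tunings_per_piano_per_year * hours_per_tuning
    return tuning_hours_needed / tuner_hours_per_year


# Placeholder guesses (assumptions for illustration only).
estimate = fermi_piano_tuners(
    population=2_500_000,          # guess: people in Chicago
    people_per_piano=50,           # guess: one piano per 50 people
    tunings_per_piano_per_year=1,  # guess: each piano is tuned once a year
    hours_per_tuning=2,            # guess: includes travel time
    tuner_hours_per_year=1_600,    # guess: working hours for one tuner
)
print(estimate)  # 62.5
```

The value of the exercise is less the final number than the fact that each guess is explicit, so any one of them can be challenged and replaced without starting over.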

p. 118: superforecasters look at questions 1st from Kahneman’s “outside view”, i.e., the statistical or historical base rate or norm (the anchor) and then 2nd use the inside view to adjust probabilities up or down → System 1 generally goes straight to the comfortable but often wrong inside view and ignores the outside view; EXAMPLE: “will there be a Vietnam-China border clash in the next year?” starts with the 1st (outside) view that asks how many clashes there have been over time, e.g., once every 5 years, and then merges in the 2nd (inside) view of current Vietnam-China politics to adjust the baseline probability up or down

p. 120: the outside view has to come first; “And it’s astonishingly easy to settle on a bad anchor.”; good anchors are easier to find from the outside view than from the inside

p. 123-124: some superforecasters kept explaining in the GJP online forum how they approached problems, what their thinking was and asking for criticisms, i.e., they were looking for other perspectives; simply asking if a judgment is wrong tends to lead to improvement in the first judgment; “The sophisticated forecaster knows about confirmation bias and will seek out evidence that cuts both ways.”

p. 126: “A brilliant puzzle solver may have the raw material for forecasting, but if he also doesn’t have an appetite for questioning basic, emotionally-charged beliefs he will often be at a disadvantage relative to a less intelligent person who has a greater capacity for self-critical thinking.”

p. 127: “For superforecasters, beliefs are hypotheses to be tested, not treasures to be guarded.”

Chapter 6: Superquants?
p. 128-129: most superforecasters are good at math, but mostly they rely on subjective judgment: one super said this: “It’s all, you know, balancing, finding relevant information and deciding how relevant is this really?”; it’s not math skill that counts most - it’s nuanced subjective judgment

p. 138-140: we crave certainty and that’s why Hedgehogs and their confident yes or no answers on TV are far more popular and comforting than Foxes with their discomforting “on the one hand . . . but on the other” style; people equate confidence with competence; “This sort of thinking goes a long way to explaining why so many people have a poor grasp of probability. . . . The deeply counterintuitive nature of statistics explains why even very sophisticated people often make elementary mistakes.” A forecast of a 70% chance of X happening means that there is a 30% chance it won’t - that fact is lost on most people → most people translate an 80% chance of X to mean X will happen and that just ain’t so; only when probabilities are closer to even, maybe about 65:35 to 35:65 (p. 144), does the translation for most people become “maybe” X will happen, which is the intuitively uncomfortable translation of uncertainty associated with most everything

p. 143: superforecasters tend to be probabilistic thinkers, e.g., Treasury secy Robert Rubin; epistemic uncertainty describes something unknown but theoretically knowable, while aleatory uncertainty is both unknown and unknowable

p. 145-146: superforecasters who use more granularity, a 20, 21 or 22% chance of X tended to be more accurate than those who used 5% increments and they tended to be more accurate than those who used 10% increments, e.g., 20%, 30% or 40%; when estimates were rounded to the nearest 5% or 10%, the granular best superforecasters fell into line with all the rest, i.e., there was real precision in those more granular 1% increment predictions

p. 148-149: “Science doesn’t tackle “why” questions about the purpose of life. It sticks to “how” questions that focus on causation and probabilities.”; “Thus, probabilistic thinking and divine-order thinking are in tension. Like oil and water, chance and fate do not mix. And to the extent we allow our thoughts to move in the direction of fate, we undermine our ability to think probabilistically. Most people tend to prefer fate.”

p. 150: the sheer improbability of something that does happen, e.g., you meet and marry your spouse, is often attributed to fate or God’s will, not the understanding that sooner or later many/most people get married to someone at some point in their lives; the following psycho-logic is “incoherent”, i.e., not logical: (1) the chance of meeting the love of my life was tiny, (2) it happened anyway, (3) therefore it was meant to be and (4) therefore, the probability it would happen was 100%

p. 152: scoring for tendency to accept or reject fate and accept probabilities instead, average Americans are mixed or about 50:50, undergrads somewhat more biased toward probabilities and superforecasters are the most grounded in probabilities, while rejecting fate as an explanation; the more inclined a forecaster is to believe things are destined or fate, the less accurate their forecasts were, while probability-oriented forecasters tended to have the highest accuracy → the correlation was significant

Chapter 7: Supernewsjunkies?
p. 154-155: based on news flowing in, superforecasters tended to update their predictions and that tended to improve accuracy; it isn’t just a matter of following the news and changing output from sufficient new input - their initial forecasts were 50% more accurate than regular forecasters

p. 160: belief perseverance = people “rationalizing like crazy to avoid acknowledging new information that upsets their settled beliefs.” → extreme obstinacy, e.g., the fact that something someone predicted didn’t happen is taken as evidence that it will happen

p. 161-163: on underreacting to new information: “Social psychologists have long known that getting people to publicly commit to a belief is a great way to freeze it in place, making it resistant to change. The stronger the commitment, the greater the resistance.”; perceptions are a matter of our “identity”; “. . . . people’s views on gun control often correlate with their views on climate change, even though the two issues have no logical connection to each other. Psycho-logic trumps logic.”; “. . . . superforecasters may have a surprising advantage: they’re not experts or professionals, so they have little ego invested in each forecast.”; consider “career CIA analysts or acclaimed pundits with their reputations on the line.”

p. 164: on overreacting to new information: dilution effect = irrelevant or noise information can and often does change perceptions of probability and that leads to mistakes; frequent forecast updates based on small “units of doubt” (small increments) seem to minimize overreacting and underreacting; balancing new information with the info that drove the original forecast or earlier updates captures the value of all the information

p. 170: Bayes’ theorem: new/updated belief/forecast = prior belief x diagnostic value of the new information; most superforecasters intuitively understand Bayes’ theorem, but can’t write the equation down, nor do they actually use it; instead they use the concept and weigh updates based on the value of new information
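
A minimal sketch of that update rule, assuming the odds form of Bayes’ theorem (posterior odds = prior odds x likelihood ratio), where the likelihood ratio stands in for the “diagnostic value” of the new information. The prior reuses the “once every 5 years” border clash base rate from the p. 118 note; the likelihood ratio of 3 is a made-up number, and `bayes_update` is a hypothetical helper name.

```python
def bayes_update(prior, likelihood_ratio):
    """Return the updated probability after one piece of evidence.

    likelihood_ratio = P(evidence | event happens) / P(evidence | it does not).
    Values above 1 push the probability up; values below 1 push it down.
    """
    if not 0.0 < prior < 1.0:
        raise ValueError("prior must be strictly between 0 and 1")
    if likelihood_ratio <= 0.0:
        raise ValueError("likelihood_ratio must be positive")
    prior_odds = prior / (1.0 - prior)
    posterior_odds = prior_odds * likelihood_ratio
    return posterior_odds / (1.0 + posterior_odds)


base_rate = 0.2  # outside view: roughly one clash every 5 years
updated = bayes_update(base_rate, likelihood_ratio=3.0)  # made-up evidence strength
print(round(updated, 2))  # 0.43
```

News that is equally likely whether or not the event is coming has a likelihood ratio of 1 and leaves the forecast where it was, which matches the idea of weighing each update by the value of the new information rather than by how dramatic it sounds.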

Chapter 8: Perpetual Beta
p. 174-175: two basic mindsets - the growth mindset is that you can learn and grow through hard work; the fixed mindset holds that you have what you were born with and that innate talents can be revealed but not created or developed, e.g., fixed mindsetters say things like “I’m bad at math”, and it becomes a self-fulfilling prophecy; fixed mindset children given harder puzzles gave up and lost interest, while growth mindset kids loved the challenge because for them, learning was a priority

p. 178: consistently inconsistent - John Maynard Keynes: engaged in an endless cycle of try, fail, analyze, adjust, try again; he retired wealthy from his investing, despite massive losses from the Great Depression and other personal blunders; skills improve with practice

p. 181-183: prompt feedback is necessary for improvement, but it is usually lacking - experience alone doesn’t compensate - experienced police gain confidence that they are good at spotting liars, but it isn’t true because they don’t improve with time; most forecasters get little or no feedback because (1) their language is ambiguous and their forecasts are thus not precise enough to evaluate - self-delusion is a real concern and (2) there is a long time lag between forecast and time to get feedback on success or failure - with time a person forgets the details of their own forecasts and hindsight bias distorts memory, which makes it worse; vague language is elastic and people read into it what they want; hindsight bias = knowing the outcome of an event distorts our perception of what we thought we knew before the outcome; experts succumb to it all the time, e.g., prediction of loss of communist power monopoly in the Soviet Union before it disintegrated in 1991 and after it happened → recall was 31% higher than their original estimate

p. 190: “Superforecasters are perpetual beta.” - they have the growth mindset

p. 191: list of superforecaster traits

Chapter 9: Superteams
p. 201: success can lead to mental habits that undermine the mental habits that led to success in the first place; on the other hand, properly functioning teams can generate dragonfly eye perspectives, which can improve forecasting

p. 208-209: givers on teams are not chumps - they tend to make the whole team perform better; it is complex and it will take time to work out the psychology of groups - replicating this won’t be easy in the real world; “diversity trumps ability” may be true due to the different perspectives a team can generate or, maybe it’s a false dichotomy and a shrewd mix of ability and diversity is the key to optimum performance

Chapter 10: The Leader’s Dilemma
p. 229-230: Tetlock uses the Wehrmacht as an example of how leadership and judgment can be effectively combined, even though it served an evil end → the points being that (i) even evil can operate intelligently and creatively so therefore don’t underestimate your opponent and (ii) seeing something as evil and wanting to learn from it presents no logical contradiction but only a psycho-logical tension that superforecasters overcome because they will learn from anyone or anything that has information or lessons of value

Chapter 11: Are They Really So Super?
p. 232-233: in a 2014 interview Gen. Michael Flynn, Head of DIA (DoD’s equivalent of the CIA; 17,000 employees) said “I think we’re in a period of prolonged societal conflict that is pretty unprecedented.” but googling the phrase “global conflict trends” says otherwise; Flynn, like Peggy Noonan and her partisan reading of political events, suffered from the mother of all cognitive illusions, WYSIATI → every day for three hours, Flynn saw nothing but reports of conflicts and bad news; what is important is the fact that Flynn, a highly accomplished and intelligent operative fell for the most obvious illusion there is → even when we know something is a System 1 cognitive illusion, we sometimes cannot shut it off and see unbiased reality, e.g., Müller-Lyer optical illusion (two equal lines, one with arrow ends pointing out and one with ends pointing in - the in-pointing arrow line always looks longer, even when you know it isn’t)

p. 234-237: “. . . . dedicated people can inoculate themselves to some degree against certain cognitive illusions.”; scope insensitivity is a major illusion of particular importance to forecasters - it is another bait & switch bias or illusion where a hard question is unconsciously substituted with a simpler question, e.g., the average amount groups of people would be willing to pay to avoid 2,000, 20,000 or 200,000 birds drowning in oil ponds was the same for each group, $80 → the problem’s scope recedes into the background so much that it becomes irrelevant; the scope insensitivity bias or illusion (Tetlock seems to use the terms interchangeably) is directly relevant to geopolitical problems; surprisingly, superforecasters were less influenced by scope insensitivity than average forecasters - scope sensitivity wasn’t perfect, but it was good (better than Kahneman guessed it would be); Tetlock’s guess → superforecasters were skilled and persistent in making System 2 corrections of System 1 judgments, e.g., by stepping into the outside view, which dampens System 1 bias and/or ingrains the technique to the point that it is “second nature” for System 1

p. 237-238: CRITICISM: how long can superforecasters defy psychological gravity?; maybe a long time - one developed software designed to correct System 1 bias in favor of the like-minded and that helped lighten the heavy cognitive load of forecasting; Nassim Taleb’s Black Swan criticism of all of this is that (i) rare events, and only rare events, change the course of history and (ii) there just aren’t enough occurrences to judge calibration because so few events are both rare and impactful on history; maybe superforecasters can spot a Black Swan and maybe they can’t - the GJP wasn’t designed to ask that question

p. 240-241, 244: REBUTTAL OF CRITICISM: history flows from both Black Swan events and incremental changes; if only Black Swans counted, the GJP would be useful only for short-term projections and with limited impact on the flow of events over long time frames; and, if time frames are drawn out to encompass a Black Swan, e.g., the one-day storming of the Bastille on July 14, 1789 vs. that day plus the ensuing 10 years of the French Revolution, then such events are not so unpredictable - what’s the definition of a Black Swan?; other than the obvious, e.g., there will be conflicts, predictions 10 years out are impossible because the system is nonlinear

p. 245: “Knowing what we don’t know is better than thinking we know what we don’t.”; “Kahneman and other pioneers of modern psychology have revealed that our minds crave certainty and when they don’t find it, they impose it.”; referring to experts’ revisionist response to the unpredicted rise of Gorbachev: “In forecasting, hindsight bias is the cardinal sin.” - hindsight bias not only makes past surprises seem less surprising, it also fosters belief that the future is more predictable than it is

Chapter 12: What’s Next?
p. 251: “On the one hand, the hindsight-tainted analyses that dominate commentary after major events are a dead end. . . . . On the other hand, our expectations of the future are derived from our mental models of how the world works, and every event is an opportunity to learn and improve those models.”; the problem is that “effective learning from experience can’t happen without clear feedback, and you can’t have clear feedback unless your forecasts are unambiguous and scoreable.”
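
To make “scoreable” concrete: one common way to score probability forecasts is the Brier score, the kind of measure the book relies on. Below is a minimal Python sketch, not the GJP’s actual scoring code. It uses the simple binary form (the average squared gap between the stated probability and what happened, so 0 is perfect and 1 is the worst possible); the book describes a version that runs from 0 to 2. The function name and the two sets of forecasts are made up for illustration.

```python
def brier_score(forecasts, outcomes):
    """Average squared gap between forecast probabilities and 0/1 outcomes.

    forecasts: probabilities (0.0 to 1.0) that each event will happen
    outcomes:  1 if the event happened, 0 if it did not
    Lower is better; 0.0 is a perfect score in this binary form.
    """
    if not forecasts or len(forecasts) != len(outcomes):
        raise ValueError("need one outcome per forecast, and at least one forecast")
    total = sum((p - o) ** 2 for p, o in zip(forecasts, outcomes))
    return total / len(forecasts)


# Hypothetical data: four yes/no questions and how they turned out.
outcomes = [1, 0, 1, 0]

# A forecaster who commits to clear probabilities and is mostly right.
committed = [0.9, 0.1, 0.8, 0.2]

# A forecaster who says "50/50" on everything and can never be badly wrong.
fence_sitter = [0.5, 0.5, 0.5, 0.5]

print("committed:", brier_score(committed, outcomes))
print("fence-sitter:", brier_score(fence_sitter, outcomes))
```

The committed forecaster scores roughly 0.025 and the fence-sitter scores 0.25, so the score rewards forecasts that are both specific and right. A vague forecast with no probability and no deadline cannot be fed into a function like this at all, which is the point of the quote.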

p. 252: “Vague expressions about indefinite futures are not helpful. Fuzzy thinking can never be proven wrong. . . . . Forecast, measure, revise: it is the surest path to seeing better.” - if people see that, serious change will begin; “Consumers of forecasting will stop being gulled by pundits with good stories and start asking pundits how their past predictions fared - and reject answers that consist of nothing but anecdotes and credentials. And forecasters will realize . . . . that these higher expectations will ultimately benefit them, because it is only with the clear feedback that comes with rigorous testing that they can improve their foresight.”

p. 252-253: “It could be huge - an “evidence-based forecasting” revolution similar to the “evidence-based medicine” revolution, with consequences every bit as significant.”

p. 253: nothing is certain: “Or nothing may change. . . . . things may go either way.”; whether the future will be the “stagnant status quo” or change “will be decided by the people whom political scientists call the “attentive public.” I’m modestly optimistic.”

p. 254-256: one can argue that the only goal of forecasts is to be accurate, but in practice there are multiple goals - in politics the key question is - Who does what to whom? - people lie because self and tribe matter, and in the mind of a partisan, lying to defend self or tribe is justified because partisans want to be the who doing whatever to the whom (Dick Morris predicting a Romney landslide victory just before Romney lost is the example Tetlock used - maybe he lied about lying); “If forecasting can be co-opted to advance their interests, it will be.” - but on the other hand, the medical community resisted efforts to make medicine scientific, yet over time persistence and effort paid off - entrenched interests simply have to be overcome

p. 257: “Evidence-based policy is a movement modeled on evidence-based medicine, with the goal of subjecting government policies to rigorous analysis so that legislators will actually know - not merely think they know - whether policies do what they are supposed to do.”; “. . . . there is plenty of evidence that rigorous analysis has made a real difference in government policy.”; analogies exist in philanthropy (Gates Foundation) and sports - evidence is used to feed success and curtail failure

p. 262-263: “What matters is the big question, but the big question can’t be scored.”, so ask a bunch of relevant small questions - it’s like pointillism painting - each dot means little but thousands of dots create a picture; clusters of little questions will be tested to see if that technique can shed light on big questions

p. 264-265: elements of good judgment include foresight and moral judgment, which can’t be run through an algorithm; asking the right questions may not be the province of superforecasters - Hedgehogs often seem to come up with the right questions - the two mindsets needed for excellence may be different

p. 266: the Holy Grail of Tetlock’s research: “. . . . using forecasting tournaments to depolarize unnecessarily polarized policy debates and make us collectively smarter.”

p. 269: adversarial but constructive collaboration requires good faith; “Sadly, in noisy public arenas, strident voices dominate debates, and they have zero interest in adversarial collaboration. . . . . But there are less voluble and more reasonable voices. . . . . let them design clear tests of their beliefs. . . . . When the results run against their beliefs, some will try to rationalize away the facts, but they will pay a reputational price. . . . . All we have to do is get serious about keeping score.”

Invitation to participate at the GJP website:
