17 · Research

Superforecasting

gathering

What separates the people who consistently call the future better than chance from everyone else. Gathering material on the Tetlock-Mellers programme, the IARPA Good Judgment Project, and the broader literature on calibration, base rates, and probabilistic reasoning.

started 2026-06-01 · forecasting · judgment · calibration · tetlock · decision-making

The question driving this: what is actually transferable from the small number of people who keep producing better-than-chance forecasts on geopolitical, economic and scientific questions? Is it a method, a temperament, a set of training reps, or some combination, and how much of it is teachable?

Starting from Superforecasting: The Art and Science of Prediction by Tetlock and Gardner (already in the library, reading now), then radiating out into the IARPA tournament data, calibration literature, and the practitioner side (Good Judgment Inc., Metaculus, Manifold).

Resources

Foundational

Good Judgment Project (official): the surviving spinout of the IARPA forecasting tournament that Tetlock's team won. Has public training material.
The Good Judgment Project (Wikipedia): concise overview of the IARPA tournament, the methodology, and the people behind it. Useful starting point before reading the primary papers.
The Decision Lab: Philip Tetlock: thinker-profile of Tetlock, his career arc and the through-line from Expert Political Judgment to Superforecasting.

Papers

Mellers et al. (2015), "Identifying and Cultivating Superforecasters" (Stanford PDF): the Good Judgment Project group's account of what the top forecasters were actually doing differently. The empirical core.
Katsagounos and Thomakos (2020), "Superforecasting reality check: Evidence from a small pool of experts and expedited identification": European Journal of Operational Research. A reality check on the Tetlock claims using a smaller expert pool. Worth reading specifically as the contrarian counterweight to the IARPA-tournament narrative.
Tetlock (2005), "Expert Political Judgment": the precursor to Superforecasting, which established the dart-throwing-chimp comparison (the average expert did little better than chance) and the fox-versus-hedgehog distinction.
Brier (1950), "Verification of Forecasts Expressed in Terms of Probability": Monthly Weather Review. The original Brier-score paper. Worth reading for the maths.

Adjacent reading

Kahneman, "Thinking, Fast and Slow": the System 1 / System 2 framing is the substrate for a lot of the cognitive errors Tetlock documents.
Silver, "The Signal and the Noise": narrower than Tetlock but stronger on the statistical machinery.
Duke, "Thinking in Bets": Annie Duke on calibration and probability-thinking through the poker-player lens.

Working notes

The book frames superforecasters as people who treat forecasting as a skill, not a gift. The actionable claims so far:

Decompose the question into parts you can put base rates on.
Update incrementally on evidence, not in lurches.
Treat your own confidence as something to be calibrated by checking it against outcomes, not by feeling sure.
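
To make the first two claims concrete for myself, here is a minimal sketch of "start from a base rate, then update in small steps", using Bayes' rule in odds form. This is my own illustration, not a procedure taken from the book: the function name `update`, the 20% base rate and the three likelihood ratios are all made-up assumptions.

```python
def update(prior: float, likelihood_ratio: float) -> float:
    """Bayes' rule in odds form: posterior odds = prior odds * likelihood ratio.

    likelihood_ratio is P(evidence | event) / P(evidence | no event).
    Values near 1 are weak evidence and should barely move the forecast.
    """
    if not 0.0 < prior < 1.0:
        raise ValueError("prior must be strictly between 0 and 1")
    if likelihood_ratio <= 0.0:
        raise ValueError("likelihood_ratio must be positive")
    prior_odds = prior / (1.0 - prior)
    posterior_odds = prior_odds * likelihood_ratio
    return posterior_odds / (1.0 + posterior_odds)


# Hypothetical numbers: a 20% base rate, then three modest pieces of evidence
# (one mildly for, one mildly against, one somewhat stronger for).
p = 0.20
for lr in (1.5, 0.8, 2.0):
    p = update(p, lr)
    print(f"after evidence with LR={lr}: {p:.3f}")
```

The point of the odds form is that each piece of evidence gets its own small multiplier, which makes lurching visible: a jump from 20% to 80% needs a likelihood ratio of 16, and that should be hard to justify from one news item.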

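The Brier score is doing most of the work in the empirical claims of the book. It is the mean squared difference between the probabilities a forecaster stated and what actually happened (1 if the event occurred, 0 if not), so lower is better. Worth understanding the scoring rule properly before reading further so the cited differences in performance feel real instead of just "this number is smaller."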
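
A small sketch of the scoring rule for binary questions, to get a feel for the numbers. Two conventions are in circulation: the single-term mean squared error (range 0 to 1) and Brier's original form summed over both outcome categories (range 0 to 2, exactly double for binary questions). The function names and the sample forecasts are my own assumptions; before comparing any cited score, check which convention it uses.

```python
def brier_binary(forecasts, outcomes):
    """Mean squared error between stated probabilities and 0/1 outcomes.

    Range 0 (perfect) to 1 (confidently wrong every time).
    """
    if len(forecasts) != len(outcomes) or not forecasts:
        raise ValueError("need equal-length, non-empty sequences")
    return sum((f - o) ** 2 for f, o in zip(forecasts, outcomes)) / len(forecasts)


def brier_two_sided(forecasts, outcomes):
    """Brier's original form: squared error summed over both categories
    ("happens" and "does not happen"). Range 0 to 2."""
    if len(forecasts) != len(outcomes) or not forecasts:
        raise ValueError("need equal-length, non-empty sequences")
    total = 0.0
    for f, o in zip(forecasts, outcomes):
        total += (f - o) ** 2 + ((1 - f) - (1 - o)) ** 2
    return total / len(forecasts)


# Made-up example: three questions, two resolved yes and one resolved no.
forecasts = [0.9, 0.7, 0.2]
outcomes = [1, 1, 0]
print(brier_binary(forecasts, outcomes))
print(brier_two_sided(forecasts, outcomes))

# Baseline: always answering 50% scores 0.25 (or 0.5 in the two-sided form),
# whatever actually happens.
print(brier_binary([0.5, 0.5], [1, 0]))
```

The always-50% baseline is the useful anchor: a forecaster only shows skill to the extent the score sits below it, and because errors are squared, one confident miss costs far more than several hedged ones.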

Open questions

How much of the superforecaster edge survives outside the IARPA-style question format (binary or short-range probabilistic)? Long-horizon, vague, or unfalsifiable questions are most of what matters in life. The book mostly dodges this.
Is "tournament forecasting" a separate skill from "useful judgment in your own life," or do the same reps transfer? The book implies transfer, but the evidence is thinner.
What does the failure mode look like? Are there people who get worse the more they forecast, or who hit a calibration ceiling?

Toward the output

When this topic is ready, it should produce a notebook essay (probably under the on-thinking-clearly thread or a new one for forecasting). Possibly also a writeup, if I run a small calibration exercise against my own predictions and score them honestly.
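
For the calibration exercise, a rough sketch of the scoring I have in mind: group my predictions by stated probability and compare the average stated probability in each group with how often those predictions actually came true. The function name `calibration_table`, the ten-bucket default and the sample data are placeholder assumptions, not a settled design.

```python
from collections import defaultdict


def calibration_table(forecasts, outcomes, n_bins=10):
    """Group forecasts into equal-width probability buckets.

    Returns rows of (bucket_low, bucket_high, count, mean_forecast, hit_rate).
    Well-calibrated forecasts have mean_forecast close to hit_rate per bucket.
    """
    if len(forecasts) != len(outcomes):
        raise ValueError("forecasts and outcomes must have the same length")
    buckets = defaultdict(list)
    for f, o in zip(forecasts, outcomes):
        # A forecast of exactly 1.0 goes in the top bucket.
        idx = min(int(f * n_bins), n_bins - 1)
        buckets[idx].append((f, o))
    rows = []
    for idx in sorted(buckets):
        pairs = buckets[idx]
        mean_forecast = sum(f for f, _ in pairs) / len(pairs)
        hit_rate = sum(o for _, o in pairs) / len(pairs)
        rows.append((idx / n_bins, (idx + 1) / n_bins, len(pairs), mean_forecast, hit_rate))
    return rows


# Placeholder data: four resolved predictions (1 = happened, 0 = did not).
rows = calibration_table([0.95, 0.95, 0.15, 0.15], [1, 0, 0, 0])
for low, high, count, mean_forecast, hit_rate in rows:
    print(f"{low:.1f}-{high:.1f}  n={count}  said {mean_forecast:.2f}  got {hit_rate:.2f}")
```

With only a handful of predictions per bucket the hit rates are mostly noise, so the exercise probably needs a few dozen resolved predictions before the table says anything.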