Historical probability of picking a perfect NCAA bracket 1985-2023

There is a lot of discussion of increasing parity in NCAA men’s basketball. For example, this year’s tournament saw another #16 seed upset a #1 seed in the first round, for only the second time in the history of the current 64-team tournament format going back to 1985. (As usual, I’m ignoring the nonsense of the four play-in games.) … But the first such first-round upset of a #1 seed was just 5 years ago in 2018.

Every year there is also discussion of the probability of picking a "perfect" tournament bracket: that is, what is the probability of correctly guessing the winners of all 63 games, from the first round to the championship? Estimating this probability is a convenient way to aggregate and weight the numerous upsets that happen each year (about 17.8 upset games per tournament on average) into a single number that we can then use to compare the overall prior (un)likelihood of tournament outcomes across years.
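To make the aggregation concrete: once a model assigns a probability to each individual game, the probability of one particular tournament outcome is the product of the 63 probabilities that each actual winner beat its actual opponent. The minimal Python sketch below assumes those per-game probabilities are already available (the function name is hypothetical, not from the linked code) and sums base-10 logarithms, so that years can be compared on a readable scale.

```python
import math

def bracket_log10_probability(winner_probs):
    """Return log10 of the probability of one specific tournament outcome.

    winner_probs: one entry per game (63 for a 64-team bracket), each the
    modeled probability that the team that actually won that game would win it.
    The product of the probabilities becomes a sum of logs.
    """
    if len(winner_probs) != 63:
        raise ValueError("expected 63 games")
    total = 0.0
    for p in winner_probs:
        if not 0.0 < p <= 1.0:
            raise ValueError("each probability must be in (0, 1]")
        total += math.log10(p)
    return total

# Coin-flip baseline: every game is treated as a 50/50 guess.
coin = bracket_log10_probability([0.5] * 63)
print(round(coin, 2))  # -18.96, i.e. about 1 in 9.2e18
```

A year full of upsets pushes many of the entries well below 1/2, which is how a single log-probability ends up weighting both the number and the severity of upsets.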

All of the tournament history is available on GitHub, and this past article describes the details of the methodology for modeling probabilities of individual game outcomes as a function of seed matchup. Here are the results, updated to include this year's tournament:

Probability of a perfect bracket, 1985-2023.

The constant black line at the bottom reflects the exact probability, 1 in 2^{63} (about 1 in 9.2 quintillion), of guessing any year's outcome correctly if you simply flip a fair coin to determine the winner of each game. The constant blue and red lines at the top indicate the estimated probability of a "chalk" bracket outcome, always picking the higher-seeded team to win each matchup. (The blue and red colors reflect two different models of game probabilities as a function of the difference in "strength" of the two teams; as discussed in the linked article, blue indicates a strength following a normal distribution density, and red indicates a simpler linear strength function.)
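Both constant lines can be reproduced in spirit with a few lines of Python. This is a minimal sketch under an assumed, hypothetical linear model, P(seed a beats seed b) = 1/2 + k(b - a), with an illustrative slope K = 0.03; neither the formula nor the value of K is taken from the linked article. A chalk bracket is scored by walking one region in standard bracket order, letting the better seed win every game, and then adding the three games among the four #1 seeds.

```python
# Hypothetical slope for an illustrative linear model; NOT the article's fitted value.
K = 0.03

def p_win(a, b, k=K):
    """Assumed linear model: probability that seed a beats seed b.

    P = 1/2 + k * (b - a); k must be at most 1/30 so that P stays in [0, 1]
    for seed differences up to 15.
    """
    return 0.5 + k * (b - a)

def chalk_probability(k=K):
    """Probability that every game is won by the better (lower-numbered) seed."""
    # First-round pairings within one region, in standard bracket order.
    seeds = [1, 16, 8, 9, 5, 12, 4, 13, 6, 11, 3, 14, 7, 10, 2, 15]
    region = 1.0
    while len(seeds) > 1:
        winners = []
        for a, b in zip(seeds[0::2], seeds[1::2]):
            fav, dog = min(a, b), max(a, b)
            region *= p_win(fav, dog, k)
            winners.append(fav)
        seeds = winners
    # Four regions of 15 games each, then three games among the four #1 seeds,
    # where equal seeds give probability 1/2 per game.
    return region ** 4 * p_win(1, 1, k) ** 3

print(2 ** 63)  # 9223372036854775808, about 9.2 quintillion
print(f"chalk bracket: about 1 in {1 / chalk_probability():.3g}")
```

Setting k = 0 makes every game a coin flip, and the chalk probability collapses to exactly 2^{-63}, the black line.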

The 2023 tournament made things particularly interesting, since we are now significantly "recalibrating" the single-game probability model (as a function of seed matchup) to nearly double the estimated probability of the most extreme 16-over-1 upset, and using that updated model to re-evaluate all past years.
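A rough illustration of such a recalibration: fit the model's single parameter by maximum likelihood to a record of game outcomes, then refit after adding another upset. The sketch below assumes a hypothetical linear model, P(seed a beats seed b) = 1/2 + k(b - a), fitted by a simple grid search; the linked article's actual model form and fitting procedure may differ, and the records are toy numbers, not real tournament data.

```python
import math

def p_win(a, b, k):
    """Assumed linear model (hypothetical): probability that seed a beats seed b."""
    return 0.5 + k * (b - a)

def log_likelihood(k, records):
    """records: iterable of (winner_seed, loser_seed, count) tuples."""
    return sum(n * math.log(p_win(w, l, k)) for w, l, n in records)

def fit_k(records, steps=10_000):
    """Maximum-likelihood estimate of k by grid search over (0, 1/30).

    Staying strictly below 1/30 keeps every modeled probability inside (0, 1)
    for seed differences up to 15, so the log is always defined.
    """
    k_max = 1 / 30
    grid = [k_max * i / steps for i in range(1, steps)]
    return max(grid, key=lambda k: log_likelihood(k, records))

# Toy records, NOT real tournament data: only 1-vs-16 games, with one upset
# in the "old" history and two upsets in the "new" history.
old = [(1, 16, 99), (16, 1, 1)]
new = [(1, 16, 98), (16, 1, 2)]

for name, records in [("old", old), ("new", new)]:
    k = fit_k(records)
    print(name, round(1 - p_win(1, 16, k), 4))  # fitted P(16 beats 1)
```

In this toy case a single extra upset doubles the fitted 16-over-1 probability, and because every past bracket is rescored with the refitted k, the probabilities of all earlier years shift too.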

So despite deciding that a #16 seed upsetting a #1 seed is now, and always has been, nearly twice as likely as we thought a year ago, this year's tournament was still the least likely overall outcome ever, at least by the normal strength model (and the second-least likely by the linear strength model).