This is becoming an annual exercise. Two years ago, I wrote about the probability of picking a “perfect” NCAA tournament bracket. Last year, the topic was the impact of various systems for scoring brackets in office pools.
This year I just want to provide up-to-date historical data for anyone who might want to play with it, including all 32 seasons of the tournament in its current 64-team format, from 1985 to 2016.
(Before continuing, note that the 4 “play-in” games of the so-called “first” round are an abomination, and so I do not consider them here, focusing on the 63 games among the 64-team field.)
First, the data: in the following 16×16 matrix, the entry in row i and column j is the number of regional games (i.e., prior to the Final Four) in which seed i beat seed j. Note that the round in which each game was played is implied by the seed match-up (e.g., seeds 1 and 16 can only meet in the first round).
  0  21  13  34  32   7   4  52  59   4   3  19   4   0   0 128
 23   0  25   2   0  23  54   2   0  27  12   1   0   0 120   0
  8  14   0   2   2  38   7   1   1   9  27   0   0 107   1   0
 15   4   3   0  36   2   2   3   2   2   0  23 102   0   0   0
  7   3   1  31   0   1   0   0   1   1   0  82  12   0   0   0
  2   6  28   1   0   0   4   0   0   4  82   0   0  14   0   0
  0  21   5   2   0   3   0   0   0  78   0   0   0   1   2   0
 12   3   0   5   2   1   1   0  64   0   0   0   1   0   0   0
  5   1   0   0   1   0   0  64   0   0   0   0   1   0   0   0
  1  18   4   0   0   2  50   0   0   0   1   0   0   1   5   0
  3   1  14   0   0  46   3   0   0   2   0   0   0   5   0   0
  0   0   0  12  46   0   0   1   0   0   0   0   8   0   0   0
  0   0   0  26   3   0   0   0   0   0   0   3   0   0   0   0
  0   0  21   0   0   2   0   0   0   0   0   0   0   0   0   0
  0   8   0   0   0   0   1   0   0   0   0   0   0   0   0   0
  0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
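As a quick way to start playing with the data, the sketch below parses the regional matrix above and tabulates each first-round match-up. The names `REGIONAL`, `wins`, and `first_round_record` are illustrative only, and the code relies on the fact that seeds i and 17 - i can only meet in the first round.

```python
# Regional-game win counts copied from the matrix above: row i, column j
# (both 1-based) is the number of times seed i beat seed j.
REGIONAL = """
  0  21  13  34  32   7   4  52  59   4   3  19   4   0   0 128
 23   0  25   2   0  23  54   2   0  27  12   1   0   0 120   0
  8  14   0   2   2  38   7   1   1   9  27   0   0 107   1   0
 15   4   3   0  36   2   2   3   2   2   0  23 102   0   0   0
  7   3   1  31   0   1   0   0   1   1   0  82  12   0   0   0
  2   6  28   1   0   0   4   0   0   4  82   0   0  14   0   0
  0  21   5   2   0   3   0   0   0  78   0   0   0   1   2   0
 12   3   0   5   2   1   1   0  64   0   0   0   1   0   0   0
  5   1   0   0   1   0   0  64   0   0   0   0   1   0   0   0
  1  18   4   0   0   2  50   0   0   0   1   0   0   1   5   0
  3   1  14   0   0  46   3   0   0   2   0   0   0   5   0   0
  0   0   0  12  46   0   0   1   0   0   0   0   8   0   0   0
  0   0   0  26   3   0   0   0   0   0   0   3   0   0   0   0
  0   0  21   0   0   2   0   0   0   0   0   0   0   0   0   0
  0   8   0   0   0   0   1   0   0   0   0   0   0   0   0   0
  0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
"""

# Parse the whitespace-separated text into a 16x16 list of lists.
wins = [[int(x) for x in row.split()] for row in REGIONAL.strip().splitlines()]
assert len(wins) == 16 and all(len(row) == 16 for row in wins)

# Every regional game is counted exactly once, in the winner's row.
total_games = sum(map(sum, wins))


def first_round_record(seed):
    """Return (wins, losses) for `seed` (1..8) against seed 17 - seed."""
    opponent = 17 - seed
    return wins[seed - 1][opponent - 1], wins[opponent - 1][seed - 1]


for seed in range(1, 9):
    w, l = first_round_record(seed)
    print(f"{seed:2d} vs {17 - seed:2d}: {w:3d}-{l:<3d} ({w / (w + l):.3f})")
```

A useful sanity check: the entries should total 1920, i.e., 60 regional games in each of the 32 seasons, and each first-round match-up should account for 128 games.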
The following matrix, in the same format, is for the Final Four games:
12  6  2  5  1  0  1  1  1  1  0  0  0  0  0  0
 4  3  3  1  0  1  0  0  0  0  1  0  0  0  0  0
 4  2  0  2  0  0  0  0  0  0  1  0  0  0  0  0
 1  0  0  1  1  0  0  0  0  0  0  0  0  0  0  0
 0  1  0  0  1  0  0  1  0  0  0  0  0  0  0  0
 0  1  0  1  0  0  0  0  0  0  0  0  0  0  0  0
 1  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
 0  2  0  0  0  0  0  0  0  0  1  0  0  0  0  0
 0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
 0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
 0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
 0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
 0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
 0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
 0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
 0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
Finally, the following matrix is for the championship games:
 6  6  1  2  3  1  0  0  0  0  0  0  0  0  0  0
 2  0  3  0  0  0  0  0  0  0  0  0  0  0  0  0
 0  2  1  0  0  0  0  1  0  0  0  0  0  0  0  0
 1  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
 0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
 1  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
 0  0  0  0  0  0  0  1  0  0  0  0  0  0  0  0
 1  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
 0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
 0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
 0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
 0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
 0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
 0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
 0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
 0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
We can update some of the past analysis using this new data as well. For example, what is the probability of picking a “perfect” bracket, predicting all 63 games correctly? As before, Schwertman (see reference below) suggests a couple of simple-but-reasonable models of the probability of seed i beating seed j given by
$$p_{i,j} = \frac{1}{2} + k(s_i - s_j)$$
where $s_i$ is a measure of the “strength” of seed i, and k is a scaling factor controlling the range of resulting probabilities, in this case chosen so that $p_{1,16} = 129/130$, the expected value of the beta distribution corresponding to seed 1's 128-0 record against seed 16 (with a uniform prior).
One simple strength function is $s_i = -i$, which yields an overall probability of a perfect chalk bracket of about 1 in 188 billion. A slightly better historical fit is
$$s_i = \Phi^{-1}\left(1 - \frac{4i}{n}\right)$$
where $\Phi^{-1}$ is the quantile function of the normal distribution, and $n = 351$ is the number of teams in Division I. In this case, the estimated probability of a perfect bracket is about 1 in 91 billion. In either case, a perfect bracket is far more likely, by a factor of roughly 50 to 100 million, than the usually-quoted 1 in 9.2 quintillion figure that assumes all $2^{63}$ outcomes are equally likely.
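The two estimates can be reproduced with a short calculation. What follows is a minimal sketch, not the author's original code, under these assumptions: the win probability is 1/2 + k(s_i - s_j), with k calibrated so that seed 1 beats seed 16 with probability 129/130; n = 351 teams in Division I; a “chalk” bracket means the better seed wins every regional game; and the three games among the four 1 seeds are coin flips under the model. The function names are hypothetical.

```python
from statistics import NormalDist

N_TEAMS = 351        # assumed Division I size
P_1_16 = 129 / 130   # assumed calibration target for seed 1 over seed 16

# Standard bracket order within one region, so adjacent entries meet first.
REGION_ORDER = [1, 16, 8, 9, 5, 12, 4, 13, 6, 11, 3, 14, 7, 10, 2, 15]


def linear_strength(seed):
    """Strength s_i = -i."""
    return -seed


def normal_strength(seed):
    """Strength s_i = Phi^{-1}(1 - 4i/n), using the normal quantile function."""
    return NormalDist().inv_cdf(1 - 4 * seed / N_TEAMS)


def win_probability(strength, i, j):
    """Probability that seed i beats seed j: 1/2 + k * (s_i - s_j)."""
    # k is chosen so that win_probability(strength, 1, 16) equals P_1_16.
    k = (P_1_16 - 0.5) / (strength(1) - strength(16))
    return 0.5 + k * (strength(i) - strength(j))


def chalk_probability(strength):
    """Probability that the better seed wins all 63 games."""
    field = REGION_ORDER
    region = 1.0
    while len(field) > 1:
        winners = []
        for a, b in zip(field[0::2], field[1::2]):
            favorite, underdog = min(a, b), max(a, b)
            region *= win_probability(strength, favorite, underdog)
            winners.append(favorite)
        field = winners
    # Four independent regions, then three 1-vs-1 games at probability 1/2.
    return region ** 4 * 0.5 ** 3


for name, strength in [("linear", linear_strength), ("normal", normal_strength)]:
    odds = 1 / chalk_probability(strength)
    print(f"{name}: about 1 in {odds / 1e9:.0f} billion")
```

The printed odds should land near the 188 billion and 91 billion figures quoted above; dividing 2^63 by either one gives the improvement over the equally-likely assumption.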
References:
- Schwertman, N., McCready, T., and Howard, L., Probability Models for the NCAA Regional Basketball Tournaments, The American Statistician, 45(1), February 1991, pp. 35-38