College football bowl confidence pools

Introduction

For most of the last decade, I have followed an annual NCAA football bowl pool.  An entry into the pool consists of picking a winner of each of n bowl games, and assigning each pick a unique “confidence” point value from 1 to n.  For each game picked correctly, you win the assigned number of confidence points; the entry with the most points wins, uh, bragging rights.

Certainly there is some opportunity for strategy in pools like this.  For example, knowledge of point spreads helps: one simple strategy is to pick the favored team in every game, with confidence points assigned in order of point spread.  In fact, this strategy maximizes the expected (i.e., average) number of points won among all possible entries.

However, expectation isn’t everything: the objective here is not to maximize average number of points, it’s to maximize the probability of winning, i.e., scoring more points than all other submitted entries.  The objective of this post is to describe some of the mathematics involved in trying to solve this problem.  The techniques described here have won a lot of bragging rights, but there are still a lot of unanswered questions.  (And unfortunately, this basic pool structure may soon be obsolete, with the introduction of the 4-team playoff last year.)

Probabilities from point spreads

First, let’s assume that we actually know the “true” probabilities of outcomes of every bowl game in a season.  We don’t, really… but we do know the point spread for every game.  For example, the following figure shows the spreads for the bowl games over the past 9 years, shown in ascending order of spread for each season (not chronologically):

Point spreads for college football bowl games 2006-2014. The 38 games in 2014 exclude the final championship game from the 4-team playoff.

Point spreads for college football bowl games 2006-2014. The 38 games in 2014 exclude the final championship game from the 4-team playoff.

Intuitively, we expect the probability for a “pick’em” (i.e., spread near zero) to be about 1/2 either way, and the probability of a two-touchdown favorite winning to be much closer to 1.  There is an interesting analysis by Hal Stern (see reference at the end of this post) suggesting that a reasonable model of the actual margin of victory in a football game is a normal distribution with mean equal to the point spread, and standard deviation of slightly less than 14 points.  We can use this model to convert the above point spreads into corresponding probabilities of winning each game, as shown in the following figure:

Estimated probability of favored team winning each bowl game 2006-2014.

Estimated probability of favored team winning each bowl game 2006-2014.

(Aside: Although this model is certainly reasonable– using it has won pools in the past– it would be interesting to see some further analysis to validate or refine it.  Stern’s original analysis is nearly 30 years old, and focused on NFL games, not college football.)

Comparing two (or more) pool entries

Armed with these probabilities, what do we do with them?  As a first step, let’s suppose, temporarily, that there are just two entries in the entire pool.  What is the probability of one entry scoring more points than the other?  At first glance, even this simplified problem looks difficult: for n bowl games, there are 2^n possible outcomes to evaluate.  But it turns out we can “compare” two entries reasonably efficiently, using probability generating functions.

Let’s specify a pool entry as an ordered pair of n-vectors (\mathbf{q}, \mathbf{w}), where q_i in {0,1} indicates whether the underdog (0) or favorite (1) is picked to win game i, and 1 \leq w_i \leq n is the corresponding confidence point assignment.  Then, given a second entry (\mathbf{r}, \mathbf{v}), consider the following probability generating function:

g(x) = \prod\limits_{i=1}^n (p_i x^{q_i w_i-r_i v_i}+(1-p_i)x^{(1-q_i)w_i-(1-r_i)v_i})

where p_i is the assumed probability that the favored team wins game i.  Each term in the product corresponds to a game, and each summand corresponds to a possible outcome of that game.  Expanding this product as a polynomial in the formal variable x yields only O(n^2) non-zero coefficients, indicating the probability distribution of the overall difference in scores between the two entries.

(We are cheating a little by allowing negative exponents, but that’s not a problem here, since we are just using the generating function machinery for bookkeeping, and not actually evaluating g(x) for specific values of x.)

This provides a means of quickly computing the probability that one entry scores more points than another: just sum the coefficients of the positive powers of x in the corresponding g(x).  For example, suppose that (\mathbf{r}, \mathbf{v}) is the entry described above that maximizes expected score.  If we consider this to be a “representative” entry in a larger pool, then we can optimize our own entry by searching for one that maximizes the probability of beating (\mathbf{r}, \mathbf{v}).

Better representatives

This strategy is certainly not perfect.  The larger the number of entries in the actual pool, the more likely it is that someone submits an entry that wins the pool, despite deviating significantly from the “representative” entry.  Fortunately, we can attempt to account for the size of the pool by modifying the representative, expected score-maximizing entry slightly:

Suppose that instead of a single entry that picks the favorite for every game, there are 2^k entries that “buy” the k toughest games with the smallest point spreads.  There is one entry for each possible combination of outcomes of those k games… where each entry assigns the k largest confidence values to those toughest games.  For example, buying the 3 lowest-spread games requires 8 distinct entries, covering all possible outcomes of those games, and guaranteeing that exactly one of those entries will score all (n-2)+(n-1)+n points assigned to them.

The reason for choosing this form for the “family” of more powerful representative entries is that we can “compress” all 2^k entries into a single (\mathbf{r}, \mathbf{v}) pair, where the picks for the bought games have the form r_i = 1/2.  Using this convention, no efficiency is lost in evaluating candidate entries against this entire family of entries, by replacing the appearance of each r_i by the unit step function H[r_i]:

g(x) = \prod\limits_{i=1}^n (p_i x^{q_i w_i-H[r_i] v_i}+(1-p_i)x^{(1-q_i])w_i-H[1-r_i]v_i})

The following figure shows how well this could be expected to work, as a function of the number of low-spread games bought in the representative entry(s).  The y-axis indicates the maximum probability of beating all corresponding representative entries, using an entry found by a simple steepest-ascent search within the space of all n!2^n possible entries.

Estimated probability of winning pool against a "maximum expected points" entry with some number of "bought" games.

Estimated probability of winning pool against a “maximum expected points” entry with some number of “bought” games.

There are plenty of open questions here.  For example, I don’t see a nice way to efficiently compute the probability of an entry winning against more than one other entry (or against more than just the one family of similar entries as described above).  And even if there was such an efficient algorithm for optimizing against a large pool of entries, what is a reasonable way to model a large pool of entries?

Reference:

  • Stern, H., On the Probability of Winning a Football Game, The American Statistician, 45(3) August 1991, p. 179-183 [PDF]

Coin flip puzzle

There was a lot of unwarranted buzz this past week about the New England Patriots winning 19 of their past 25 coin tosses prior to each game, leading to semi-serious speculation of more foul play.  This episode is a great source of many interesting probability problems, none of which is the question asked by almost every related news article: “What is the probability that the Patriots would win at least 19 of their last 25 coin tosses purely by chance?”  (The answer is about 1/137, but the better answer is that this is the wrong question.)

I think one slightly better and more interesting question is: how often does this happen by chance?  That is, suppose that we flip a fair coin repeatedly, and after each flip we observe whether at least 19 of the most recent 25 flips came up heads (a success), or less than 19 were heads (a failure).  Over time, for what fraction of flips should we expect success?

Another related question: what is the expected number of games until the first such success?