NCAA tournament brackets revisited

This is becoming an annual exercise.  Two years ago, I wrote about the probability of picking a “perfect” NCAA tournament bracket.  Last year, the topic was the impact of various systems for scoring brackets in office pools.

This year I just want to provide up-to-date historical data for anyone who might want to play with it, including all 32 seasons of the tournament in its current 64-team format, from 1985 to 2016.

(Before continuing, note that the 4 “play-in” games of the so-called “first” round are an abomination, and so I do not consider them here, focusing on the 63 games among the 64-team field.)

First, the data: the following 16×16 matrix indicates the number of regional games (i.e., prior to the Final Four) in which seed i beat seed j.  Note that the round in which each game was played is implied by the seed match-up (e.g., seeds 1 and 16 play in the first round, etc.).

   0  21  13  34  32   7   4  52  59   4   3  19   4   0   0 128
  23   0  25   2   0  23  54   2   0  27  12   1   0   0 120   0
   8  14   0   2   2  38   7   1   1   9  27   0   0 107   1   0
  15   4   3   0  36   2   2   3   2   2   0  23 102   0   0   0
   7   3   1  31   0   1   0   0   1   1   0  82  12   0   0   0
   2   6  28   1   0   0   4   0   0   4  82   0   0  14   0   0
   0  21   5   2   0   3   0   0   0  78   0   0   0   1   2   0
  12   3   0   5   2   1   1   0  64   0   0   0   1   0   0   0
   5   1   0   0   1   0   0  64   0   0   0   0   1   0   0   0
   1  18   4   0   0   2  50   0   0   0   1   0   0   1   5   0
   3   1  14   0   0  46   3   0   0   2   0   0   0   5   0   0
   0   0   0  12  46   0   0   1   0   0   0   0   8   0   0   0
   0   0   0  26   3   0   0   0   0   0   0   3   0   0   0   0
   0   0  21   0   0   2   0   0   0   0   0   0   0   0   0   0
   0   8   0   0   0   0   1   0   0   0   0   0   0   0   0   0
   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0

The following matrix, in the same format, is for the Final Four games:

  12   6   2   5   1   0   1   1   1   1   0   0   0   0   0   0
   4   3   3   1   0   1   0   0   0   0   1   0   0   0   0   0
   4   2   0   2   0   0   0   0   0   0   1   0   0   0   0   0
   1   0   0   1   1   0   0   0   0   0   0   0   0   0   0   0
   0   1   0   0   1   0   0   1   0   0   0   0   0   0   0   0
   0   1   0   1   0   0   0   0   0   0   0   0   0   0   0   0
   1   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
   0   2   0   0   0   0   0   0   0   0   1   0   0   0   0   0
   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0

Finally, the following matrix is for the championship games:

   6   6   1   2   3   1   0   0   0   0   0   0   0   0   0   0
   2   0   3   0   0   0   0   0   0   0   0   0   0   0   0   0
   0   2   1   0   0   0   0   1   0   0   0   0   0   0   0   0
   1   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
   1   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
   0   0   0   0   0   0   0   1   0   0   0   0   0   0   0   0
   1   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
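These tallies are easy to work with programmatically. For example, the first-round records can be read directly off the regional matrix, since seeds i and 17-i have met 4 regions × 32 years = 128 times:

```python
# First-round wins by the higher seed, transcribed from the regional matrix
# above (entry (i, 17-i) for i = 1..8); each pairing was played 128 times.
first_round_wins = {1: 128, 2: 120, 3: 107, 4: 102, 5: 82, 6: 82, 7: 78, 8: 64}

for seed, wins in first_round_wins.items():
    print(f'{seed} vs {17 - seed}: {wins}-{128 - wins} ({wins / 128:.3f})')
```

Note the clean gradient from the still-perfect 128-0 record of 1 seeds down to the exactly even 64-64 record of the 8-9 game.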

We can update some of the past analysis using this new data as well.  For example, what is the probability of picking a “perfect” bracket, predicting all 63 games correctly?  As before, Schwertman (see reference below) suggests a couple of simple-but-reasonable models of the probability of seed i beating seed j given by

p_{i,j} = 1 - p_{j,i} = \frac{1}{2} + k(s_i - s_j)

where s_i is a measure of the “strength” of seed i, and k is a scaling factor controlling the range of resulting probabilities, in this case chosen so that p_{1,16}=129/130, the expected value of the corresponding beta distribution.

One simple strength function is s_i=-i, which yields an overall probability of a perfect chalk bracket of about 1 in 188 billion.  A slightly better historical fit is

s_i = \Phi^{-1}(1 - \frac{4i}{n}) 

where \Phi^{-1} is the quantile function of the normal distribution, and n=351 is the number of teams in Division I.  In this case, the estimated probability of a perfect bracket is about 1 in 91 billion.  In either case, a perfect bracket is far more likely– about 100 million times more likely– than the usually-quoted 1 in 9.2 quintillion figure that assumes all 2^{63} outcomes are equally likely.
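For concreteness, here is a short Python sketch of the calculation for the linear strength function s_i = -i; the only bracket structure we need is that a chalk bracket has four identical regions followed by three Final Four games between 1 seeds:

```python
from math import prod

def chalk_probability(strength):
    """Probability that the better seed wins all 63 games, under the model
    p(i, j) = 1/2 + k*(s_i - s_j) with k chosen so that p(1, 16) = 129/130."""
    s = {i: strength(i) for i in range(1, 17)}
    k = (129 / 130 - 1 / 2) / (s[1] - s[16])
    p = lambda i, j: 1 / 2 + k * (s[i] - s[j])
    # chalk match-ups within a single region, round by round
    matchups = ([(i, 17 - i) for i in range(1, 9)] +  # first round: 1v16 ... 8v9
                [(i, 9 - i) for i in range(1, 5)] +   # second round: 1v8 ... 4v5
                [(1, 4), (2, 3),                      # regional semifinals
                 (1, 2)])                             # regional final
    region = prod(p(i, j) for i, j in matchups)
    # four identical regions, then three coin-flip games between 1 seeds
    return region ** 4 * (1 / 2) ** 3

print(1 / chalk_probability(lambda i: -i))  # about 1.9e11, i.e. 1 in ~188 billion
```

Swapping in the normal-quantile strength function (e.g., via `statistics.NormalDist().inv_cdf`) should reproduce the 1 in 91 billion figure above.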


    1. Schwertman, N., McCready, T., and Howard, L., Probability Models for the NCAA Regional Basketball Tournaments, The American Statistician, 45(1) February 1991, p. 35-38 [PDF]

Analysis of (hexagonal) Battleship


I played a lot of Battleship when I was a kid.  It’s a simple game, but one with potentially very complex optimal playing strategy.  Recently I encountered a variant of the game that Milton Bradley released in 2008 (now owned by Hasbro), using a hexagonal grid instead of the usual square 10×10 grid.  The motivation for this post is to compare the original and updated versions, demonstrating just how unfair the new version of the game might be.

Original Battleship

In both versions of the game, two players each place their own ships on a grid of cells hidden from view of the other player.  Players then alternate turns guessing cell locations, trying to hit and “sink” the other player’s ships.  The following figure shows an example of one player’s deployment in the original game; there are 5 ships of various lengths: a carrier (5 cells), battleship (4 cells), submarine (3 cells), cruiser (3 cells), and destroyer (2 cells).

Original Battleship grid for a single player with example deployment of ships.

A natural question to ask is, in how many possible ways can a player deploy his ships?  This is a known problem that has been solved many times before… but usually with special-purpose code implementing a backtracking search enumerating individual deployments.

Instead, we can re-use the implementation of Knuth’s “Dancing Links” (DLX) algorithm by casting the problem as an instance of a generalized exact cover (more on the generalization shortly).  Recall that an exact cover problem is a matrix of 0s and 1s, with a solution consisting of a subset of rows of the matrix containing exactly one 1 in each column.

Counting Battleship deployments is very similar to counting Kanoodle puzzle solutions: there are two “kinds” of columns in our matrix, one for each of the 5 ships, and one for each of the 100 cells in the 10×10 grid.  Each row of the matrix represents a possible placement of one of the ships, with a single 1 in the corresponding “ship” column, and additional 1s in the corresponding occupied “cell” columns.  The resulting matrix has 105 columns and 760 rows, corresponding to placing the carrier, battleship, submarine, cruiser, and destroyer in 120, 140, 160, 160, and 180 ways, respectively.
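As a quick sanity check of those dimensions: a straight ship of length L fits into one of 10 rows in (10 - L + 1) positions, in 2 orientations.

```python
# Number of placements of a straight length-L ship on a 10x10 grid.
def placements(length, size=10):
    return 2 * size * (size - length + 1)

ship_lengths = [5, 4, 3, 3, 2]  # carrier, battleship, sub, cruiser, destroyer
print([placements(L) for L in ship_lengths])     # [120, 140, 160, 160, 180]
print(sum(placements(L) for L in ship_lengths))  # 760 rows
print(len(ship_lengths) + 10 * 10)               # 105 columns
```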

However, unlike the Kanoodle puzzle, we don’t want an exact cover here, since that would effectively require that our 5 ships cover all 100 cells of the grid!  Instead, we want a “generalized” cover, in which some of the columns of the matrix are “optional” (Knuth calls them “secondary”), and may be covered at most once instead of exactly once.  In this case, the 5 ship columns are required/primary (we must use all of the ships), and the 100 cell columns are optional/secondary (we don’t have to cover every cell, but the ships can’t overlap, either).

Putting this all together, the following Python code specifies the details of the original Battleship game:

  • The board, specified as the coordinates of each of the cells in the 10×10 grid.
  • The size and shape of the 5 ship pieces, each in a default position and orientation.
  • The 4 possible rotations (in 90-degree increments) of each piece.
import numpy as np

def matrix_powers(R, n):
    """Return [R^1, R^2, ..., R^n]."""
    result = []
    a = np.array(R)
    for k in range(n):
        result.append(a)
        a = a.dot(R)
    return result

class Battleship:
    def __init__(self):
        self.board = {(x, y) for x in range(10) for y in range(10)}
        self.pieces = {'Carrier': {(x, 0) for x in range(5)},
                       'Battleship': {(x, 0) for x in range(4)},
                       'Submarine': {(x, 0) for x in range(3)},
                       'Cruiser': {(x, 0) for x in range(3)},
                       'Destroyer': {(x, 0) for x in range(2)}}
        self.rotations = matrix_powers(((0, -1), (1, 0)), 4)

Given such a specification of the details of the game, the following method constructs a sparse representation of the corresponding generalized exact cover matrix, by considering every possible ship piece, in every possible orientation, in every possible position on the board:

    def cover(self):
        """Return (pairs, optional_columns) for exact cover problem."""

        # Enumerate all possible placements/rotations of pieces.
        rows = set()
        for name, piece in self.pieces.items():
            for R in self.rotations:
                for offset in self.board:
                    occupied = {tuple(np.dot(R, x) + offset) for x in piece}
                    if occupied <= self.board:
                        rows.add((name, tuple(sorted(occupied))))

        # Convert placements to (row,col) pairs and optional column indices.
        cols = dict(enumerate(self.pieces))
        cols.update(enumerate(self.board, len(self.pieces)))
        cols = {v: k for k, v in cols.items()}
        pairs = []
        for i, row in enumerate(rows):
            name, occupied = row
            pairs.append((i, cols[name]))
            for x in occupied:
                pairs.append((i, cols[x]))
        return (pairs, list(range(len(self.pieces), len(cols))))

Plugging this into the C++ implementation of DLX (all of this code is available in the usual location here), we find, about 15 minutes later, that there are over 30 billion ways– 30,093,975,536, to be exact– for either player to place his 5 ships on the board in the original Battleship game.

Hexagonal Battleship

In the 2008 version of the game, there are several changes, as shown in the figure below.  Most immediately obvious is that the grid is no longer square, but hexagonal.  Also, some grid cells are “islands” (shown in brown) on which ships cannot be placed.  There are still 5 ships (shown in gray), but they have changed shape somewhat, no longer confined to straight lines of cells.

Finally, and most importantly, the two players each deploy their ships on two different “halves” of the board, with the “Blue” player’s ships on the top half, and the “Green” player’s ships on the bottom half.  (The figure shows a typical view of the board from Blue’s perspective.)

Hexagonal Battleship grid for both players (Blue and Green), with islands (brown) and example deployment of Blue’s ships (gray).

Closer inspection of the board shows that the Blue and Green halves are almost symmetric… but not quite.  Each half has 79 grid cells on which to deploy ships (I’m guessing this explains the missing cell at bottom center), and 4 of the 5 islands in each half are exactly opposite their counterparts in the other half… but not everything lines up exactly.  This strongly suggests that one half allows more possible ship deployments than the other (can you guess which just by looking?), which in turn suggests that that player has at least a slight advantage in the game.

Hexagonal coordinates

We can count possible ship deployments in the same way, using the same code, as in the original game described above.  The only catch is the hexagonal arrangement of grid cells.  To handle this, we just need a coordinate system for specifying cell locations that may at first seem unnecessarily complicated: let’s view our two-dimensional grid as being embedded in three dimensions.

Specifically, consider the points with integer coordinates in the plane x+y+z=0.  Imagine viewing this plane in two dimensions by looking down along the plane’s normal vector (1,1,1)^T toward the origin.  The points in this plane with integer coordinates form a triangular lattice; they are the centers of each hexagonal grid cell.  In the figure above, each hexagonal grid cell is shown with the coordinates of its center point, with the origin at the center of the board.

This is a handy representation, since these points form a vector space (more precisely, a module), where translations (i.e., moving ships from their “default” location to somewhere on the board) and rotations (i.e., orientations of ships) correspond to vector addition and matrix multiplication, respectively.

The following Python code uses these hexagonal coordinates to define the board, ship pieces, and 6 possible rotations for either the Blue (is_upper=True) or Green (is_upper=False) player in the 2008 version of Battleship:

class HexBattleship(Battleship):
    def __init__(self, is_upper=True):
        if is_upper:
            islands = {(-5, 4, 1), (-1, 3, -2), (-1, 7, -6), (4, 2, -6),
                       (5, -1, -4)}
        else:
            islands = {(-5, 1, 4), (-1, -6, 7), (-1, -2, 3), (2, -1, -1),
                       (4, -6, 2)}
        self.board = {(x, y, -x - y)
                      for x in range(-7, 8)
                      for y in range(max(-7 - x, -7) + 1 - abs(np.sign(x)),
                                     min(7 - x, 7) + 1)
                      if (6 * y >= -3 * x + x % 4) == is_upper and
                      not (x, y, -x - y) in islands}
        self.pieces = {'Carrier': {(x, 0, -x) for x in range(3)} |
                                  {(x, -1, -x + 1) for x in range(1, 3)},
                       'Battleship': {(x, 0, -x) for x in range(4)},
                       'Submarine': {(x, 0, -x) for x in range(3)},
                       'Destroyer': {(x, 0, -x) for x in range(2)},
                       'Weapons': {(0, 0, 0), (1, 0, -1), (0, 1, -1)}}
        self.rotations = matrix_powers(((0, -1, 0), (0, 0, -1), (-1, 0, 0)), 6)

Running DLX again on each of the resulting matrices, the upper Blue board allows 17,290,404,311 possible deployments, while the lower Green board allows 21,625,126,041– over 25% more than Blue!  So it seems like it would be a definite advantage to play Green.

Finally, one minor mathematical aside: note that the 60-degree rotation in the above code is specified by the matrix

R = \left(\begin{array}{ccc}0&-1&0\\ 0&0&-1\\ -1&0&0\end{array}\right)

This is convenient, since just like the 90-degree rotations in two dimensions in the original game, the effect is a simple cyclic permutation (and negation) of coordinates, which we could implement more directly without the matrix multiplication if desired.

However, this is cheating somewhat, since R is not actually a proper rotation!  Its determinant is -1; the “real” matrix rotating vectors by 60 degrees about the axis (1,1,1)^T is

R' = \frac{1}{3}\left(\begin{array}{ccc}2&-1&2\\ 2&2&-1\\ -1&2&2\end{array}\right)

It’s an exercise for the reader to see why the simpler form of R still works.
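A numerical nudge toward that exercise: R and R' agree on every lattice point in the plane x+y+z=0, so for our purposes they are interchangeable.  (Hint: consider what the transformation R'^{-1}R does to that plane.)

```python
import numpy as np

R = np.array([[0, -1, 0], [0, 0, -1], [-1, 0, 0]])       # det(R) = -1
Rp = np.array([[2, -1, 2], [2, 2, -1], [-1, 2, 2]]) / 3  # proper 60-degree rotation

# Check agreement on a few sample points in the plane x + y + z = 0.
for v in [(1, 0, -1), (1, -1, 0), (2, -3, 1), (0, 2, -2)]:
    assert np.allclose(R @ np.array(v), Rp @ np.array(v))
print('R and R\' agree on the plane x+y+z=0')
```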


  1. Knuth, D., Dancing Links, Millennial Perspectives in Computer Science, 2000, p. 187-214 (arXiv)

Home-field advantage

I encountered the following problem a few weeks ago: suppose that Alice and Bob want to play a match consisting of a series of games, where the first player to win n games wins the match.  This is often referred to as a “best-of-(2n-1) series,” with many examples in a variety of sports, such as the World Series in baseball (n=4), sets and matches in tennis, volleyball, etc.

There is a problem, though: each game must be played at either Alice’s or Bob’s home field, conferring a slight advantage to the corresponding player.  Let’s assume that the probability that Alice wins a game at home is p, and the probability that Alice wins a game away (i.e., at Bob’s home field) is q<p.

(Note that this asymmetry may arise due to something other than where each game is played.  In tennis, for example, the serving player has a significant advantage; even against an otherwise evenly-matched opponent (i.e., p+q=1), values of p may be as large as 0.9 at the highest levels of play; see Reference (1) below.)

What is the probability that Alice wins the overall match?  Of course, this probability surely depends on how Alice and Bob agree on who has “home-field advantage” for each game.  Let’s assume without loss of generality that Alice plays at home in the first game, and consider a few different possibilities for how the rest of the series plays out:

  1. Alternating: Alice plays at home in odd-numbered games, and away in even-numbered games.  (This is similar to a set in tennis.)
  2. Winner plays at home: The winner of the previous game has home-field advantage in the subsequent game.  (This is similar to a set in volleyball.)
  3. Loser plays at home: The loser of the previous game has home-field advantage in the subsequent game.
  4. Coin toss: After Alice’s first game at home, a fair coin toss determines home-field advantage for each subsequent game.

I’m sure there are other reasonable approaches I’m not thinking of as well.  It is an interesting exercise to compute the probability of Alice winning the match using each of these approaches.  Certainly they yield very different distributions of outcomes of games: for example, the following figure shows the distribution of number of games played in a World Series, between evenly-matched opponents with a “typical” home-field advantage of p=0.55.

Distribution of number of games played in a best-of-7 series with p=0.55, q=0.45.

The motivation for this post is the observation that, despite these differences, approaches (1), (2), and (3) all yield exactly the same overall probability of Alice winning the match!  This was certainly not intuitive to me.  And the rule determining home-field advantage does matter in general; for example, the coin toss approach in (4) yields a different overall probability of winning, and so lacks something that (1), (2), and (3) have in common.  Can you see what it is?
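These claims are easy to check numerically.  The following sketch (my own, hypothetical names throughout) recurses over the score and venue; a rule returns whether Alice is at home in the next game, with None meaning a fair coin toss:

```python
def match_prob(n, p, q, rule):
    """P(Alice wins a best-of-(2n-1) match), with Alice at home in game 1.
    rule(game, alice_won_last) -> True if Alice is at home in that game,
    False if away, None for a fair coin toss."""
    def play(a, b, game, home):
        if a == n:
            return 1.0
        if b == n:
            return 0.0
        win = p if home else q
        return (win * venue(a + 1, b, game + 1, True) +
                (1 - win) * venue(a, b + 1, game + 1, False))

    def venue(a, b, game, alice_won):
        home = rule(game, alice_won)
        if home is None:  # coin toss: average over both venues
            return (play(a, b, game, True) + play(a, b, game, False)) / 2
        return play(a, b, game, home)

    return play(0, 0, 1, True)

rules = {'alternating': lambda game, won: game % 2 == 1,
         'winner home': lambda game, won: won,
         'loser home':  lambda game, won: not won,
         'coin toss':   lambda game, won: None}
for name, rule in rules.items():
    print(name, match_prob(4, 0.55, 0.45, rule))
```

Running this for the best-of-7 example, the first three rules print identical probabilities, while the coin toss differs slightly (by about 1e-4).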


  1. Newton, P. and Keller, J., Probability of Winning at Tennis I. Theory and Data, Studies in Applied Mathematics, 114(3) April 2005, p. 241-269 [PDF]
  2. Kingston, J. G., Comparison of Scoring Systems in Two-Sided Competitions, Journal of Combinatorial Theory (Series A), 20(3) May 1976, p. 357-362
  3. Anderson, C. L., Note on the Advantage of First Serve, Journal of Combinatorial Theory (Series A), 23(3) November 1977, p. 363

Analysis of the game of HIPE


Can you think of a word in which the letters HIPE appear consecutively?  What about a word containing HQ?  This “game” is described in a not-so-relevant chapter of Peter Winkler’s wonderful book Mathematical Mind-Benders, where he provides many more examples of what he calls HIPEs: a challenge of a short sequence of letters, a response to which must be a word containing that sequence, consecutively with no intervening letters.  For example, BV has many solutions, one of which is, well, oBVious.

It turns out that I am really bad at playing this game.  My wife, on the other hand, is pretty good at it.  As it happens, I also lag behind her quite a bit when we work together on crossword puzzles, acrostics, anagrams, etc…. that is, in many situations where it is helpful to consider words as made up of letters.  After browsing more examples of HIPEs in Winkler’s book, I wondered what makes a given HIPE easy or difficult.  I will describe my attempt at “automating” the generation of difficult HIPEs… but mostly I want to share what I found to be a fascinating anecdote from Winkler’s discussion of the game.

Word Lists

My idea was pretty simple: difficult HIPEs are most likely those sequences of letters that (1) occur very rarely in the dictionary, but (2) occur sufficiently often in natural language to be recognizable.  To compute these metrics, I used:

  1. A word list consisting of the union of (a) the ENABLE2k word list (updated from the previous ENABLE1), and (b) the third and latest 2014 edition of the Scrabble Official Tournament and Club Word List.
  2. The Google Books Ngrams data set (English Version 20120701) to map each word in my dictionary to a corresponding frequency of occurrence (details of methodology described in this previous post).

As usual, you can download the aggregated frequency data and component word lists here.
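The word-list side of the computation is simple; here is a minimal sketch (with a three-word stand-in for the actual word list) that tallies how many words contain each consecutive letter sequence:

```python
from collections import Counter

def hipe_counts(words, length=2):
    """For every letter sequence of the given length, count the number of
    words containing it consecutively."""
    counts = Counter()
    for word in set(words):
        # count each sequence at most once per word
        counts.update({word[i:i + length] for i in range(len(word) - length + 1)})
    return counts

words = ['obvious', 'earthquake', 'archipelago']  # stand-in for ENABLE2k
counts = hipe_counts(words)
print(counts['bv'], counts['hq'])  # 1 1
```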

2- and 3-letter HIPEs

First, let’s focus on just two letters for now; the following figure shows all possible 2-letter HIPEs, arranged by frequency of occurrence in the Google Books dataset on the x-axis, and frequency of occurrence in the word list on the y-axis, with the example HIPEs in Winkler’s chapter shown in red.  Note that both axes are on a logarithmic scale to better “spread out” the infrequent HIPEs that we are interested in.

All digraphs with corresponding x=frequency of occurrence in the Google Books dataset and y=frequency of occurrence in the ENABLE+Scrabble word list. Winkler’s example HIPEs are shown in red.

As expected, the digraphs near the top of the figure aren’t very interesting, while Winkler’s examples are clustered near the bottom of the figure… although I don’t see much horizontal organization.  To get a better view, let’s zoom in on that lower-left corner of the figure, while still containing all of Winkler’s example HIPEs in red:

A zoomed-in view of the 2-letter HIPEs no more common than Winkler’s examples.

Unfortunately, closer inspection of this figure is a little disappointing: there are certainly “interesting” additional HIPEs in there (DK and GP, for example), but no clear separation between them and other unacceptably weird ones like QI, MK, etc.

We can do the same thing for 3-letter HIPEs, but things get messy quickly; there are just too many possible HIPEs with valid solutions, even if we again “zoom in” on just the bounding box of Winkler’s examples:

Similarly zoomed-in view of rare/difficult 3-letter HIPEs.

There are quite a few interesting HIPEs even in that very bottom row in the figure.  Following are some of my favorites, which appear at most twice in the entire word list: BSM, CEV, CTW, CYI, FCO, IKY, KGA, LFC, UCY, UIU, WDF, XEU, XII.


Finally, back to Winkler’s discussion of what makes HIPEs easy or difficult.  This is where things get really interesting.  He points out that, for most people, it is the “kinetic sense” of producing a word with our mouths that dominates our sense of “knowing” the word.  Not its definition, not what it looks like on paper or how it sounds, but the association with the physical act of expressing the word.  If this is really true, then he suggests that perhaps deaf people, “especially those who customarily communicate in a sign language,” might play HIPE better than others:

Resolved to test this hypothesis, I introduced HIPE to a group of hearing-impaired employees at a government agency, who sat together and conversed in ASL daily at lunch.  They found the game completely trivial; as fast as I wrote HIPEs on napkins, they wrote solutions around them.  To them it was a mystery why anyone would think HIPE was any kind of challenge.

This certainly sounds like a significant effect, enough that I wonder whether a more rigorous study has been done.


  • Winkler, P., Mathematical Mind-Benders. Wellesley: A K Peters, Ltd., 2007 [PDF]

Secret Santa solution with burritos


You and a group of co-workers all ordered burritos for lunch, and it’s your job to go pick them up.  When you arrive at the restaurant, you are given a cardboard box with 22 foil-wrapped burritos: 1 veggie, 8 chicken, 6 steak, and 7 carnitas.

However, when you return to work and open the box, none of the burritos are labeled!  And because they are, well, burritos, you can’t determine what’s inside each one without opening it up or taking a bite.  So you just pass out burritos randomly and hope for the best.

How well should you expect this to work?  That is, how many co-workers will get what they ordered, in distribution and on average?  What is the probability of the worst-case scenario where no one gets what they ordered?
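The “on average” part has a quick answer by linearity of expectation, before any heavier machinery: a co-worker in a group of m people who ordered the same variety receives a matching burrito with probability m/n.

```python
# Expected number of co-workers who get what they ordered, by linearity of
# expectation: sum over groups of m * (m / n) = sum(m^2) / n.
groups = [1, 8, 6, 7]  # veggie, chicken, steak, carnitas
n = sum(groups)        # 22 burritos
expected = sum(m * m for m in groups) / n
print(expected)  # 150/22, about 6.8 co-workers on average
```

The full distribution (and the probability that no one gets what they ordered) takes more work, as described next.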

This is exactly the same “Dinner-Diner Matching Problem” described by Margolius in [1], just dressed up a little differently.  It is also exactly the same problem as the Secret Santa drawing with families described in the previous post, where the “bad” outcome of a person drawing the name of someone in his own immediate family (including himself) corresponds to the “good” outcome here of a co-worker getting one of the burritos that she ordered.  The solution approach is the same in either case, but since the holidays have come and gone, let’s talk about burritos instead.

The objective here is not just to describe the mathematics involved– that’s been done– but to provide Python code to perform the actual calculations.

Permutations with restricted positions

An assignment of exactly one burrito to each of the n=22 co-workers may be represented as a permutation, which we can visualize as a selection of n squares to place non-attacking rooks on an n \times n chess board.  Each row of the board represents a co-worker, and each column represents a specific burrito; a rook represents the corresponding co-worker receiving the corresponding burrito.  “Non-attacking” means nobody gets more than one burrito, and no two people have to share a single burrito.

In addition to the non-attacking requirement, we may impose further restrictions on which permutations are allowed: the figure below shows such a board, with the black squares indicating “restricted” positions corresponding to co-workers receiving their desired variety of burrito.  Note that the co-workers and burritos (rows and columns, respectively) have been conveniently ordered so that the restricted positions are square “sub-boards” along the diagonal.  We are interested in counting permutations that avoid these restricted positions.

Board representing “restricted” positions (shown in black) for the burrito problem.

(It seems backward to refer to someone getting their desired type of burrito as a restricted position.  This would have made more sense in the context of the Secret Santa drawing, where we really did want to avoid drawings within immediate families.  We could just as well “reverse” the board, changing white to black and black to white, and with some modified bookkeeping arrive at the same answer… but it will be useful to keep things as they are, as we will see shortly.)

Rook polynomials

In how many of the n! possible ways to hand out burritos does no one get what they ordered?  Using inclusion-exclusion, this number of permutations avoiding all restricted positions is

\sum\limits_{k=0}^n (-1)^k r_k (n-k)!

where r_k is the number of ways to place k non-attacking rooks on the board B of restricted positions; i.e., on the black squares in the above figure.

So how can we compute the r_k?  In general this is a hard problem, but in this special case we get a lot of help by representing the sequence of r_k as a generating function known as the rook polynomial of the board B:

R_B(x) = \sum\limits_{k=0}^\infty r_k x^k

There are two reasons why this is useful here.  First, if the board B happens to be a disjoint union of sub-boards that have no rows or columns in common, then the rook polynomial of B is the product of the rook polynomials of the sub-boards.  In this case, the sub-boards are the 1×1, 8×8, 6×6, and 7×7 “complete” boards of all-black squares along the diagonal.

Second, the rook polynomial for just such a complete m \times m board is easy to compute:

R_{m \times m}(x) = \sum\limits_{k=0}^m {m \choose k}^2 k! x^k

(Intuitively, we are choosing k of m rows and k of m columns, then placing non-attacking rooks on the resulting k \times k sub-board.)


Putting it all together, given n = m_1 + m_2 + ... + m_j co-workers ordering burritos (or family members in a Secret Santa drawing), made up of sub-groups of m_i people ordering the same type of burrito (or in the same immediate family), the rook polynomial for the corresponding block-diagonal board is

R_B(x) = \prod\limits_{i=1}^j R_{m_i \times m_i}(x)

And the coefficients r_k of the resulting polynomial may be used in the inclusion-exclusion formula above to compute the number of permutations where no one gets what they ordered (or no one draws the name of someone in his immediate family).

The following Python code implements these formulas:

import functools
import math

import numpy as np
from numpy.polynomial import polynomial as P

def binomial(n, k):
    """Binomial coefficient n "choose" k."""
    if 0 <= k <= n:
        return (math.factorial(n) //
                math.factorial(k) // math.factorial(n - k))
    return 0

def rook_poly(m):
    """Rook polynomial for complete mxm board."""
    return np.array([binomial(m, k) ** 2 * math.factorial(k)
                     for k in range(m + 1)], dtype=object)

def secret_santa_burrito(groups):
    """Hit polynomial for permutations with block diagonal restrictions."""
    r = functools.reduce(P.polymul, [rook_poly(m) for m in groups])
    n = sum(groups)
    t_1 = np.array([-1, 1], dtype=object)
    return functools.reduce(P.polyadd, [P.polypow(t_1, k) * r[k] *
                                        math.factorial(n - k)
                                        for k in range(n + 1)])

print(secret_santa_burrito([1, 8, 6, 7]))

If you look closely, the last function does a bit more work than described so far.  It’s an exercise for the reader to show that we can actually compute the entire distribution of numbers of possible permutations according to the number k of “hits” on restricted positions (i.e., how many people get what they ordered?), as coefficients of the hit polynomial given by

N_B(t) = \sum\limits_{k=0}^n (t-1)^k r_k (n-k)!

where the inclusion-exclusion formula above is simply N_B(0).

Finally, note that derangements are a special case where m_1=m_2=...=m_n=1; that is, the board consists of single black squares along the diagonal.  As is well-known, the probability of a random permutation being a derangement is approximately 1/e.  This generalizes nicely; Penrice in [2] describes a bounding argument that, with n families each of size k, the probability of a successful Secret Santa drawing tends to e^{-k} as n grows large.
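We can check the derangement special case numerically, using the standard recurrence D(n) = (n-1)(D(n-1) + D(n-2)) rather than the hit polynomial:

```python
import math

def derangements(n):
    """Number of permutations of n elements with no fixed points, via the
    recurrence D(n) = (n - 1) * (D(n - 1) + D(n - 2))."""
    d = [1, 0]  # D(0), D(1)
    for i in range(2, n + 1):
        d.append((i - 1) * (d[i - 1] + d[i - 2]))
    return d[n]

for n in (5, 10, 22):
    print(n, derangements(n) / math.factorial(n))  # approaches 1/e = 0.3679...
```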


  1. Margolius, B., The Dinner-Diner Matching Problem, Mathematics Magazine, 76(2) April 2003, p. 107-118 [JSTOR]
  2. Penrice, S., Derangements, Permanents, and Christmas Presents, American Mathematical Monthly, 98(7) August 1991, p. 617-620 [JSTOR]
Posted in Uncategorized | Leave a comment

Secret Santa puzzle

This problem was inspired by a recent /r/dailyprogrammer exercise on Reddit.  It’s admittedly a little late for a holiday-themed puzzle, but then so was the exercise:

You and your large extended family participate in a Secret Santa gift exchange, where everyone is randomly– and secretly– assigned another person for whom to buy a gift.  To determine the gift assignments, everyone writes their name on a slip of paper and puts the slip into a hat.  Then each person in turn draws a slip from the hat, buying a gift for the person whose name is on the slip.

One obvious requirement is that no one should draw their own name.  However, there is an additional restriction: no one should be assigned to buy a gift for anyone in their immediate family: husbands cannot buy for their wives, parents cannot buy for their children, etc.  Roughly, you cannot buy for anyone who came in the same car as you.

Instead of drawing slips from a hat, the /r/dailyprogrammer exercise involved writing a program to generate the gift assignments for a group of 30 people, with immediate family sizes of {1, 1, 2, 1, 2, 4, 3, 1, 1, 2, 1, 2, 1, 1, 3, 2, 2}.  I saw several different approaches in the submitted solutions: some programs were actually deterministic, forgetting that the assignments should be random.  Other programs constructed an assignment iteratively or recursively, essentially modeling the sequential draw-from-a-hat method… but sometimes getting “stuck” near the end in situations discussed here before, where all remaining unassigned persons are in the same family.  (And even when these programs don’t get stuck, not all of the resulting assignments are equally likely.)

Finally, what I thought were the cleanest solutions used rejection sampling to eliminate these problems: keep generating random unrestricted permutations until finding one that satisfies all of the intra-family buying restrictions.
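A minimal sketch of that rejection-sampling approach (the function and variable names here are mine, not from any particular submission):

```python
import random

def secret_santa(family_sizes):
    """Generate uniform random permutations until one satisfies all of the
    intra-family restrictions; return it as a list mapping giver -> receiver."""
    # label each person with the index of their immediate family
    family = [f for f, size in enumerate(family_sizes) for _ in range(size)]
    n = len(family)
    while True:
        receiver = list(range(n))
        random.shuffle(receiver)
        if all(family[giver] != family[receiver[giver]]
               for giver in range(n)):
            return receiver

assignment = secret_santa([1, 1, 2, 1, 2, 4, 3, 1, 1, 2, 1, 2, 1, 1, 3, 2, 2])
```

Because every valid permutation is equally likely to survive the rejection test, the resulting assignments are uniformly distributed over the valid ones; and for this particular group, valid permutations are common enough that the loop terminates quickly in practice.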

Problem: Your family does the same thing: if at any point in the drawing process, anyone draws their own name or the name of anyone in their immediate family, everyone puts their slips back into the hat, and the process is re-started from the beginning.  On average, how long will this take?  That is, what is the expected number of times the drawing process must be started before it successfully completes?




Proofreading as “mark and recapture”

Last week I saw two different articles (at Futility Closet and DataGenetics) about exactly the same topic: suppose that two reviewers each proofread the same document.  The first reviewer finds A=10 errors, and the second finds B=12 errors, of which C=8 are in common, i.e., 8 of those 12 errors had already been found by the first reviewer.  Can we estimate how many additional errors remain in the document that were missed by both reviewers?

Both articles essentially reproduce the argument given by Pólya (see reference below) that a reasonable estimate for the total number of errors (both found and missed) is given by the following simple formula:

\hat{N} = \frac{A B}{C} = 15

This is a “mark-and-recapture” estimation method (the formula above is known as the Lincoln-Petersen estimator), similar to that used, for example, to estimate the number of fish in a lake.  Intuitively, the first reviewer identifies and “marks” A/N of the errors in the document (where N is unknown), which should approximately equal the fraction C/B of errors found by the second reviewer that were already marked.

However, neither article points out just how inaccurate this method of estimation can be, nor the fact that better alternatives are available.  For example, continuing with the example above as originally presented in the DataGenetics article, let us assume for the moment that

  1. There really are a total of 15 errors in the document being reviewed.
  2. The first reviewer really does find each error independently with probability 10/15=2/3.
  3. The second reviewer really does find each error independently with probability 12/15=4/5.

Note that this example is arguably somewhat contrived to be “nice,” since the actual number C=8 of errors observed in common by both reviewers happens to equal the expected number of such errors.  This need not be the case; with this model of reviewer accuracy, the number of common errors may be as large as N=15… or as small as zero, in which case our estimator breaks down entirely.

Even if we condition against this unlikely difficulty, essentially asking the reviewers to both start over if they don’t find any errors in common (and to forget the errors they may have already found), there is still significant variance in the possible estimates that may result, as shown in the following figure.

Distribution of estimate of number of errors (N=15, p1=2/3, p2=4/5).

(This is not a sample from a simulation; we can calculate this distribution exactly.)  The mean of the estimate, shown in red, is approximately 15.17, which is pretty good.  However, we can do better– only slightly better in this already-fortunate case, but a lot better in other cases– using a slightly different estimator due to Chapman:

\hat{N} = \frac{(A+1)(B+1)}{C+1} - 1

This estimate has several advantages over the Lincoln-Petersen method.  It has less bias and less variance, particularly in situations like this where the “population” of errors is relatively small.  Also, it still works even when C=0, i.e., when no errors are found in common by both reviewers.
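To quantify the bias claim for this example, the following sketch (my own, not from either article) computes the exact conditional mean of each estimator given C > 0, by enumerating the multinomial distribution of which reviewers find each error under the three model assumptions listed above:

```python
from fractions import Fraction
from math import comb

def conditional_mean(n, p1, p2, estimate):
    """E[estimate(A, B, C) | C > 0], where each of n errors is found by
    reviewer 1 w.p. p1 and by reviewer 2 w.p. p2, all independently."""
    q_both = p1 * p2
    q_1 = p1 * (1 - p2)            # found only by reviewer 1
    q_2 = (1 - p1) * p2            # found only by reviewer 2
    q_none = (1 - p1) * (1 - p2)   # found by neither
    prob_sum = total = Fraction(0)
    for c in range(1, n + 1):              # errors found by both reviewers
        for a in range(n - c + 1):         # found only by reviewer 1
            for b in range(n - c - a + 1): # found only by reviewer 2
                m = n - c - a - b
                prob = (comb(n, c) * comb(n - c, a) * comb(n - c - a, b)
                        * q_both**c * q_1**a * q_2**b * q_none**m)
                prob_sum += prob
                total += prob * estimate(c + a, c + b, c)
    return total / prob_sum

p1, p2 = Fraction(2, 3), Fraction(4, 5)
lp = conditional_mean(15, p1, p2, lambda a, b, c: Fraction(a * b, c))
ch = conditional_mean(15, p1, p2,
                      lambda a, b, c: Fraction((a + 1) * (b + 1), c + 1) - 1)
print(float(lp), float(ch))  # exact conditional means; the true N is 15
```

Note that the Chapman estimate never exceeds the Lincoln-Petersen estimate: the pointwise difference works out to (A-C)(B-C)/(C(C+1)), which is never negative since C is at most min(A, B).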

Having said that, it’s not clear how useful either method really is in this particular context, given how widely the resulting estimate may vary from the true value.  These estimation methods work much better when at least one of the two sample sizes A and B is pre-determined (e.g., first catch exactly 100 fish, mark them, then catch 100 fish again), and only C varies randomly.


  • Pólya, G., Probabilities in Proofreading, American Mathematical Monthly, 83(1) January 1976, p. 42 [JSTOR]