Coin flip puzzle

There was a lot of unwarranted buzz this past week about the New England Patriots winning 19 of their last 25 pre-game coin tosses, leading to semi-serious speculation of more foul play.  This episode is a great source of many interesting probability problems, none of which is the question asked in almost every related news article: “What is the probability that the Patriots would win at least 19 of their last 25 coin tosses purely by chance?”  (The answer is about 1/137, but the better answer is that this is the wrong question.)
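(For reference, that figure is just a binomial tail probability; following is a quick sketch verifying it.)

from math import comb

# P(at least 19 heads in 25 flips of a fair coin)
p = sum(comb(25, k) for k in range(19, 26)) / 2 ** 25
print(p, 1 / p)   # ~0.00732, or about 1 in 137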

I think one slightly better and more interesting question is: how often does this happen by chance?  That is, suppose that we flip a fair coin repeatedly, and after each flip we observe whether at least 19 of the most recent 25 flips came up heads (a success), or fewer than 19 were heads (a failure).  Over time, for what fraction of flips should we expect success?

Another related question: what is the expected number of games until the first such success?
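Both questions are easy to explore by simulation before attacking them analytically.  Following is a minimal Monte Carlo sketch (the function and its defaults are mine, for illustration); averaging the first-success index over many runs estimates the answer to the second question, interpreting each flip as one game.

import random
from collections import deque

def simulate(flips=10**6, window=25, threshold=19):
    """Return the fraction of flips on which at least `threshold` of the
    most recent `window` flips were heads, and the index of the first
    flip on which that happened."""
    recent = deque(maxlen=window)            # sliding window of recent flips
    successes, first_success = 0, None
    for n in range(1, flips + 1):
        recent.append(random.randrange(2))   # 1 = heads, 0 = tails
        if len(recent) == window and sum(recent) >= threshold:
            successes += 1
            if first_success is None:
                first_success = n
    return successes / flips, first_success

print(simulate())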


Arbitrary-precision integer arithmetic in C++

This was mostly just for fun, motivated by questions about a student programming problem.  There are plenty of arbitrary- and extended-precision arithmetic libraries out there, and there are plenty of great references describing the algorithms involved.  But there can be a lot of added complexity in the translation from pseudo-code to actual lines of working code.  My goal here was to provide the latter, while maintaining the readability of the former.  I’m not sure I succeeded.  But it was still fun.

The end result is about 500 lines of code in a single C++ header file, defining an unsigned (i.e., non-negative) integer type with overloading of all sensible arithmetic, bitwise, and relational operators (basically everything except bitwise one’s complement), and implicit conversions to/from a decimal string representation.  The code follows the conventions and variable names of Knuth’s TAOCP Volume 2 very closely throughout, although the division algorithm is slightly simplified.

Here is a link to the usual location containing the source code, which is free, unencumbered, released into the public domain; do what you want with it. (Edit: I had initially posted the code directly in this post, but WordPress source code formatting seems to be a mess, so unfortunately I’m resorting to an external link.)



Kanoodle, IQ Fit, and Dancing Links


During a recent vacation flight, my nephews had a couple of interesting puzzles to occupy them on the plane: Kanoodle and IQ Fit. In both cases, the objective is to arrange a set of colored pieces so that they exactly fill a rectangular grid, as in the following figure showing an example solution to the Kanoodle puzzle:

Example solution to Kanoodle puzzle.

The IQ Fit version uses a smaller 5-by-10 grid, and only 10 pieces instead of 12… but the pieces are three-dimensional, so that some of the spherical “vertices” of the pieces may poke down through the bottom of the tray.  (For reference, following is Python code specifying the configurations of pieces for both puzzles.)

kanoodle = {
    'Green': ((1, 1, 0), (1, 2, 0), (2, 2, 0), (2, 3, 0), (2, 4, 0)),
    'Cyan': ((1, 1, 0), (1, 2, 0), (1, 3, 0), (2, 3, 0), (3, 3, 0)),
    'Purple': ((1, 1, 0), (1, 2, 0), (1, 3, 0), (1, 4, 0)),
    'Yellow': ((1, 1, 0), (1, 2, 0), (2, 2, 0), (3, 1, 0), (3, 2, 0)),
    'Orange': ((1, 1, 0), (2, 1, 0), (2, 2, 0), (2, 3, 0)),
    'Blue': ((1, 2, 0), (2, 2, 0), (3, 2, 0), (4, 1, 0), (4, 2, 0)),
    'Pink': ((1, 1, 0), (1, 2, 0), (1, 3, 0), (1, 4, 0), (2, 3, 0)),
    'Gray': ((1, 2, 0), (2, 1, 0), (2, 2, 0), (2, 3, 0), (3, 2, 0)),
    'Magenta': ((1, 1, 0), (1, 2, 0), (2, 2, 0), (2, 3, 0), (3, 3, 0)),
    'Red': ((1, 1, 0), (1, 2, 0), (2, 1, 0), (2, 2, 0), (2, 3, 0)),
    'White': ((1, 1, 0), (2, 1, 0), (2, 2, 0)),
    'LightGreen': ((1, 1, 0), (1, 2, 0), (2, 1, 0), (2, 2, 0))
}

iq_fit = {
    'Green': ((1, 1, 0), (2, 1, 0), (2, 2, 0), (3, 1, 0),
              (1, 1, -1), (2, 1, -1)),
    'Magenta': ((1, 1, 0), (1, 2, 0), (1, 3, 0), (2, 2, 0),
                (2, 3, 0), (1, 1, -1)),
    'Cyan': ((1, 1, 0), (1, 2, 0), (1, 3, 0), (1, 4, 0),
             (2, 2, 0), (1, 2, -1), (1, 3, -1)),
    'LightGreen': ((1, 1, 0), (1, 2, 0), (2, 2, 0), (3, 1, 0),
                   (3, 2, 0), (3, 2, -1)),
    'Blue': ((1, 4, 0), (2, 1, 0), (2, 2, 0), (2, 3, 0), (2, 4, 0),
             (2, 2, -1), (2, 4, -1)),
    'Purple': ((1, 1, 0), (1, 2, 0), (1, 3, 0), (2, 2, 0),
               (1, 1, -1), (1, 3, -1)),
    'Yellow': ((1, 1, 0), (1, 2, 0), (2, 1, 0), (2, 2, 0), (2, 3, 0),
               (2, 4, 0), (2, 4, -1)),
    'Pink': ((1, 1, 0), (1, 2, 0), (1, 3, 0), (1, 4, 0), (2, 1, 0),
             (2, 2, 0), (1, 2, -1)),
    'Orange': ((1, 3, 0), (2, 1, 0), (2, 2, 0), (2, 3, 0), (2, 4, 0),
               (2, 1, -1), (2, 3, -1)),
    'Red': ((1, 4, 0), (2, 1, 0), (2, 2, 0), (2, 3, 0), (2, 4, 0),
            (2, 1, -1), (2, 4, -1))
}

A natural question to ask is: which puzzle is harder?  For example, which puzzle has more possible solution configurations?  Can we count them all?  It turns out that we can, using Donald Knuth’s “Dancing Links” (DLX) algorithm (see references below).  But the motivation for this post is to describe another interesting application of DLX: before attempting an exact count, we can first estimate the number of solutions, to determine whether it is even worth trying to enumerate them.

Exact Cover

The Kanoodle and IQ Fit puzzles are examples of the more general exact cover problem: given a matrix of 0s and 1s, does it have a set of rows containing exactly one 1 in each column?  Many other problems may also be reduced to that of finding an exact cover: Sudoku, Langford sequences, and graph colorings are just a few examples.

Often the most interesting part of solving many of these problems is figuring out the transformation to an exact cover instance.  That is, how do we construct a 0-1 matrix representing our problem, so that a set of rows containing exactly one 1 in each column corresponds to a solution to our problem?  In the case of Kanoodle (and IQ Fit is similar), there are two “kinds” of columns: one column for each of the 12 puzzle pieces, and one column for each position in the 5-by-11 rectangular grid.  Each row of the matrix represents a possible placement of one of the pieces, with a single 1 in the corresponding “piece” column, and additional 1s in the corresponding occupied “position” columns.  A set of rows solving the exact cover problem will have exactly one row for each piece, oriented and positioned so that all of the grid positions are occupied by exactly one piece.
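To make this concrete, following is one way to generate those rows from the piece definitions above, for the flat Kanoodle pieces (the function names are mine; each piece may appear in up to 8 orientations, i.e., rotations and reflections):

def orientations(cells):
    """All distinct rotations/reflections of a flat piece, normalized
    so that the minimum row and column are zero."""
    shapes = set()
    shape = [(r, c) for (r, c, z) in cells]   # Kanoodle pieces have z = 0
    for start in (shape, [(r, -c) for (r, c) in shape]):   # reflection
        shape = start
        for _ in range(4):
            shape = [(c, -r) for (r, c) in shape]   # rotate 90 degrees
            rmin = min(r for r, c in shape)
            cmin = min(c for r, c in shape)
            shapes.add(frozenset((r - rmin, c - cmin) for r, c in shape))
    return shapes

def matrix_rows(pieces, rows=5, cols=11):
    """Yield one exact cover row per legal placement: the piece's
    identifier column plus the grid position columns it occupies."""
    for name, cells in pieces.items():
        for shape in orientations(cells):
            for dr in range(rows):
                for dc in range(cols):
                    placed = [(r + dr, c + dc) for (r, c) in shape]
                    if all(rr < rows and cc < cols for rr, cc in placed):
                        yield [name] + sorted(placed)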

Considering all possible placements of all pieces, the Kanoodle puzzle has a total of 1789 rows and 67 columns, while the IQ Fit puzzle has 3440 rows and 60 columns.  (David Austin wrote a great AMS Feature Column on this same topic, suggesting that the Kanoodle puzzle has 2222 rows; there are a couple of bugs in his implementation, however, yielding some extra invalid piece placements and some missing valid ones.)

Dancing Links (DLX)

Now that we have a matrix representing our exact cover problem, we can feed it to Knuth’s Dancing Links algorithm (referred to as DLX), which is essentially “just” a backtracking depth-first tree search for all possible solutions, but with a “doubly-doubly” linked list sparse matrix representation that allows very efficient undo operations.  I won’t bother with details here, especially since Knuth’s paper is so readable.

My implementation is available in the usual location here.  There isn’t much there; the search algorithm itself follows Knuth’s pseudocode nearly verbatim, right down to the variable names.

There are quite a few DLX implementations in the wild already, but I ended up writing my own for several reasons.  First, the matrix row and column identifiers should be of generic types; the “real” problem is rarely represented in terms of simple consecutive integer matrix indices.  Second, it seemed reasonable to be able to construct the sparse matrix by adding 1s individually, in any order.  Although the diagrams in Knuth’s paper suggest a nice “weaving loom” look to the data structure, there is no need to preserve any particular order relation on the rows or columns.  Third, although I didn’t use it for this analysis, it was relatively simple to add support for “secondary,” or optional columns, which must be covered at most once, instead of exactly once.  (As discussed in Knuth’s paper, the N queens problem is a natural example where this is handy.)
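As an aside, the cover/uncover bookkeeping at the heart of DLX is compact enough to capture in a few dozen lines of Python, using dictionaries of sets in place of the doubly linked lists.  The following sketch is the standard set-based rendition of Knuth's Algorithm X, not the implementation linked above: X maps each column to the set of rows with a 1 in that column, and Y maps each row to its list of columns.

def solve(X, Y, solution=None):
    """Yield every exact cover as a list of row identifiers."""
    if solution is None:
        solution = []
    if not X:
        yield list(solution)
        return
    c = min(X, key=lambda col: len(X[col]))   # branch on a smallest column
    for r in list(X[c]):
        solution.append(r)
        removed = select(X, Y, r)
        yield from solve(X, Y, solution)
        deselect(X, Y, r, removed)            # the efficient "undo"
        solution.pop()

def select(X, Y, r):
    """Cover every column satisfied by row r, removing conflicting rows."""
    removed = []
    for j in Y[r]:
        for i in X[j]:
            for k in Y[i]:
                if k != j:
                    X[k].remove(i)
        removed.append(X.pop(j))
    return removed

def deselect(X, Y, r, removed):
    """Restore the covered columns in reverse order."""
    for j in reversed(Y[r]):
        X[j] = removed.pop()
        for i in X[j]:
            for k in Y[i]:
                if k != j:
                    X[k].add(i)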

Estimating Tree Size

Finally, and most importantly, I wanted to try using the Monte Carlo sampling technique, mentioned briefly in Knuth’s DLX paper, to estimate the size of the search tree without actually exploring all of it.  As Knuth puts it (see Reference (3) below):

Sometimes a backtrack program will run to completion in less than a second, while other applications seem to go on forever. The author once waited all night for the output from such a program, only to discover that the answers would not be forthcoming for about 1,000,000 centuries.

The sampling algorithm is pretty simple: take a random walk from the root of the tree to a leaf, at each step recording the (out-)degree d_k of the current vertex at depth k, and selecting a child vertex (at depth k+1) uniformly at random.  It is not difficult to show by induction that the expected value of the resulting product of degrees along the sample path equals the total number of leaves in the tree.

Furthermore, the expected value of the product d_0 d_1 d_2 ... d_{k-1} for some fixed k equals the total number of vertices at depth k in the tree… as long as we define d_j=0 for depths j beyond the length of the (possibly shorter) sample path.

This algorithm almost feels like cheating; we are trying to estimate the size of an entire tree, when there are whole chunks of it that we may never explore!
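In code, the estimator is only a few lines.  Following is an abstract sketch, where children is a hypothetical callback returning the child vertices of a given search tree vertex (in DLX terms, the candidate rows for the column chosen at that vertex):

import random

def estimate_count(children, root, depth, samples=10**4):
    """Average many product-of-degrees samples: an unbiased estimate
    of the number of vertices at the given depth."""
    total = 0
    for _ in range(samples):
        vertex, product = root, 1
        for _ in range(depth):
            kids = children(vertex)
            if not kids:
                product = 0   # d_j = 0 beyond the end of a short path
                break
            product *= len(kids)
            vertex = random.choice(kids)
        total += product
    return total / samples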

The following figure shows the result of 2 million such random paths through the Kanoodle DLX search tree.  The blue points indicate the current running estimate of the total number of solutions– more precisely, the number of tree vertices at depth 12.

Monte Carlo estimate of number of Kanoodle solutions (blue), with actual number of solutions (red).

In this case, the couple of minutes needed to collect the random samples turned out to be longer than the 30 seconds to simply traverse the entire tree, yielding exactly 371,020 different solutions to the Kanoodle puzzle (shown in red above).

The IQ Fit puzzle was more interesting; the following figure again shows the running estimate (in blue) of the number of leaf vertices at depth 10 in the DLX search tree.

Monte Carlo estimate of number of IQ Fit solutions (blue), with actual number of depth 10 terminal vertices (red), and actual number of solutions (green).

Using this estimate of what turned out to be exactly 254,531,140 vertices at depth 10 (shown in red above), I expected to get an exact count via a full tree traversal in about 6 hours.  However, just an hour and a half later, my program had already stopped, reporting that there were only 67,868,848 solutions (shown in green), about one-fourth what I expected.  What was going on?

After further investigation, I realized that, unlike the Kanoodle puzzle, it is possible to arrange the three-dimensional IQ Fit pieces so that all 10 pieces fit into the grid… but with some spaces still left unfilled.  These non-solution arrangements account for the remaining three-fourths of the leaf vertices at depth 10 in the tree.


  1. Austin, D., Puzzling Over Exact Cover Problems, American Mathematical Society Feature Column, December 2009 (HTML)
  2. Knuth, D., Dancing Links, Millennial Perspectives in Computer Science, 2000, p. 187-214 (arXiv)
  3. Knuth, D., Estimating the efficiency of backtrack programs, Mathematics of Computation, 29 (1975), p. 121–136 (PDF)

The clueless student: solution

This is to capture notes on my progress toward a solution to the “matching quiz” problem presented in the previous post, which can be re-stated more concisely as follows: what is the optimal strategy for choosing quiz answers specified as a function f : [n] \rightarrow [n], not necessarily a bijection, that maximizes the probability of agreeing with a randomly chosen “solution” bijection in at least m questions?

A strategy is characterized by a partition \lambda of n, with the parts corresponding to the sizes of the classes of the equivalence kernel of the chosen function f.  The number of parts in the partition indicates the number of distinct quiz answers.  For example,

1 + 1 + ... + 1 = n

corresponds to a bijection, and the single-part partition n=n corresponds to a constant function, i.e., writing the same answer for all n questions.  Note that any function with the same partition “signature” has the same probability of success.

So, given a guessing strategy specified as a partition with k parts:

\lambda = \lambda_1 + \lambda_2 + ... + \lambda_k = n

what is the probability of getting at least m answers correct?  First, we can compute the probability p(S) that exactly the subset S \subseteq [k] of distinct answers is correct, using inclusion-exclusion:

p(S) = \frac{1}{n!} \sum\limits_{T \supseteq S} (-1)^{|T|-|S|}(n-|T|)! \prod\limits_{i \in T} \lambda_i

And so the overall probability of at least m correct answers is

P(success) = \sum\limits_{|S| \geq m} p(S)

We can collapse this sum of sums by observing that a particular subset T \subseteq [k] in the inner summation appears {|T| \choose |S|} times in the outer summation for a given size |S|.  Using the identity

\sum\limits_{s=m}^t (-1)^{t-s} {t \choose s} = (-1)^{t-m} \frac{m}{t} {t \choose m}

we can compute the overall probability of at least m correct answers as

P(success) = \frac{1}{n!} \sum\limits_{|T| \geq m} (-1)^{|T|-m} \frac{m}{|T|} {|T| \choose m} (n-|T|)! \prod\limits_{i \in T} \lambda_i

At this point, I’m stuck; I don’t see a better approach than simply evaluating all possible strategies (i.e., generating all integer partitions of n) for small values of 1 \leq m \leq n, and looking for patterns (a sketch of this brute-force search follows the list below).  Following is the conjectured optimal strategy, which holds for n \leq 16:

  • If m=1, then as pointed out in the previous post, write down the same answer n times, guaranteeing exactly one correct answer.
  • For m=2, write one answer \lfloor n/2 \rfloor times, and another (different) answer \lceil n/2 \rceil times.
  • For m=n-1, select an “almost-permutation,” corresponding to the integer partition 1+1+...+1+2=n, so that exactly one answer gets duplicated (and thus exactly one answer is missing).
  • Otherwise, select a random permutation.
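Following is a sketch of the brute-force search (function names mine), using the simplification \frac{m}{|T|} {|T| \choose m} = {|T|-1 \choose m-1} to keep the arithmetic in exact integers:

from itertools import combinations
from math import comb, factorial

def partitions(n, largest=None):
    """Generate all integer partitions of n as non-increasing tuples."""
    if largest is None:
        largest = n
    if n == 0:
        yield ()
        return
    for part in range(min(n, largest), 0, -1):
        for rest in partitions(n - part, part):
            yield (part,) + rest

def p_success(lam, n, m):
    """P(at least m correct) for the strategy with partition lam, per
    the inclusion-exclusion formula above."""
    total = 0
    for t in range(m, len(lam) + 1):
        for T in combinations(lam, t):
            prod = 1
            for part in T:
                prod *= part
            total += (-1) ** (t - m) * comb(t - 1, m - 1) * factorial(n - t) * prod
    return total / factorial(n)

n, m = 8, 2
print(max(partitions(n), key=lambda lam: p_success(lam, n, m)))   # expect (4, 4), per the conjecture above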

Actually, we can describe this strategy more concisely as follows: (a) select m distinct answers, and distribute them as evenly as possible among the n questions… unless 2<m<n-1, in which case (b) just select a random permutation.  I think (a) is what intuition would suggest is always the optimal approach– it was my wife’s immediate response when I described the problem to her– but I don’t have a good explanation for the qualification (b).



The clueless student

The following problem appeared in Marilyn vos Savant’s “Ask Marilyn” column in the 25 July 2004 issue of Parade magazine:

A high school student who hadn’t opened his American history book in weeks was dismayed to walk into class and be greeted with a pop quiz.  It was in the form of two lists, one naming the 24 presidents in office during the 19th century in alphabetical order and another list noting their terms in office, but scrambled.  The object was to match the presidents with their terms.  The completely clueless student had to guess every time.  On average, how many did he guess correctly? — Bob Hendel

This is a nice problem, with an equally nice answer, which Marilyn boiled down to just two words: “Only one!”  That the average number of correct guesses is exactly one, no matter how many questions (i.e., presidents and terms) in the quiz, is a great demonstration of the power of linearity of expectation.  But I wonder if Marilyn recognized that there are at least two reasonable interpretations of how the student might “guess every time”:

  1. Match each president with a term, using each available term exactly once.  In other words, select a random permutation of the terms.
  2. Match each president with a randomly selected term, where each term may be used more than once, or not at all.  In other words, select a random function mapping presidents to terms.

Fortunately, it doesn’t matter; the average number of correct guesses– one– is the same in either case.

But now let’s make the problem more interesting: suppose that this 24-question quiz is the last exam of the school year, and our clueless student needs just one correct answer to pass the class.  What should he do?  Using either of the two methods above, the probability that he gets every answer wrong, and fails the class, is about 1/e, or about 0.37.  (The actual probabilities are different for the two methods, but both converge to 1/e in the limit as the number of questions grows large.)
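Both limits are easy to check: for a random permutation, the probability of zero correct guesses is the derangement ratio D_n/n!, while for a random function each of the n independent guesses misses with probability 1 - 1/n.  Following is a quick sketch.

from math import e

n = 24
d, fact = 1, 1                   # D_0 = 1, 0! = 1
for k in range(1, n + 1):
    fact *= k
    d = d * k + (-1) ** k        # recurrence: D_k = k D_{k-1} + (-1)^k
p_permutation = d / fact         # method 1: no correct guesses
p_function = (1 - 1 / n) ** n    # method 2: all n guesses miss
print(p_permutation, p_function, 1 / e)   # all three are ~0.3679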

There is a better strategy, though: if he instead selects one term of office– any term of office– and writes that same term for every president in the quiz, then he is guaranteed to get exactly one answer correct!

The motivation for this post is to generalize this problem: given a matching quiz with n questions and n corresponding possible answers, and a desired “target” score of 1 \leq m \leq n, what guessing strategy maximizes the probability of at least m correct answers?  How are such strategies characterized, and given such a strategy, how do we compute the corresponding probability of success?



Train tracks and graph theory

I think games and toys, even those for young children, can often be a great source of interesting mathematics.  Recently a friend of mine was helping his sons build a train track using the pieces shown below:

An unfinished train track, with all available track pieces shown.

As you can see, this particular track is “incomplete”: one track segment has an end that is not interlocked (in jigsaw-puzzle fashion) with any other piece.  Let’s say that a track is complete if it doesn’t have this problem; that is, every piece used has each of its puzzle-piece “ends” interlocked with some other piece in the track.

My friend wondered whether it was possible to build a complete track using all of the available pieces shown in the photo.  It turns out that it is not possible… and so the next question is, how do we prove it?

This problem has a flavor very similar to the so-called mutilated chessboard, in that (1) it seems difficult at first glance to prove that you can’t do something, no matter how hard you try; and yet (2) thinking about the problem in just the right (but not necessarily obvious) way yields a straightforward proof.

In this case, suppose that we view a complete track as a graph, consisting of vertices (drawn as points) and edges (drawn as line segments connecting pairs of vertices).  Each track piece is a vertex, and an edge between vertices indicates that the corresponding track pieces are interlocked, as shown below, with vertices in blue and edges in green:

The corresponding graph, with each vertex (blue) representing a track piece, and each edge (green) representing an interlocking connection between pieces.  The dashed yellow edge would complete the track/graph.

If a track is complete, then every possible connection has an edge associated with it.  Let us temporarily suppose that the track shown above is actually complete, that the “three-way switch” piece is actually a four-way switch, and that the extra dashed yellow edge represents the final required connection.

Now, given such a graph representing a complete track, how many edges does it have?  We could just count up the green and yellow line segments, and find that there are 49 edges… but instead, let us examine each vertex v in turn, computing its degree d(v), or the number of edges emanating from it.  We will see why this is handy shortly.

Most of the vertices, corresponding to the simple track segments with two ends, have degree 2.  But some of the “switch” track pieces have degree 3, and one of our temporarily-augmented pieces now has degree 4.  If we add up the degrees of all of the vertices, we find that the total in this case is 98… exactly twice the number of edges!

This is no coincidence.  In general, given any undirected graph, the sum of its vertex degrees equals twice the number of edges.  To see why, consider that in the summation, each edge in the graph gets “counted” exactly twice, once for each of its two vertex endpoints.

An immediate corollary is that, in any such graph, the sum of the vertex degrees must be even.  And this leads to the desired result: if we expect to have any chance of using all available track pieces to build a complete track, then the sum of the degrees– i.e., total number of interlock points– of all of the pieces must be even.  But as shown in the original photo above, in addition to the simple track segments with degree 2, there are five pieces with degree 3, yielding an odd total.
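The parity check itself is trivial to automate.  In the following sketch, the piece counts are hypothetical stand-ins for whatever collection of pieces is on hand:

def can_be_complete(piece_ends):
    """Necessary condition: a complete track is a graph, so its degree
    sum (the total number of puzzle-piece ends) must be even."""
    return sum(piece_ends) % 2 == 0

# Any number of ordinary 2-ended segments plus five 3-way switches fails:
print(can_be_complete([2] * 40 + [3] * 5))   # False: 80 + 15 is odd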



Floating-point round trips revisited

After a couple of interesting conversations on the subject, I think I should try to clear up some potentially confusing comments I made in my last post about converting binary floating-point values to decimal strings… and back to binary floating point again.

Injection != round-trip identity

“… You need the representation to be one-to-one, so that distinct finite values always produce distinct output.  More precisely, we would like to be able to (1) convert a floating-point value x to a string, then (2) convert the string back to floating-point, recovering exactly the original value x.”

In the above quote, “more precisely” is a misleading choice of words, since it suggests that these two properties (one-to-one-ness and round-tripping) are equivalent, when they are not.  For the simplest possible example, consider the correctly rounded floating-point conversion from single-bit binary to single-digit decimal.  This conversion is one-to-one: distinct powers of two always map to distinct single-digit multiples of powers of 10.  However, the round-trip conversion, from binary to decimal and back to correctly rounded binary, is not the identity.  It’s an exercise for the reader to verify that the value 2^{27}, used by Matula in Reference (2) below, is a minimal counterexample.
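A few lines of Python verify that 2^{27} is such a counterexample (though not its minimality, which requires checking the smaller powers of two as well):

x = 2 ** 27                             # 134217728
d = float(f"{x:.0e}")                   # correctly rounded to one digit: 1e+08
lo, hi = 2 ** 26, 2 ** 27               # the powers of two bracketing d
back = lo if d - lo <= hi - d else hi   # correctly rounded back to one bit
print(back == x)                        # False: we recover 2**26, not 2**27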

To see why this distinction matters, consider the more general problem of the correctly rounded conversion from n-digit base \beta to m-digit base \nu… where \beta and \nu are not “variants of a common base” (more on this later).  Then as shown in Reference (1), the resulting function is one-to-one if and only if

\nu^{m-1} \geq \beta^n - 1

However, we want more than the ability to simply invert this one-to-one conversion function: instead, we would like to be able to compose this conversion function with another correctly rounded conversion back in the other direction, and yield the identity.  As shown in Reference (2), this composition yields the identity if and only if the following stronger condition holds:

\nu^{m-1} > \beta^n

Fortunately, in the case where \beta=2 and \nu=10, the single-bit-to-single-digit wrinkle mentioned above (i.e., n=m=1) is the only situation where these two conditions are not equivalent, and so in practice when using floats and doubles this distinction is less important.

Variants of a common base

It’s another exercise for the reader to verify that the second inequality above may be solved for a minimal number m of decimal digits required for a successful round-trip conversion from n-bit binary, to yield the formula from the last post:

m_{min} = \lceil{n \log_{10}(2)}\rceil + 1

However, there is an important detail involved in turning that strict inequality into this ceiling expression, that is left out of many discussions that I see online (at least where an attempt is made to generalize beyond binary-decimal conversions): this only works since n \log_{10}(2) is not an integer.  That is, base 2 and base 10 are not “variants [i.e., powers] of a common base,” as discussed in the references below.  For conversions between bases that are powers of some common base, these formulas do not hold in general.  (Consider various conversions between binary and hexadecimal, for example, again as discussed in the references below.)
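Evaluating that ceiling expression for the IEEE 754 significand sizes gives the familiar digit counts; following is a quick sketch.

from math import ceil, log10

def round_trip_digits(n_bits):
    """Minimal decimal digits m so that n-bit binary survives the round
    trip (valid since n*log10(2) is never an integer for n >= 1)."""
    return ceil(n_bits * log10(2)) + 1

print(round_trip_digits(24))   # 9, for IEEE single precision (float)
print(round_trip_digits(53))   # 17, for IEEE double precision (double)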


  1. Matula, David W., The Base Conversion Theorem, Proceedings of the American Mathematical Society, 19(3) June 1968, p. 716-723 [PDF]
  2. Matula, David W., In-and-Out Conversions, Communications of the ACM, 11(1) January 1968, p. 47-50 [ACM]

