Giant Yahtzee

In the game of Yahtzee, players roll five standard dice, then repeatedly re-roll subsets of the dice, trying to obtain various scoring combinations, the most valuable of which is a “Yahtzee,” or five of a kind, i.e., all five dice showing the same value.

If we strip off the complexities of the multiple players, limited number of re-rolls, and various other scoring combinations (e.g., straights, full houses, etc.), there is a nice mathematical puzzle buried underneath:

Roll n dice each with d=6 sides, and repeatedly re-roll any subset of the dice– you can “keep” any or none of your previous rolls, and you can re-roll dice you have previously kept– until all dice show the same value (e.g., all 1s, or all 2s, etc.). Using an optimal strategy, what is the (minimum) expected number of rolls required? In particular, can we solve this problem for “Giant Yahztee,” where we are playing with, say, n=100 dice?

Edit 2020-10-05: Following are my notes on this problem. Given that we (re)roll r of the dice– setting aside the remaining s=n-r already identical dice– let the random variable X_r indicate the resulting new number of identical dice. The distribution of X_r is given by

P(X_r \leq t) = \frac{1}{d^r}[\frac{x^r}{r!}] \left(\sum\limits_{k=0}^t \frac{x^k}{k!}\right)^{d-1} \left(\sum\limits_{k=0}^{t-n+r} \frac{x^k}{k!}\right)

P(x_r = t) = P(X_r \leq t) - P(X_r \leq t-1)

so that the transition matrix P for the absorbing Markov chain with state space {0, 1, 2, \ldots, n} indicating the current number of identical dice has entries

P_{s,t} = P(X_{n-s}=t), 0 \leq s,t \leq n

which we can use to compute the desired expected number of rolls. See the comments for a nice closed form solution for the cumulative distribution function for the number of rolls when n=5.

Posted in Uncategorized | 9 Comments

MATLAB’s colon operator and for loops


The MATLAB colon operator is surprisingly complicated, given that its job seems pretty simple to describe: generate a vector of regularly-spaced values, with a specified starting point, step size, and endpoint. For example, to create the vector x=(0, 0.1, 0.2, \ldots, 1.1, 1.2):

x = 0:0.1:1.2;

At least some complexity is understandable, since as in this example, the “intended” step size and/or the endpoints may not be represented exactly in double floating-point precision. But in MATLAB’s usual habit of trying to “helpfully” account for this, things get messier than they need to be. The motivation for this post is to describe two different behaviors of the colon operator: it behaves in one special way in for loops, and in a different way– well, everywhere else.

Creating vectors with colon syntax

First, the “everywhere else” case: as the documentation suggests,

The vector elements are roughly equal to [start, start+step, start+2*step, ...] … however, if [the step size] is not an integer, then floating point arithmetic plays a role in determining whether colon includes the endpoint in the vector.

That is, continuing the above example, note that ismember(1.2, x), despite the fact that 0+12*0.1 > 1.2. But the actual implementation is even more complex than just computing the “intended” endpoint. The output vector is effectively constructed in two halves, adding multiples of the step size to the starting point in the first half, and subtracting multiples of the step size from the (computed) endpoint in the second half.

So far, this seems reasonably well known, despite the broad strokes documentation. There is a good description of the details of how this works on Stack Overflow. Let’s not worry about those details here, though; instead, what seems less well known is that the same colon expression, such as in the example above, behaves differently when it appears in a for loop.

For loops with (and without) colon syntax

First, it’s worth noting that MATLAB for loops don’t have to use the colon operator at all. With not-quite-full-fledged iterator-ish semantics, you can iterate over the columns of an arbitrary array expression. For some examples:

for index = [2, 3, 5, 7]
    disp(index); % 4 iterations

for index = x
    disp(index); % 13 iterations

(Technically, iteration is over first-column “slices” of the possibly multi-dimensional array. This can cause some non-intuitive behavior. For example, how many iterations would you expect over ones(2,0,3)? What about ones(0,2,3)?)

But here is where things get weird. Consider the following example:

x = 0:0.1:1.2;
for index = 0:0.1:1.2
    disp(find(x == index));

This loop only “finds” 7 of the 13 elements of the original vector above, which was created using exactly the same colon operator expression!

So what’s going on? First, while the colon operator documentation was perhaps merely incomplete, the for loop documentation is downright misleading, suggesting that the behavior is to “increment index by the step on each iteration.” That sounds to me like repeatedly adding the step size to the value at the previous iteration, which would be even worse in terms of error accumulation, and is fortunately not what’s happening here.

Instead, experiments suggest that what is happening is essentially the overly-simplified description in the colon operator (not the for loop) documentation: the statement for index = start:step:stop iterates over values of the form start+k*step — i.e., adding multiples of the step size to the starting point– with the added detail that the number of iterations (i.e., the stopping point) seems to be computed in the same way as the “normal” colon operator. That is, the documentation is also wrong in that it’s not as simple as incrementing “until index is greater than stop” (witness the example above, where the last value is allowed to slightly overshoot the given endpoint). I have been unable to find an example of a colon expression whose size is different depending on whether it’s in a for loop.


What I find most interesting about this is how hard MathWorks has to work– and is still working— to make this confusing. That is, the colon syntax in a for statement is a special case in the parser: there are necessarily extra lines of code to (1) detect the colon syntax in a for loop, and (2) do something different than they could have done by simply always evaluating whatever arbitrary array expression– colon or otherwise– is given to the right of the equals sign.

And this isn’t just old legacy behavior that no one is paying attention to anymore. Prior to R2019b, you could “trick” the parser into skipping the special case behavior in a for loop by wrapping the colon expression in redundant array brackets:

for index = [0:0.1:1.2]
    disp(find(x == index)); % finds all 13 values

However, as of R2019b, this no longer “works;” short of using the explicit function notation colon(0,0.1,1.2), it now takes more sophisticated obfuscation on the order of [0:0.1:1.2, []] or similar nonsense to say, “No, really, use the colon version, not the for loop version.”

Posted in Uncategorized | 1 Comment

Computing the angle between two vectors


Given two vectors in three dimensions, what is the most accurate way to compute the angle between them? I have seen several different approaches to this problem recently in the wild, and although I knew some of them had potential issues, I wasn’t sure just how bad things might get in practice, nor which alternative was best as a replacement.

To make the setup more precise, let’s assume that we are given two non-zero input vectors \mathbf{u}, \mathbf{v} \in \mathbb{R}^3, represented exactly by their double-precision coordinates, and we desire a function that returns a double-precision value that most closely approximates the angle 0 \leq \theta \leq \pi between the vectors, with all intermediate computation also done in double precision.

Kahan’s Mangled Angles

William Kahan discusses three formulas in the “Mangled Angles” section of the paper linked below. The first is the “usual” dot product formula:

\theta = \cos^{-1}\frac{\mathbf{u}\cdot\mathbf{v}}{\left|\mathbf{u}\right|\left|\mathbf{v}\right|}

with the following C++ implementation, which as Kahan points out requires clamping the double-precision dot product to the interval [-1, 1] to avoid a NaN result for some vectors that are nearly parallel:

double angle(const Vector& u, const Vector& v)
    return std::acos(std::min(1.0, std::max(-1.0,
        dot(u, v) / (norm(u) * norm(v)))));

Kahan subsequently describes another formula using the cross product:

\theta = \begin{cases}\sin^{-1}\frac{\left|\mathbf{u}\times\mathbf{v}\right|}{\left|\mathbf{u}\right|\left|\mathbf{v}\right|} & \text{if }\mathbf{u}\cdot\mathbf{v} \geq 0, \\ \pi-\sin^{-1}\frac{\left|\mathbf{u}\times\mathbf{v}\right|}{\left|\mathbf{u}\right|\left|\mathbf{v}\right|} & \text{if }\mathbf{u}\cdot\mathbf{v} < 0 \end{cases}

with the following implementation:

double angle(const Vector& u, const Vector& v)
    double angle = std::asin(std::min(1.0,
        norm(cross(u, v)) / (norm(u) * norm(v))));
    if (dot(u, v) < 0)
        angle = 3.141592653589793 - angle;
    return angle;

Interestingly, Kahan does not mention that this formula also requires clamping the asin argument to the interval [-1, 1]; following is an explicit example of inputs demonstrating the potential problem:

\mathbf{u} = (-0.6171833037218851, -0.4342100935824679, 0.6561603190059907)

\mathbf{v} = (-0.32014601021553196, 0.9003703730169068, 0.29468580477598927)

Finally, despite referring to the above formula as “the best known in three dimensions,” Kahan finishes with the following “better formula less well known than it deserves”:

\theta = 2\tan^{-1}\frac{\left|{\left|\mathbf{v}\right|\mathbf{u} - \left|\mathbf{u}\right|\mathbf{v}}\right|}{\left|{\left|\mathbf{v}\right|\mathbf{u} + \left|\mathbf{u}\right|\mathbf{v}}\right|}

with the following implementation:

double angle(const Vector& u, const Vector& v)
    double nu = norm(u);
    double nv = norm(v);
    return 2 * std::atan2(norm(nv * u - nu * v), norm(nv * u + nu * v));

That’s a lot of square roots. I didn’t focus on performance here, but it would be an interesting follow-on analysis to compare the speed of each of these formulas.

Other approaches are possible; following is the formula that I thought was the most accurate, before reading Kahan’s paper:

\theta = \tan^{-1}\frac{\left|\mathbf{u}\times\mathbf{v}\right|}{\mathbf{u}\cdot\mathbf{v}}

double angle(const Vector& u, const Vector& v)
    return std::atan2(norm(cross(u, v)), dot(u, v));

This has the added benefit of involving just a single square root. This is the formula that I used to compute the “true” angle between vectors to compare errors, using arbitrary-precision rational arithmetic to compute the square root (actually, a reciprocal square root, which is slightly easier) and arctangent. All of the source code is available on GitHub.

And finally, the approach that I saw most recently that motivated this post, using the Law of cosines:

\theta = \cos^{-1}\frac{\left|\mathbf{u}\right|^2 + \left|\mathbf{v}\right|^2 - \left|\mathbf{u}-\mathbf{v}\right|^2}{2\left|\mathbf{u}\right|\left|\mathbf{v}\right|}

double angle(const Vector& u, const Vector& v)
    double u2 = u.x * u.x + u.y * u.y + u.z * u.z;
    double v2 = v.x * v.x + v.y * v.y + v.z * v.z;
    Vector d = u - v;
    return std::acos(std::min(1.0, std::max(-1.0,
        (u2 + v2 - (d.x * d.x + d.y * d.y + d.z * d.z)) /
        (2 * std::sqrt(u2) * std::sqrt(v2)))));


The relative accuracy of each formula depends on the magnitude of the angle between the input vectors. The following figure shows this comparison for angles near 0 (i.e., nearly parallel vectors), near \pi/4, near \pi/2 (i.e., nearly orthogonal vectors), and near \pi (i.e., nearly “anti-parallel” vectors).

Absolute error between computed and exact rounded double-precision angle, vs. true/exact angle.

The x-axis indicates an offset from the “true” angle between the input vectors, computed to roughly 200-bit accuracy. The y-axis indicates the error in the double-precision output, compared against the true angle also rounded to double-precision. The points hugging the bottom of each figure are my poor man’s attempt at indicating zero error (note that these are on a log-log scale), i.e., the 64-bit double-precision output matched the corresponding value rounded from the 200-bit true angle. (In many ways this figure feels like a failure of visual display of quantitative information, and I’m not sure how best to improve it.)

So what’s the takeaway? If you don’t care about absolute errors smaller than a few dozen nanoradians, then it doesn’t really matter which formula you use. And if you do care about errors– and angles– smaller than that, then be sure that your inputs are accurate in the first place. For example, did you normalize your “real” input vectors to unit length first, and if so, how much error did you unintentionally incur as a result? We can construct very small angles between vectors if we restrict to “nice” two-dimensional inputs like (1,0,0) and (\cos \theta, \sin \theta, 0). But it’s an interesting exercise to see how difficult it is to construct vectors “in general position” (e.g., randomly rotated) with a prescribed small angle between them.

As expected, the two arccosine formulas behave poorly for nearly parallel/anti-parallel vectors, and as Kahan describes, the arcsine formula behaves poorly for nearly orthogonal vectors. The two arctangent formulas are the most consistently accurate, and when the one mentioned by Kahan is better, it’s typically much better.


  1. Kahan, W., How Futile are Mindless Assessments of Roundoff in Floating-Point Computation? [PDF]
Posted in Uncategorized | Leave a comment

Ambiguous notation for logarithms

The motivation for this post is to respond to some questions about a recent video presentation titled, “Why you haven’t caught Covid-19 [sic],” presented by Anne Marie Knott, a professor in the Washington University St. Louis Olin Business School. The gist of the presentation is an argument against the “non-pharmaceutical interventions,” or stay-at-home orders, etc., in response to the current pandemic.

I am not interested in arguing about government policies, or even epidemiological models here. Frankly, this video is too easy a target. The error made in this video is a mathematical one– an error so simple, and yet so critical to the presenter’s argument, that it’s not worth bothering with the remainder of the presentation. Instead, I’d like to use this video as an excuse to rant about mathematical notation.

The problem starts at about 3:38 in the video, where the presenter attempts to analyze the COVID-19 outbreak on the aircraft carrier USS Theodore Roosevelt as a realization of the so-called “final size equation,” a model of the end-game, steady state extent of an epidemic in a closed system (since the sailors were isolated onboard the ship for a significant period of time). The final size equation is

p = 1-e^{-R_0 p}

where p is the “final size” of the pandemic, or the fraction of the population that is eventually infected, and R_0 is the basic reproduction number, essentially the average number of additional people infected through contact with a person already infected, in the situation where everyone in the population is initially susceptible to infection.

As the presenter explains, there is a critical difference between a reproduction number less than one, resulting in “extinction” of the disease, and a value greater than one, resulting in an epidemic. Using the fact that 856 of the 4954 sailors onboard the Roosevelt eventually tested positive for COVID-19, corresponding to p=856/4954, we can estimate R_0 by solving for it in the final size equation, yielding

R_0 = -\frac{\ln (1-p)}{p}

It’s a simple exercise to verify that the resulting estimate of R_0 is about 1.1. It’s also a relatively simple exercise to verify that this estimation technique cannot possibly yield an estimate of R_0 that is less than one.

Despite this, the presenter manages– conveniently for her argument that the contagiousness of the virus is overblown– to compute a value of R_0 of about 0.48… by computing the base 10 logarithm \log_{10}(1-p) instead of the natural logarithm \ln(1-p)=\log_e(1-p) in the formula above.

It’s interesting to try to guess how the presenter managed to make this mistake. My guess is that she did this in an Excel spreadsheet; that is the only environment I know of where log(x) computes the base 10 logarithm. In any other programming environment I can think of, log(x) is the natural logarithm, and you have to work at it, so to speak, via log10(x), or log(x)/log(10), to compute the base 10 logarithm.

The mathematical notation situation is a bit of a mess as well. Sometimes I’m a mathematician, where \ln x means the natural logarithm, and any other base is usually specified explicitly as \log_b x. But sometimes I am an engineer, where \log x usually means base 10, but sometimes in a communications context it might mean base 2. Other times I am a computer scientist, where \lg x is a common shorthand for base 2, and \log x can mean pretty much anything, including “I don’t care about the base.”

Posted in Uncategorized | 2 Comments

Sliding rooks (and queens)


Jacob Brazeal describes the following interesting puzzle in a recent MAA article (see reference below): starting with four rooks in the four corner squares of a chessboard, as shown in the figure below, move the rooks into the four center squares… where each single move is constrained to sliding a single rook, either horizontally along its rank or vertically along its file, as far as possible, “blocked” only by another rook or the edge of the board.

Starting configuration (left) and goal configuration (right) of sliding rooks puzzle.

Note that going in the other direction is easy– we can move the rooks from the center out to the corners in just 8 moves. But this problem is harder; it’s a nice programming exercise to determine the minimum number of moves required. The motivation for this post is to describe a slightly different approach to the problem than presented in the article, as well as a variant of the problem using queens instead of rooks that also has some interesting mathematical structure.

All of the code is available on GitHub.

Breadth-first search

We can view this problem as a directed graph, with a vertex v for each possible state of the board, and a directed edge v \to w if we can move a single rook in state v to obtain state w. The goal is to find a minimum-length path from the starting vertex with the rooks at the corners to the goal vertex with the rooks in the center of the board.

It’s an interesting question whether there is a convenient admissible heuristic estimate of the number of moves required from a given board state, that would allow a more efficient informed search. I couldn’t come up with one; fortunately, simple breadth-first search turns out to be acceptably efficient for this problem:

from collections import deque

def bfs(neighbors, root):
    """Breadth-first search.

    Given a graph neighbors:V->V* and a root vertex, returns (p, d),
    where p[v] is the predecessor of v on the path from the root, and
    d[v] is the distance to v from the root.
    queue = deque([root])
    parent = {root: None}
    distance = {root: 0}
    while queue:
        vertex = queue.popleft()
        for neighbor in neighbors(vertex):
            if neighbor not in parent:
                parent[neighbor] = vertex
                distance[neighbor] = distance[vertex] + 1
    return (parent, distance)

It turns out that a minimum of 25 moves are required to solve the puzzle. That’s a lot– too many, really, so that this would probably not be very fun to explore by hand with an actual chess board (more on this shortly). And there are other configurations that are even more difficult to reach. The board that is “farthest” from the initial rooks-in-the-corners state is shown below, requiring 32 moves to reach:

The most difficult sliding rooks configuration, requiring 32 moves to reach.

Symmetry group action

How large is the directed graph that we need to explore? The referenced article describes a graph with {64 \choose 4}=635,376 vertices, one for each possible subset of four squares in which to place the rooks. This graph has some interesting structure, with one really large strongly connected component explored by the above search algorithm, containing 218,412– over one-third– of all possible board states. The remainder is made up of a large number of much smaller unreachable components: the next largest component contains just 278 vertices!

However, these numbers count configurations of rooks that are not usefully distinct. For example, the figure above shows just one of eight “different” vertices, all of which require 32 moves to reach from the initial vertex… but the other seven board states are merely rotations and/or mirror reflections of the board shown in the figure, and thus are reachable by correspondingly rotated and/or reflected versions of the same sequence of 32 moves.

In other words, let’s consider the dihedral group D_4 of symmetries of the board acting on the set of possible board states, and construct the (smaller) directed graph with a vertex for each orbit of that group action.

A standard trick for implementing this approach is to represent each orbit by one of its elements, chosen in some natural and consistent way; and a standard trick for making that choice is to impose some convenient total order on the set, and choose the least element of each orbit as its representative. In the case of this problem, as we encounter each board state v during the search, we “rename” it as min(orbit(v)), the lexicographically least tuple of rotated and/or reflected coordinates of the rook positions:

def orbit(pieces):
    """Orbit of dihedral group action on rooks on a chess board."""
    for k in range(4):
        yield pieces
        yield tuple(sorted((n - 1 - x, y) for (x, y) in pieces))    # reflect
        pieces = tuple(sorted((n - 1 - y, x) for (x, y) in pieces)) # rotate

This search space is almost– but not quite– eight times smaller. From the initial rooks-in-the-corners board state, we can reach 27,467 configurations unique up to rotations and reflections, out of a total of 79,920 possible configurations. We can compute the latter number without actually enumerating all possible board states: the cycle index of the dihedral group acting on the squares of an n \times n board (assuming n is even) is

Z(D_4) = \frac{1}{8}(x_1^{n^2}+2x_4^{\frac{n^2}{4}}+3x_2^{\frac{n^2}{2}}+2x_1^n x_2^{\frac{n^2-n}{2}})

and the number of possible board states with r rooks is

[y^r] \left. Z(D_4)\right|_{x_k=y^k+1}

Sliding queens instead of rooks

Finally, I think perhaps a more “fun” variant of this problem is to consider four queens in the corners, and try to move them to the four center squares as before, using the same “maximal” moves, but allowing diagonal moves as well as horizontal and vertical. This is more tractable to solve by hand, requiring only 12 moves to complete.

And the structure of the corresponding graph is also rather interesting: the large connected component is even larger, so that we can now reach 77,766 of the 79,920 possible configurations of four queens… but the remaining 2,154 configurations are all singleton components! That is, from any one of these 2,154 “lone” configurations, we can move into the large component with just a single move, and from there reach any of those 77,766 configurations… but we can’t get back, nor can we reach any of the other 2,153 lone unreachable configurations!

This was interesting enough that I wondered if it was true in general for other board sizes. It’s trivially true for 2×2 and 4×4 (since there are no unreachable board states), as well as 6×6, 8×8, and even 10×10… but unfortunately the pattern does not continue; the 12×12 board has larger-than-singleton connected components not reachable from the initial queens-in-the-corners state.


  1. Brazeal, J., Slides on a Chessboard, Math Horizons, 27(4) April 2020, p. 24-27 [link]
Posted in Uncategorized | 8 Comments

Seven riffle shuffles is not enough– except when it is


How many times should you riffle shuffle a deck of cards? A commonly cited rule of thumb (see [1], as well as here, here, and here) is that seven riffle shuffles are sufficient to randomize a standard 52-card deck. The motivation for this post is to refine this in a couple of ways: first, even after seven riffle shuffles, enough order still remains in the deck that we can exploit it with a reasonably simple wager (see [2]). This seems to suggest that we need more than seven shuffles– and usually we do– but it is possible, at least in principle, to repeatedly riffle shuffle in such a way that (a) we can tell when it’s okay to stop, (b) sometimes after just seven or even as few as six shuffles, that (c) not just approximately but perfectly randomizes the deck.

Betting on seven shuffles

Alice and Bob are playing a game. They begin with a brand new deck of playing cards, with the cards in the standard “new deck order”:

New deck order. (Card images Copyright 2011,2019 Chris Aguilar, licensed under LGPL 3.0.)

The deck is shuffled, and cards are dealt one at a time from the top of the deck, placing each card dealt back on the bottom of the deck. As the cards are dealt, Alice is looking for the cards from the original top half of the deck: ace through king of hearts, followed by ace through king of clubs, in that order. Meanwhile, Bob is looking for the cards from the original bottom half of the deck, but in reverse order: ace through king of spades, followed by ace through king of diamonds.

When Alice’s first target card, the ace of hearts, is dealt, instead of returning it to the bottom of the deck, we remove it and set it aside in Alice’s pile. Then, once the two of hearts is dealt, we remove it and add it to Alice’s pile, etc. Similarly, when Bob’s first target card, the ace of spades, is dealt, we remove it and start Bob’s pile. The first player to complete his or her pile of 26 cards wins the game, receiving one dollar from the loser.

This should be a fair game, assuming that the deck is truly and thoroughly shuffled: Alice or Bob should each win with probability 1/2. However, starting from a new deck, even after riffle shuffling seven times, Alice wins over 80% of the time!

Probability of Alice winning the New Age Solitaire game, vs. number of initial riffle shuffles.

Reference [2] is accessible to undergraduates, and describes a beautiful formula for computing these probabilities of winning as a function of the number of initial riffle shuffles. But I think this game also makes a great simulation programming exercise, both to simulate the random riffle shuffles themselves, and to efficiently determine whether Alice or Bob wins given a particular shuffled arrangement of cards.

Riffle shuffles and inverse shuffles

The game described above suggests that seven shuffles is not enough to randomize the deck. So, how many more shuffles do we need?

First, let’s review the Gilbert-Shannon-Reeds model of a random riffle shuffle. As discussed recently here, we can represent a riffle shuffle of a deck of n=52 cards as a uniformly random string of n bits. The number of zero bits indicates how many cards to cut from the top of the deck, and the positions of the zero bits indicate how those top cards are interleaved with the cards from the bottom part of the deck (represented by the one bits).

We can associate each such bit string encoding of a riffle shuffle with the corresponding permutation \pi \in S_n, indicating that the riffle shuffle moves the card initially in position i to position \pi(i). Repeated shuffling corresponds to composition, so that the effect of riffle shuffle \pi_1 followed by shuffle \pi_2 is \pi_2 \pi_1.

For reasons we will see shortly, it is also useful to consider the action of the inverse permutation \pi^{-1} associated with a particular bit string encoding of a riffle shuffle. Imagine marking each card in the deck with the corresponding zero or one in the bit string; then “deinterleave,” sliding the “zero” cards out while preserving their relative order, and placing them on top of the remaining pack of “one” cards. The reader can verify that this “inverse shuffle” permutation \pi^{-1} is indeed the inverse of the shuffle permutation \pi corresponding to the same bit string encoding.

Stopping times

It turns out that inverse riffle shuffles are really handy, because of the following result (Lemma 9 in [1]). Suppose that we start with a new deck of cards, and repeatedly inverse shuffle the deck as follows:

  1. Generate a random bit string, and mark each card with its corresponding 0 or 1 label.
  2. Inverse shuffle the deck according to this bit string encoding; i.e., slide the 0 cards out and place them on top of the 1 cards.
  3. Repeat steps 1 and 2… but in step 1, place each new randomly generated 0 or 1 label to the left of the previous labels on the card (from the previous inverse shuffles), effectively prepending a new most significant bit. Thus, during the k-th inverse shuffle, each card will have a k-bit integer label, and the execution of step 2 corresponds to a (stable) sorting of the cards by these integer labels.

If we continue inverse shuffling in this manner, stopping when we observe that the integer labels on the cards are all distinct, then the resulting (inverse) shuffled deck is fully randomized– that is, the resulting arrangement of cards in the deck is perfectly uniformly distributed. (More precisely, if the random variable T is the minimum number of inverse shuffles required for all of the card labels to be distinct, then T is a strong uniform stopping time.)

Shuffling backward in time

We’re not quite done yet. So far we have only described a method of inverse shuffling, and detecting when we can stop inverse shuffling, confident that the resulting arrangement of cards is perfectly randomized. How can we apply this to normal riffle shuffling?

The key observation (derived from Lemma 8 in [1]) is that the sequence of randomly generated “forward in time” riffle shuffles \pi_1, \pi_2, \ldots, \pi_T results in a distribution of deck arrangements with the same distance from uniform as the sequence of corresponding inverse shuffles, in reverse order (i.e., executed “backward in time”).

Thus, as we riffle shuffle the deck, first with permutation \pi_1, then with \pi_2, etc., we can evaluate our stopping condition after T shuffles by checking for distinct card labels resulting from the inverse shuffles \pi^{-1}_T, \pi^{-1}_{T-1}, \ldots, \pi^{-1}_2, \pi^{-1}_1, executed in that (reverse) order.

The following Python code implements this method, returning a list of individual riffle shuffles– as the corresponding bit strings– that when executed in order realizes a uniformly random permutation. Note that that reversed() is critical; without it, the resulting distribution of possible arrangements is observably non-uniform.

import numpy as np

def uniform_riffle_shuffles(n, rng=np.random.default_rng()):
    """Return list of encodings of riffle shuffles of deck of n cards."""
    riffles = []
    while True:

        # Generate another riffle shuffle.
        riffle = rng.integers(0, 2, n)

        # Perform all inverse shuffles in reverse order.
        labels = np.zeros_like(np.arange(n))
        for bit, riffle in enumerate(reversed(riffles)):
            p = np.argsort(riffle, kind='stable')
            labels = (labels + riffle * (2 ** bit))[p]

        # Stop when card labels are all distinct.
        if len(set(labels)) == n:
    return riffles


To wrap up, how long does this perfectly random shuffling process take? This turns out to be an instance of the birthday problem: if the random variable T indicates the number of shuffles required to randomize a deck with n cards, then the cumulative distribution function P(T \leq s) is the probability that n card labels (think people), each of which is an s-bit integer (think possible birthdays), are all distinct:

P(T \leq s) = \frac{(2^s)_n}{2^{sn}}

where (x)_n is the falling factorial. The following figure shows the resulting distribution, with the CDF in blue, the PDF in red, and the mean of approximately 11.7243 shuffles in black.

Distribution of number of riffle shuffles needed to perfectly randomize a 52-card deck.



  1. Aldous, D. and Diaconis, P., Shuffling Cards and Stopping Times, The American Mathematical Monthly, 93(5) 1986, p. 333-348 [PDF]
  2. Zuylen, A. and Schalekamp, F., The Achilles’ Heel of the GSR Shuffle: A Note on New Age Solitaire, Probability in the Engineering and Informational Sciences, 18(3) July 2004, p. 315-328 [DOI]
Posted in Uncategorized | 1 Comment

History of COVID-19 cases

All of the raw data presented here was retrieved from this GitHub repository, which, as described there, is maintained by and thanks to the Johns Hopkins University Center for Systems Science and Engineering (JHU CSSE), with support by ESRI Living Atlas Team and the Johns Hopkins University Applied Physics Lab (JHU APL). I will reiterate their disclaimer that these data are strictly for educational purposes… and this post of mine is yet another possibly inaccurate perspective on data that is already derived from multiple and possibly conflicting sources. If you’re looking for medical guidance, look elsewhere.

The following figure shows the progression of COVID-19 over the last seven weeks or so, as measured by cumulative confirmed cases. I restricted attention to the eight countries currently having at least 2000 confirmed cases.

Cumulative confirmed cases vs. time, for each country currently having at least 2000 confirmed cases.

Although it’s interesting to try to interpret this view of past history, I think it’s difficult to use it to predict even the near future. Note how similar is the exponential growth (note the figure is on a logarithmic scale) for, well, pretty much everyone but China and South Korea, who appear to have taken the most drastic-but-apparently-successful measures to contain the virus. Comparing with Italy, which is now struggling with hospital capacity, we here in the United States appear to be on our way to very similar numbers of cases in a matter of 11 days or so, assuming recent growth continues.

Except that this is potentially misleading, for several reasons. On the pessimistic side, this figure only shows confirmed positive tests— the United States might already have (and in my opinion, almost certainly does have) many more people with the virus, given how little testing has been done so far.

On the other hand, the United States is a larger country than Italy, with roughly five times the population. The following figure attempts to account for this, showing the cumulative number of confirmed cases per million in population (population data obtained here).

Cumulative confirmed cases per million in population over time, for each country currently having at least 2000 confirmed cases.

Importantly, this has no effect on the “slope,” i.e., the exponential rate of growth of cases. It merely delays the same end result– this figure suggests that it might take two and a half weeks, instead of a week and a half… but we’re still headed where Italy is now.

I think an actual prediction of this sort is difficult to make confidently, though. Many interesting dials have been turned, even if only in the past few days. Human behavior has changed, with some significant steps taken on both large scales and small. Whether the eventual effects will be no more disastrous than waiting for the next truck to deliver more toilet paper, I’m not sure. The next two weeks or so will be interesting.

Posted in Uncategorized | 1 Comment

Fair dice: some isohedra are less fair than others


What makes dice fair? Intuitively, when we roll a fair die with n sides, we expect each of its possible outcomes to have the same probability 1/n of occurring. What shapes have this property?

I think this is an interesting problem, in part because it seems like there should be an elegant mathematical solution, but some unpleasantly complicated physics gets in the way. For example, even a standard six-sided die can be vulnerable to manipulation by a skilled cheat. The difficulty is that the fairness of a particular shape of die may depend on assumptions about how the die is rolled– what is the probability distribution from which we randomly draw the die’s initial position, velocity, and angular momentum? What surface does the die land on– can it slide and/or bounce, and if so, with what coefficients of friction and restitution, etc.?

(For this discussion, we will focus on dice that are convex polyhedra, having flat polygonal faces with no holes or indentations. There are other interesting possibilities, though. For example, consider flipping a cylinder as a “three-sided coin,” which may land on heads, tails, or on its curved “edge.”)


One way to get around this dependence on physics modeling assumptions is to appeal to symmetry: if there is any shape of die that could possibly be considered fair, then a sufficient (but perhaps not necessary) condition for fairness is face-transitivity— the group of symmetries of the die should act transitively on its faces. That is, given any pair of faces f_1 and f_2, there must be a rigid transformation of the die into itself that maps f_1 to f_2.

To see why this is the criteria that we want, suppose that Charlie is a skilled-but-myopic cheat with the ability to roll the die biased toward any face f_1 that he desires. However, he can only see the resting position and shape of the rolled die, not the labels on its faces. So, after the roll, but before the outcome is announced, an objective third party, Oscar, selects a face f_2 uniformly at random, and has an opportunity to secretly rotate the rolled die, preserving its original location, to show the randomly selected face f_2, instead of the face f_1 that Charlie intended to roll.

Intuitively, if the die is fair, then Oscar should be able to prevent Charlie from having undue advantage, without Charlie being aware that the die was moved. No matter what face Charlie tries to roll, and no matter what truly random face Oscar selects, it should be possible to rotate one to the other without changing the “space taken up” by the die.

Rotations vs. reflections

A convex polyhedron that is face-transitive is called an isohedron. Wolfram’s MathWorld page (as well as a great Numberphile video with Persi Diaconis) describes 30 types of isohedra, noting that they “make fair dice” … but face-transitivity is a property that requires some qualification. These 30 types of dice all have the property that their full symmetry group acts transitively on their faces, where by “full” we mean that not just rotations but also reflections– think turning the die “inside out”– are allowed.

Since we can’t expect Oscar to secretly turn a physical die inside out, if we more realistically constrain the allowed transformations to include rotations only, then we can ask whether this proper symmetry subgroup also acts transitively on the faces of a die. It turns out that there are six types of isohedra– including two infinite classes of dipyramids– where this is not the case, i.e., the proper symmetry group does not act transitively on the faces. Instead, these “less fair” dice each have two distinct orbits of faces, where it is impossible to rotate a face from one orbit into a face from another orbit while preserving the overall space taken up by the die.

The figure below shows all 30 isohedra, with the six less fair dice shown with their two orbits of faces in red and green.

Isohedra as classified on MathWorld, with six classes whose proper (rotational) symmetry groups do not act transitively on faces. Orbits are shown in red/green.

Implementation notes

Models of these isohedra and Python code to compute their symmetry groups and face orbits are available on GitHub. I started with the models on the MathWorld page, but modified them to provide exact coordinates for all vertices, scaled to have unit minimum edge length. Computing the symmetries mapping one face to another is very similar to the telescope registration problem described here, although there are some interesting additional wrinkles due to working in limited floating-point precision.


  1. Diaconis, P. and Keller, J., Fair Dice, The American Mathematical Monthly, 96(4) April 1989, p. 337-339 [PDF]
  2. Weisstein, Eric W., Isohedron, MathWorld–A Wolfram Web Resource [HTML]
Posted in Uncategorized | 3 Comments

Solving the 2x2x2 Rubik’s cube (revisited)


Years ago I wrote an article about solving the 2x2x2 Rubik’s cube, using an efficient representation of the scrambled states of the cube, and using VPython as a means of visualizing the cube and rotations of its faces. The primary motivation at the time was to describe the use of a linear-time algorithm for converting permutations of the cubies into consecutive integer array indices… but I ignored the arguably more complex details of representing the orientations of the cubies. The objective here is to describe that representation in more detail, with some additional code (available on GitHub) to help visualize what’s going on.

Cubies and cubicles

To begin, we establish notation for how to “hold” the cube and apply moves by rotating its faces. The figure below shows the solved cube, with its eight cubies labeled 0 through 7, and three rotation axes that we will use as shorthand to refer to each possible move.

The solved cube, with three rotation axes, and cubies labeled 0 (hidden) through 7.

Note that cubie zero is not visible in the back; we hold the cube by this cubie zero so that it remains fixed, and apply moves by rotating any of the three faces not involving cubie zero about the axes shown. For example, pressing x in the GUI applies move x to rotate the red face counterclockwise about the x-axis, so that– from the solved state– cubie 1 moves to where cubie 3 was, cubie 3 moves to where cubie 7 was, etc. (Press capital X to rotate the face clockwise; similarly for y, Y, z, and Z.)

Having labeled the cubies, which move around as we rotate faces of the cube, let’s also assign labels 0 through 7 to the corresponding cubicles, or the locations that remain fixed relative to the axes even as we permute the cubies “within” the cubicles. Specifically, for each i from 0 to 7, cubicle i is the location of cubie i in the solved state.

Since cubie zero never moves, we can represent an arbitrary cube state with a permutation \pi \in S_7, where \pi(i) is the label on the cubicle containing cubie i, or equivalently, \pi^{-1}(i) is the label on the cubie in cubicle i. We can represent each of the six moves in the same way (by its action on the solved state), so that applying move \pi' to state \pi yields the new cube state \pi'\pi.

In the Python implementation, we represent a permutation \pi as an array p, with p[i]=\pi^{-1}(i+1)-1 (arrays are indexed starting with zero), so that using Numpy’s array indexing operator, applying move m to state p yields the new cube state p[m].

Tags on cubicles and cubies

Now that we have a means of representing the permuted positions of the cubies, we need to handle their orientations as well. A particular cubie in a particular cubicle may be in any of three different orientations, differing by rotations of 120 degrees about the diagonal through the cubie’s “outer” corner. We need a way to represent the orientations of all cubies in a given cube state, as well as a way to transform this representation corresponding to each possible move.

In preparation for doing this, let’s begin again with the cube in the solved state, and for each cubicle, we select exactly one of its three faces and mark it with a cubicle tag. Having done so, we subsequently mark each cubie as well with a cubie tag on exactly one face… namely, the same face as the corresponding cubicle tag.

Recall that the cubicles– and thus the cubicle tags– remain fixed in space and do not move, but the cubie tags “follow” their corresponding cubies as we apply moves that rotate the cubies from one cubicle to another.

We have some freedom here: this selection of a face per cubicle to tag is arbitrary, and any of the 3^8 possible such selections will work. The figure below shows the convention used in the Python implementation, with the cubicle tags applied to the opposite orange and red faces orthogonal to the (red) x-axis:

Cubicle tags (black) on the left and right faces of the cube, with cubie tags (gray).

In, set draw_axes and draw_labels to True to experiment with this. The cubicle tags are in black, and remain fixed relative to the rotation axes. The cubie tags are in gray, and move with the cubies to which they are attached.

Orientation of cubies

We are now ready to encode the orientations of the cubies. For a given cube state, let’s consider a single cubie: that cubie has a cubie tag, and the cubicle in which it currently resides has a cubicle tag. We encode the orientation of the cubie as an integer 0, 1, or 2, indicating the number of 120-degree clockwise rotations of the cubie needed to align the cubie tag with the cubicle tag.

Example view of scrambled cube illustrating encoding of cubie orientations.

For example, in the figure above showing a particular scrambled cube state, the cubie with cubie tag 2 (in gray, on the orange face) in cubicle 7 has orientation 2; cubie 6 in cubicle 3 has orientation 0, since both of its tags are on the same orange face; and cubie 4 in cubicle 6 has orientation 1 (since the black cubicle tag must be on the face that we can’t see). Also, note that since cubie zero (not shown) never moves, its orientation is always 0.

We now have everything we need to encode an arbitrary cube state as an ordered pair (\pi, \mathbf{q}), where \pi encodes the permuted positions of the cubies as described earlier, and \mathbf{q}=(q_1,q_2,q_3,q_4,q_5,q_6,q_7) is a vector of integers encoding the orientations, with q_i indicating the orientation of the cubie in cubicle i.

Applying moves

But even better, this encoding not only makes it easy to represent a cube state, it also makes it easy to apply cube moves, i.e., face rotations. Given an arbitrary cube state (\pi, \mathbf{q}), and one of the six moves represented by the state (\pi', \mathbf{q}') that results from applying the move to the solved state, we can show that the result of applying move (\pi', \mathbf{q}') to state (\pi, \mathbf{q}) yields the new state (\pi'\pi, \pi'\mathbf{q}+\mathbf{q}'), where the group action \pi'\mathbf{q} permutes the coordinates of \mathbf{q} (with the same Numpy array indexing implementation described earlier)… and the vector addition is simply element-wise mod 3.

To convince ourselves that the modular addition of cubie orientations works, first note that in a face rotation, if a cubie’s position does not change, then neither does its orientation, and so adding zero to the orientation encoding has no effect as desired. For a cubie that does move, it suffices to focus on a single rotation– say a counterclockwise quarter turn about the x-axis– and a single “source” and “destination” cubicle, say from cubicle 3 to cubicle 7. (To see this, note that clockwise and half turns can be expressed as compositions of counterclockwise turns, and for any other rotation axis and source/destination cubicles, we can rotate the entire cube to match the “cubie in cubicle 3 rotated by x to cubicle 7″ geometry.)

Then there are effectively 3×3=9 cases to consider, one for each possible selection of faces where we could have placed cubicle tags on the source (3 choices) and destination (3 choices) cubicles. For each such pair of choices, we can verify– by brute force enumeration if needed– that the counterclockwise arrangement of the (0,1,2) possible orientation codes for a cubie in the source cubicle is preserved as it is rotated into the destination cubicle.

The Fundamental Theorem of Cubology

A couple of final notes: first, the above description of the representation of a cube state as an ordered pair (\pi, \mathbf{q}) \in S_7 \times \mathbb{Z}_3^7 suggests that there are 7! \times 3^7 possible cube states. This isn’t quite true; we have overcounted by a factor of 3, due to the following invariant that is part of what is commonly referred to as the “fundamental theorem of cubology:” for any valid cube state, the sum of the integers in the orientation encoding \mathbf{q} is congruent to zero mod 3. (This can be verified for a particular encoding by first noting that the solved state has all entries of \mathbf{q} equal to zero, and that for each possible move (\pi', \mathbf{q}') the sum of entries in \mathbf{q}' is equal to zero.) Thus, there are only 7! \times 3^6 distinct possible cube states, where in implementation we can, for example, discard the last entry q_7 from our orientation encoding since its value is determined by the other six.

Second, the ideas presented here are also applicable to the original 3x3x3 Rubik’s cube. A cube state represents the 8 corner cubies with a permutation in S_8 and an orientation vector in \mathbb{Z}_3^8, and the 12 edge cubies with a permutation in S_{12} and an orientation vector in \mathbb{Z}_2^{12}, with similar parity constraints on each.

Posted in Uncategorized | Leave a comment

Another dice puzzle


The dice game craps is played by repeatedly rolling two six-sided dice, and making various wagers on the outcomes of the sequence of rolls. The details of the wagers don’t concern us here– instead, let’s consider just one particular example scenario involving a common wager called the “pass line bet”:

Suppose that we have just rolled a 4 (which, by the way, occurs with probability 3/36 on any single roll). Having thus established 4 as “the point,” our objective now is to roll a 4 again, rolling repeatedly if necessary… but if we ever roll a 7 (which, by the way, occurs with probability 6/36 on any single roll), then we lose. If and once we roll a 4, we win.

To summarize: we are trying to roll a 4 (with probability 3/36). If we roll anything else except a 7 (where “rolling anything else except 7” has probability 27/36), we continue rolling.

So, here is a puzzle, let’s call it Problem 1: how long does it take on average to win? More precisely, what is the expected length of a winning sequence of rolls, i.e., where we never roll a 7?


This problem is not new, nor is it even particularly sophisticated. But it is very similar to a problem that circulated a few years ago, and that generated some interesting discussion. Here is that earlier problem, let’s call it Problem Zero:

Roll a single fair six-sided die repeatedly until you roll a 6. What is the expected number of rolls required, given that we observe that all rolls are even?

My motivation for this post is two-fold. First, this is a sort of pedagogical thought experiment. Problem Zero has already been shown in the wild to be dangerously non-intuitive. Problem 1 is the same problem— that is, it is essentially equivalent to Problem Zero, just dressed up with different values for the single-trial probabilities. But is Problem 1 inherently easier, less “tricky?” And if so, why? Is it because the numbers are different, or is it that the problem is cast in a more concrete setting as a casino game, etc.?

I don’t know if these problems are actually equally “tricky” or not. But at least in the case of the known trickiness of Problem Zero, I have a theory. Both of these problems are like many others that I have enjoyed discussing here in the past, in that they are mathematical problems… but of a sort that a student may approach not just with pencil and paper, but also with a computer, even if only as an initial exploratory tool.

Which brings me to my theory, based on observation of past debate of Problem Zero. We can begin tackling this problem with pencil and paper, or by writing a simulation… and I suspect that, in this case, starting with a simulation makes it much harder to come up with a (the?) wrong answer.

Posted in Uncategorized | Leave a comment