A wrong solution to the Riddler sock matching problem

From last week’s FiveThirtyEight.com Riddler column:

From Anna Kómár comes a stumper about socks: In my laundry basket, I have 14 pairs of socks that I need to pair up. To do this, I use a chair that can fit nine socks, at most. I randomly draw one clean sock at a time from the basket. If its matching counterpart is not already on the chair, then I place it in one of the nine spots. But if its counterpart is already on the chair, then I remove it from the chair (making that spot once again unoccupied) and place the folded pair in my drawer. What is the probability I can fold all 14 pairs without ever running out of room on my chair?

This week’s column describes several solution approaches, including Monte Carlo simulation and recurrence relations. But here is another candidate approach that is potentially more efficient to compute:

Associate each possible sequence of sock drawings with a north-east lattice path, from (0,0) when the basket is full, to (14,14) when the basket is empty, with a north step corresponding to drawing an unmatched sock and placing it on the chair, and an east step corresponding to drawing a sock whose counterpart is already on the chair, removing, folding, and placing the pair in the drawer.

Valid drawing sequences are represented by such a lattice path that never touches the diagonal y=x-1. (That is, we can’t have a negative number of socks on the chair.) There are {28 \choose 14}-{28 \choose 13}=2674440 of these paths (more on this later); of these, the number of successful drawing sequences, never running out of room on the chair, by also never touching the diagonal y=x+10, is

{28 \choose 14}-{28 \choose 13}-{28 \choose 4}+2{28 \choose 3}-{28 \choose 2}=2660139

Thus, the desired probability is 2660139/2674440, or about 0.994653.

This is effectively the argument made in Gilliand [1] and Pantic [2]. However, as pointed out by Paulsen [3], this incorrectly assumes that each distinct lattice path is equally likely, when in fact we are more likely to encounter paths requiring more room on the chair.

For this argument to work, we need a different problem: instead of 14 distinct (colors of) pairs of socks, suppose that we are folding 14 pairs of white athletic socks, essentially identical except for an “L” stitched on each left sock and “R” stitched on each right sock. Now, any left sock may be matched with any right sock and vice versa.

In this modified problem, valid drawing sequences are now all possible north-east lattice paths, and successful drawing sequences are paths strictly between the diagonals y=x+10 and y=x-10. (That is, multiple socks on the chair must be all left or all right.) This does work, since each equally likely drawing sequence corresponds to a distinct lattice path. (The Riddler column does note that modeling the original problem using lattice paths is “tricky, since the probabilities of transitioning from one state to the next depended on how many paired and unpaired socks remained in the basket.” This is true… but that’s not really why the lattice path calculation doesn’t work in that case– even in this modified problem, state transition probabilities also depend on the composition of socks remaining in the basket.)

This version of the problem was discussed previously here; the motivation for this post is really just to collect my notes on the general solution to these counting problems (incorrectly applied in the former, correctly in the latter).

Theoerem: Given positive integers such that a-d<b<a+u, the number of north-east lattice paths from (0,0) to (a,b) that do not intersect the diagonals y=x+u nor y=x-d is

\sum\limits_{k=\lceil-\frac{a}{u+d}\rceil}^{\lfloor\frac{b}{u+d}\rfloor}{a+b \choose a+k(u+d)} - \sum\limits_{k=\lceil-\frac{b-d}{u+d}\rceil}^{\lfloor\frac{a+d}{u+d}\rfloor}{a+b \choose a+u+k(u+d)}

Proof: First, reviewing the standard ballot-counting reflection argument: given a path to (a,b) that does first intersect diagonal y=x+u at (k,k+u), reflect the remainder of this path about that diagonal, so that the endpoint (a,b) is reflected to (b-u,a+u). This establishes a bijection between the complementary set of diagonal-intersecting paths to (a,b) and the set of all paths to (b-u,a+u).

We use this same technique here, but with repeated reflections: starting with all (unconstrained) paths to (a,b), we subtract those invalid paths that first intersect diagonal y=x+u… but we must add back paths that subsequently also intersect diagonal y=x-d, via a second reflection, etc. Similarly, we must also subtract invalid paths that first intersect diagonal y=x-d, then add back paths that subsequently intersect diagonal y=x+u, etc.

Grouping inclusion-exclusion terms by the number j of initial diagonal intersections yields

{a+b \choose a} + \sum\limits_{j \geq 1}(-1)^j \left({a+b \choose a+\lceil\frac{j}{2}\rceil u + \lfloor\frac{j}{2}\rfloor d} + {a+b \choose a-\lfloor\frac{j}{2}\rfloor u - \lceil\frac{j}{2}\rceil d}\right)

the terms of which may be rearranged to yield the difference of sums above.


  1. Gilliand, S., Johnson, C., Rush, S., and Wood, D., The sock matching problem, Involve7(5) 2014, p. 691-697. [PDF]
  2. Pantic, B. and Bodroza-Pantic, O., A brief overview of the sock matching problem. [arXiv]
  3. Paulsen, W., The sock problem revisited, The College Mathematics Journal, 52(3) 2021, p. 193-203. [CMJ]
Posted in Uncategorized | Leave a comment

Probabilities in one-card Nomination Whist

My niece and nephew introduced me to a card game that they called Tricks, also known as Nomination Whist or Oh Hell. It’s a fun and interesting game, with simple rules but also with the potential for complex strategy. This post focuses on just one round in the game, played as follows:

Each of n players is dealt h=1 card from a standard 52-card deck. One additional trump card is dealt face-up indicating the trump suit. Each player in turn starting left of the dealer plays the card from their hand, and the “trick” is won by either the highest-ranked card (ace is high) in the trump suit, or if no trump is played, by the highest-ranked card in the led suit. (In other rounds where h>1, the player that wins the trick leads the next hand, continuing for a total of h tricks in the round; players must follow the led suit if able.)

Between the deal and the play of the hand, each player in turn bids one or zero points (more generally up to h points), indicating whether they think they will win the trick or not. The game is particularly interesting in that the objective is to win exactly the number of tricks bid, no more and no less.

Since you have no choices in how to play a hand with just one card, only bidding presents an opportunity for strategy. (Things are more complicated in rounds with more cards in the hand, hence the focus on the one-card case here.) How you should bid depends on the probability of winning the trick, which is in turn determined by one of four possible cases:

  • Trump high: your card is in the trump suit, and ranks higher than the trump card.
  • Trump low: your card is in the trump suit, and ranks lower than the trump card.
  • Off suit lead: your card is off suit and leads, i.e., you are the player left of the dealer.
  • Off suit follow: your card is off suit and follows, i.e., you are not the player left of the dealer.

The following table and figure show the formulas and probabilities, respectively, in each of these cases, as a function of your card’s rank r (2=deuce through 14=ace) and the number n of players.

Your cardProbability of winning
Trump high{50-(14-r) \choose n-1}/{50 \choose n-1}
Trump low{50-(13-r) \choose n-1}/{50 \choose n-1}
Off suit lead{50-(12+14-r) \choose n-1}/{50 \choose n-1}
Off suit follow\frac{r-2}{50}{49-(12+14-r) \choose n-2}/{49 \choose n-2}

If everyone bids simultaneously, then this is essentially the whole story: bid one point if and only if the probability of winning indicated above is greater than 1/2. For example, in a four-player game, you should never bid for the trick with an off suit card, and always bid with trump… except with a deuce, or with a three where the deuce is the trump card.

But this oversimplifies the problem, since everyone does not bid simultaneously. Earlier players’ bids may inform and affect later players’ bids. The strategy complexity blows up quickly– analyzing even the simplest two-player game presents an interesting challenge: is optimal strategy a random mix of bluffing and/or slow play?

Posted in Uncategorized | 3 Comments

Counting marked dice

The following is from FiveThirtyEight.com’s Riddler:

I have an unlabeled, six-sided die. I make a single mark on the middle of each face that is parallel to one pair of that face’s four edges [see example in the figure below]. How many unique ways are there for me to mark the die? (Note: If two ways can be rotated so that they appear the same, then they are considered to be the same marking.)

Example marking of a die.

I think this is a particularly nice potential exercise for students for the usual reasons: on the one hand, wearing our mathematician hat, we can apply the lemma that is not Burnside’s… but this problem makes us think a bit harder about the coloring (i.e., how does each particular “color” represent an oriented mark on the die?), and about the distinction between the group of symmetries acting on the faces of the die versus the colorings of the faces of the die.

Alternatively, wearing our programmer hat, we can solve the problem (or maybe just verify our cocktail napkin approach above) by brute force enumeration of all possible markings, and count orbits under rotation with the standard trick of imposing some easy-to-implement ordering on the markings (converting each to its minimal representation under all possible rotations)… but again, what data structure should we use to represent a marking, that makes it easy to both enumerate all possibilities, and to compute orbits under rotation?

Following is my solution: we count the number of possible markings (“colorings”) that are fixed by each of the 24 rotational symmetries of the die, and average the results. To clarify the description of rotations, let’s temporarily distinguish one “red” face of the die, and group rotations according to the final location of the red face:

  • If the red face stays in its original location, then:
    • The identity leaves all 2^6 markings fixed.
    • Either of two quarter turns about the (axis normal to the) red face leaves zero markings fixed (since it changes the marking on the red face itself).
    • A half turn about the red face leaves 2^4 markings fixed.
  • If we rotate the red face a quarter turn to one of the four adjacent locations, then:
    • No additional rotation leaves zero markings fixed.
    • Either of two additional quarter turns about the (new axis normal to the) red face leaves 2^2 markings fixed.
    • An additional half turn about the new red face leaves 2^3 markings fixed.
  • Finally, if we rotate the red face a half turn to the opposite face, then:
    • No additional rotation leaves 2^4 markings fixed.
    • Either of two additional quarter turns about the new red face leaves 2^3 markings fixed.
    • An additional half turn about the new red face leaves 2^4 markings fixed.

Putting it all together, the number of distinguishable ways to mark the die is

\frac{1}{24}(2^6 + 2^4 + 4(2\cdot 2^2 + 2^3) + 2^4 + 2\cdot 2^3 + 2^4) = 8

All eight distinct ways to mark the die. Marks on opposing faces are the same color, with blue indicating parallel opposing marks, and red indicating skew (“perpendicular”) opposing marks.

The figure above shows the eight distinct ways to mark the die. Rather than “unfolding” the dice to explicitly show all six faces of each, the marks are colored red or blue, so that marks on opposing faces– one visible, one hidden– are the same color, with blue indicating parallel opposing marks, and red indicating skew, or “perpendicular,” opposing marks.

Posted in Uncategorized | Leave a comment

Analysis of horse racing board game


I recently encountered a “horse racing” game played with two dice and a game board similar to the one shown in the figure below. Horses numbered 2 through 12 begin in the starting gate at the left, indicated by the green dots. At each turn, both dice are rolled yielding a total from 2 to 12, and the corresponding horse advances to the right one step, to the next black dot in their lane. The winner is the first horse to reach the finish line, indicated by the yellow dots.

Horse racing board game with the typical board layout.

What I found interesting about this game is the apparent intent to make the race fair. That is, for example, horse 7 is the most likely to advance on any particular dice roll, but this advantage is offset by requiring more steps to win. Indeed, if we describe the design of the board as a vector of integers indicating the required number of steps for each horse, in this case (3,6,8,11,14,17,14,11,8,6,3), then it makes at least some intuitive sense that these values are roughly in proportion to the corresponding dice probabilities.

However, this approach to leveling the playing field ends up being an extreme over-correction: horse 7 has way too far to go, winning the race less than 3% of the time, while Snake Eyes and Box Cars will each win over 19% of the time.

So, how can we fix this? Is there a better board design that more closely approximates a uniform distribution of wins across the eleven horses?

Scratching horses

First, I cheated a bit in the above description of the game. All eleven horses don’t actually race together; instead, before the race begins, the rules specify a procedure for scratching— i.e., removing from the race– four of the horses, and only the remaining seven horses race as described above. The scratching procedure is pretty simple: roll both dice, and scratch the corresponding horse. Continue rolling, ignoring rolls for horses that have already been scratched, until four distinct horses have been scratched. Similarly, during the race of the remaining seven horses, simply ignore any rolls for scratched horses. (I’m also leaving out the gambling nature of the game, which involves contributing chips or money to a winner-take-all pot with almost every dice roll; this scratching business really just serves to further increase the pot by stretching out the game a bit.)

To account for this, we need to compute the probability of scratching each of the C(11,4)=330 possible subsets of four horses, which we can then use to weight the probabilities of each of the unscratched horses subsequently winning the race.

Given a particular subset S of four horses, we compute the probability of exactly that subset being scratched by solving for x_\emptyset in the linear system of equations in 2^{|S|}=16 variables, indexed by subsets of S, where

x_S = 1

x_T = \sum\limits_{h \in S} p_h x_{T \cup \{h\}}

where p_h is the single-roll probability of advancing horse h (e.g., p_2=p_{12}=1/36, etc.). In Python with SymPy (all code is available on GitHub):

def p_scratch(q):
    """Probability of scratching subset with dice probabilities q[:]."""
    a = sympy.eye(2 ** len(q))
    for i in range(2 ** len(q) - 1):
        for h, p_roll in enumerate(q):
            j = i | (2 ** h)
            a[i, j] -= p_roll
    return sympy.linsolve((a, sympy.eye(2 ** len(q))[:, -1])).args[0][0]

Win probabilities

At this point, given a particular subset of seven unscratched horses in a race, we are prepared to compute the probabilities of each of those horses winning, with a relatively simple modification of the generating function approach described in this earlier post, as the sum of coefficients \sum [x^m/m!]G_j(x) for horse j, where

G_j(x) = p_j \frac{(p_j x)^{n_j-1}}{(n_j-1)!} \prod\limits_{i \neq j} \sum\limits_{k=0}^{n_i-1} \frac{(p_i x)^k}{k!}

and n_j is the number of steps required for horse j to win. Again in Python with NumPy (and the fractions module in case we want exact results):

def p_win1(p, n):
    """Probability of first horse winning."""
    g1 = P.polypow([Fraction(0), p[0]], n[0] - 1) / np.math.factorial(n[0] - 1)
    g2 = reduce(P.polymul,
                ([pi ** k / np.math.factorial(k) for k in range(ni)]
                 for pi, ni in zip(p[1:], n[1:])))
    g = reduce(P.polymul, [[p[0]], g1, g2])
    return sum(c * np.math.factorial(m) for m, c in enumerate(g))

Putting it all together, computing the overall probabilities of each horse winning the race after scratching four of the horses, it turns out that the scratching just makes the non-uniformity even worse, although only slightly: horse 7 now wins less than 2.5% of the time, and horses 2 and 12 win over 20% of the time.

Evaluating all board designs

So, finally, we can turn the above crank on each of the C(17+6-1,6)=74,614 different possible “reasonable” board designs, where by “reasonable” I mean symmetric (horses like 2 and 12 with the same dice probability should require the same number of steps to win) and monotonic (a horse like 7 with higher dice probability should never require fewer steps to win), ranging from the simplest first-roll-wins board (1,1,1,1,1,1,1,1,1,1,1) to the longest-race-but-still-unfair board (17,17,17,17,17,17,17,17,17,17,17). The figures below show some of the best results.

Probabilities of each horse 2-12 winning a race on various board designs (see colors below): at left, scratching no horses so that all 11 horses race; at right, scratching 4 horses and racing the remaining 7.

As already discussed, the original game board (3,6,8,11,14,17,14,11,8,6,3) fares pretty poorly. Interestingly, if we ignore scratching and just race all eleven horses together (figure at left), then the fairest board is pretty unambiguously (2,3,4,5,6,7,6,5,4,3,2)— almost but not quite the same as this Canadian version of the game– which minimizes all three of the \ell_1 norm (proportional to total variation distance), the \ell_2 norm, and the relative entropy distances from the uniform distribution.

If we play the standard game with four scratched horses (figure at right), then there are a few options depending on the yardstick for fairness. The board (4,7,9,11,13,15,13,11,9,7,4) minimizes the \ell_1 norm, and comes in third for the other two measures. The board (4,6,8,10,12,13,12,10,8,6,4) minimizes the \ell_2 norm and relative entropy, and comes in second for the \ell_1 norm. Alternatively, the board (4,6,8,10,12,14,12,10,8,6,4) comes in second, for both the no-scratch and four-scratch variants of the game, and for all three measures… except for the four-scratch \ell_1 norm, where it comes in fifth out of all candidate boards.

Posted in Uncategorized | Leave a comment

Rube Goldberg Monte Carlo


“This shouldn’t take long,” Alice says. “It’s only flipping a coin, after all.”

Alice and Bob are talking in the hallway at work. Alice has been tasked with analyzing the behavior of a simple black box. When a button on the side of the box is pressed, a mechanism inside the box flips a biased coin, and displays either a green light when the coin comes up heads– with some fixed but unknown probability p— or a red light when it comes up tails.

“As we discussed in the meeting, I will press the button to flip the coin 200 times, observe the number of green and red light indications, and report the resulting 95% confidence interval bounds for p— to two digits, as usual*.”

“That’s great for you,” Bob grumbles. “I’m the one stuck with analyzing the Rube Goldberg Variation.”

Bob’s task is more involved. The Rube Goldberg Variation is a much larger black box, taking up almost an entire room. Instead of just a button on the outside of the box, there is a dial as well, that may be set to a positive integer n specifying the “mode” in which the box is configured.

The mechanism inside is much more complex as well: pressing the button on the side of the box starts a convoluted sequence of interacting mechanisms inside, many of which exhibit some randomness similar to Alice’s coin flip. This process takes several minutes… but eventually, the box displays either a green light indicating “success” — with some fixed but unknown probability q(n) that depends on the selected mode– or a red light indicating “failure.”

“I suppose the good news is that at least we’re only focusing on the one mode n=10,” Bob says. “But this analysis is still going to take days to complete. Not only does each single experiment– from pressing the button to observing the light– take longer, but I will also need more sample outcomes of the experiment to estimate q(10) with the same degree of certainty.”

Alice looks confused. “Wait, why do you need more samples? Shouldn’t the usual 200 be enough?”

Bob replies, “If it were just a coin flip, sure. But the Rube Goldberg Variation has a lot more moving parts, with a lot more opportunities to introduce randomness into the process. I need more samples to cover all of the possible combinations of corner-case interactions between all of those components.”

Monte Carlo simulation

This post is motivated by Bob. Recently, Bob was an undergraduate student… but I’ve also encountered discussions like the one above when Bob has been a computer scientist with a Ph.D. So, this is my attempt to capture my thoughts on this subject, to hopefully highlight the specific areas where there are potential pitfalls and opportunities for confusion.

In these past discussions, there were not any actual black boxes. Instead, there were computer simulations of the processes inside hypothetical black boxes. For example, the simulation of Alice’s black box might look like this (in Python):

def alice():
    return 1 if random.random() < 0.75 else 0

where we can estimate the probability p of success by evaluating alice() many times and averaging the results. Of course, that would be a bit ridiculous if we were allowed to inspect the source code, since we could just read directly that p=0.75.

But now consider Bob’s more complicated simulation:

def bob(n):
    position = 1
    while True:
        while True:
            move = random.randint(-1, 1)
            if 1 <= position + move <= n:
        position = position + move
        if move == 0:
    return 1 if position == 1 else 0

(In reality, Bob’s simulation should be much more complicated than this, with so many lines of code, and so many random components, that analysis by inspection is practically impossible. But I can’t resist inserting a tractable puzzle even when it’s not the point: inside this black box, there are n=10 slots arranged in a row, with a marble initially in the left-most slot. When Bob presses the button, the marble makes a series of random moves, with each move selected uniformly from {left, stop, right}. An attempt to move left beyond the left-most slot, or right beyond the right-most slot, is ignored. The marble moves randomly back and forth along the row of slots, until a “stop” move, resulting in display of a “successful” green light if the marble stops in its original left-most slot, or a “failure” red light otherwise.)

It’s at least not as immediately clear from inspection alone what is the probability q(n) that bob(n) returns 1… indeed, with those nested while True loops in there, it might not be obvious whether this function is guaranteed to return at all (fortunately, it is– more on this later).

So, to level the playing field, so to speak, let’s suppose that neither Alice nor Bob is able to inspect their simulation source code at all. In other words, neither is allowed to open their black box and peek inside; all either can do is inspect their black box’s output, observed from repeatedly pressing the button on the side of the box. (Remember, Bob was the one complaining; so we’re trying to remove any unfair advantage that Alice might have.)

Given this setup, we are considering the following question: because Bob’s simulation is more complex, does he require more sample executions of his simulation to estimate q(10), than Alice requires to estimate p with the same degree of certainty?

A simulation is a function

I think it’s helpful to think of a simulation like those above as a function, in the mathematical sense; that is, a simulation is a function

f:X_1 \times X_2 \times \cdots \times X_m \times U^{\omega} \to \{0,1\}, U=[0,1)

that maps its input to a binary output y \in \{0,1\}, that is,

y = f(x_1,x_2,\ldots,x_m,(u_1,u_2,\ldots))

where the output y=1 means “success” and y=0 means “failure.”

The input parameters x_1,x_2,\ldots,x_m specify the configurable “mode” of the simulation. In Alice’s case, m=0 since her simulation isn’t configurable. In Bob’s case, m=1 and X_1=\mathbb{Z}^+, where he is interested in the specific case x_1=10.

The inputs u_1,u_2,\ldots specify a single realized sample of a pseudo-random number stream, where each u_k is drawn independently and uniformly from the unit half-open interval U. This is enough expressive power to represent effects of the random processes in any computable simulation: in the context of the Python implementations above, each u_k is the result of the k-th evaluation of random.random(), which we can think of as the “subroutine” in terms of which all other pseudo-random functions, like random.randint(a, b), or random.gauss(mu, sigma), etc., may be implemented.

(Pseudo) randomness

I think viewing the “random” behavior of a computer simulation in this mathematical way is useful, since it forces us to consider some important implementation details. Note that by specifying our simulation as a function, we require that the function be well-defined. That is, every possible input must map to a unique output.

Input must map to a unique output: in other words, our simulation must be repeatable. Given the same input values– specifying both the configuration parameters x_1,x_2,\ldots,x_m as well as the sampled random number stream u_1,u_2,\ldots— the simulation must always return the same output value, either always 0 or always 1. Output cannot vary depending on “hidden” additional factors outside the domain of the function, such as the system clock’s indication of the current time, etc.

Every input must map to an output: here we need to be careful about guaranteeing that our simulation will terminate. That is, in any particular practical execution of the simulations above, we never need the entire sequence of “available” random numbers (since otherwise it would run forever). For example, Alice’s simulation only ever depends on the first random draw u_1. Bob’s simulation, on the other hand, depends on an unbounded but finite number of draws that varies from one run to the next. That is, there is a positive probability that running the simulation could take an hour, or a week, or a year, or any specified amount of time, no matter how long… but there is zero probability that it will run forever.

But there are input sequences in the domain of the simulation function that correspond to this “run forever” behavior. For example, in Bob’s simulation, what if the marble selects the “left” move at every turn? Or alternates “left” and “right,” without ever selecting “stop”? For our function to be well-defined, we still need to assign some output to any and all of these infinite-length inputs in the domain, even if they have zero probability of occurring.

The good news is that we can assign any output we want… because it doesn’t matter! That is, no matter what questions we might ask about the statistical behavior of the output of a (terminating) simulation, the answers do not depend on our choice of assignment of outputs to those “impossible” outcomes.

More precisely, we can change our perspective slightly, and view our simulation function as a random variable Y on the sample space U^{\omega} with its natural probability measure, parameterized by the simulation mode:

Y_{x_1,x_2,\ldots,x_m}:U^{\omega} \to \{0,1\}

where we don’t care about the output from those “impossible” outcomes because they have measure zero. Note that this is essentially just a notational change, not a conceptual one: a random variable is neither random, nor a variable; it’s simply a function, just like we started with. This random variable has a Bernoulli distribution, whose parameter P(Y=1) we are trying to estimate… which we can do, no matter the “complexity” of the simulation realizing the function Y.


So, the answer to the question posed here– does Bob need more random samples than Alice to do his job– is no. The black boxes are black; we can’t peek inside them. They are effectively indistinguishable by anything other than their behavior. Similarly, a function implementing a simulation is characterized not by its “complexity” of “implementation,” but solely by its mapping of inputs to outputs.

I’m pretty sure I’ve already beaten this horse to death. But there are several other interesting wrinkles not discussed here. For example, we often want to actually no-kidding repeat exactly the same past simulation behavior (when debugging, for example); this capability is managed by using and recording seeds and/or substream indices as “shorthand” for the infinite sequences u_1,u_2,\ldots of input random numbers. Also, the input configuration parameters x_1,x_2,\ldots,x_m have so far been mostly just additional notational noise. But that configurability can lead to additional analysis pitfalls, when combining Monte Carlo observations of simulation runs across different values of those configuration parameters.


(*) When reporting on results of the analysis, why the frequentist Clopper-Pearson confidence interval instead of a more fashionable Bayesian approach of reporting, say, the (narrowest) highest density credible interval with, say, a uniform prior? Since those decisions weren’t really the point here, I chose the experimental setup carefully, so that it doesn’t matter: the resulting intervals, reported to two digits of precision, will be the same either way. If there is any safe place to be a frequentist, it’s on a computing cluster.

Posted in Uncategorized | 3 Comments

NCAA tournaments are getting measurably wilder

Last night, the #1 seed Kansas Jayhawks won the NCAA men’s basketball tournament. Maybe an unsurprising outcome… but they defeated #8 seed North Carolina in the championship game. Only four #8 seeds have made it that far in the nearly four decades of the current format.

This year’s tournament also saw #15 seed Saint Peter’s advance to their regional final, the first #15 seed ever to do so.

And there were plenty of other unlikely upsets as well. Considering the tournament as a whole, that is, all 63 games (ignoring the abomination of the four play-in games) and their seeding match-ups, how “wild” was this year’s outcome compared to past tournaments?

Or, put another way, what was the prior probability of picking a “perfect bracket,” guessing every game outcome correctly, and how does that probability compare with past years?

This question has been discussed here before; I won’t rehash the details of this past article that describes the details of methodology for modeling game probabilities. This post is really just bringing that analysis up to date to reflect recent tournaments (all source historical tournament data is available on GitHub). Here are the results:

Probability of a perfect bracket, 1985-2022.

The constant black line at the bottom reflects the 1 in 2^{63}, or 1 in 9.2 quintillion probability of guessing any year’s outcome correctly, if you simply flip a fair coin to determine the winner of each game. The constant blue and red lines at the top are for comparison with the actual outcomes, indicating the probability of a “chalk” bracket outcome, always picking the higher seeded team to win each match-up. (The blue and red colors reflect two different models of game probabilities as a function of difference in “strength” of each team; as discussed in the previous article, blue indicates a strength following a normal distribution density, and red indicates a linear strength function.)

By either the normal or linear models, three of the last four tournaments (2018, 2021, and 2022– 2019 was pretty by the book, and 2020 was, well, 2020) were among the top five “wildest,” most unlikely outcomes in the last four decades.

Posted in Uncategorized | 4 Comments

Bresenham’s circles are (always?) optimal


In my recent experiment with string art, I needed Bresenham’s line algorithm to rasterize a line, that is, to compute the set of pixels that most closely approximate a straight line segment. Suppose instead that we want to rasterize a circle, as shown in the figure below.

Bresenham’s algorithm used to rasterize a circle with radius 10. Integer lattice points are assumed to be at the center of each pixel.

There is a similarly named Bresenham’s circle algorithm to do this, that is arguably even simpler than that for drawing a line (code is available on GitHub):

// Call plot(_, _) to draw pixels on circle at origin with given radius.
void draw_circle(int radius, void(*plot)(int, int))
    int x = radius;
    int y = 0;
    int error = 3 - 2 * radius;
    while (x >= y)
        plot(x, y); plot(x, -y); plot(-x, y); plot(-x, -y);
        plot(y, x); plot(y, -x); plot(-y, x); plot(-y, -x);
        if (error > 0)
            error -= 4 * (--x);
        error += 4 * (++y) + 2;

I have a fond memory of my first encounter with this algorithm as a kid reading Nibble Magazine (“Seven ready-to-type Apple II programs!”). In his article, Brent Iverson [1] had “discovered a neat little circle-generating algorithm in a graphics textbook” … but neglected to mention which textbook, or the name Bresenham, or any other details about how this mysterious snippet of code (which was in BASIC, and in slightly different but equivalent form) worked, despite using only integer addition and bit shift operations. In 1988, without search engines or internet forums, I was on my own.

Today, there are derivations of this algorithm all over the web. However, every one that I have found glosses over what I think is a beautiful property of the algorithm: its fast integer error function is only an approximation of the actual error function, which seems like it could sacrifice some accuracy in choosing which pixels best approximate the desired circle. But it doesn’t; the motivation for this post is to show that (1) this is an issue worth thinking about, since the algorithm’s error function is not monotonic in actual distance from the circle, but (2) despite this, in practice Bresenham’s circle algorithm is still optimal in the sense of always selecting the next pixel that is indeed closest to the circle.

At least, I think. More on this later.

The algorithm

Let’s start by describing how the algorithm works, as a means of introducing some useful notation. Fixing a positive integer input radius r (the reader can verify that the algorithm also works for r=0), we can focus on rendering only those pixels in the first octant 0 \leq y \leq x, using the eightfold dihedral symmetries of the circle to plot the remaining pixels.

Starting at (x,y)=(r,0), at each iteration we always move up one pixel, and conditionally move (diagonally) left one pixel, choosing from the two corresponding lattice points (x-1,y+1) or (x,y+1) depending on which is closest to the circle. (We always move up, since in the first octant the slope of the tangent to the circle is always at most -1.)

An iteration of Bresenham’s circle algorithm. We have just plotted the pixel at (x,y), and need to choose which of (x-1,y+1) or (x,y+1) to plot next, ideally minimizing the actual distance from the circle (shown in red).

Ideally, we would choose the next pixel that minimizes the actual, exact distance from the circle |c(h,k)|, where


as indicated by the red line segments in the above figure. However, to eliminate those unpleasant square roots, Bresenham’s algorithm instead seeks to minimize |b(h,k)|, where



(To foreshadow the key point, note that b(h,k) is not simply the square of c(h,k).) In either case, it is convenient to define the sum of these signed errors– that is, without the absolute value– for the two candidate pixels as



where in the source code above, the error variable maintains the value of e_b(x,y), which is initialized to e_b(r,0)=3-2r, and at each iteration, depending on the choice of next pixel to render (more on this shortly), can be updated accordingly with simple integer increment, bit shift, and addition operations as shown.

Choosing the closest pixel

To decide which pixel to move to at each iteration, note that both the integer-valued b(h,k) and the exact c(h,k) have the property that they are positive for points (h,k) outside the circle, and negative for points inside the circle. Thus, for example, in the extreme (and in fact impossible) case that both of the candidate next points (x-1,y+1) and (x,y+1) were outside the circle, then e_b(x,y) and e_c(x,y) are positive, indicating that we should move diagonally left to (x-1,y+1). Similarly, if both candidate points were inside the circle, then the sum of errors is negative, indicating that we should move up to (x,y+1).

The important question is what happens in between, i.e., in the typical situation shown in the above figure, where the two candidate points straddle the circle, so to speak. Again, ideally we would use the sign of the exact sum of errors e_c(x,y), in the same way as described above, moving diagonally left if e_c(x,y)>0, or up if e_c(x,y)<0. Does Bresenham’s algorithm, using the sign of the approximated integer-valued e_b(x,y) instead, yield the same behavior?

To see why this is a question worth asking, suppose for example that we want to draw a circle with radius r=4, and that for some reason we are asked which of the points (2,2) or (4,3) is closer to the circle, as shown in the following figure.

Example choosing which of points (2,2) or (4,3) is closer to the circle with radius 4.

The reader can verify that (4,3) is closer, with |c(4,3)|<|c(2,2)| and thus c(2,2)+c(4,3)<0 … but |b(4,3)|>|b(2,2)| and thus b(2,2)+b(4,3)>0. That is, the integer error approximation used in Bresenham’s algorithm is not monotonic in the actual distance of points from the circle.

The interesting good news is that this seemingly potential flaw, isn’t. That is, we are now in a position to conjecture the following:

Conjecture: At each iteration of Bresenham’s circle algorithm, the sum-of-error functions e_c(x,y) and e_b(x,y) always have the same sign. That is, the algorithm always makes the correct choice of which of the two candidate next pixels is closest to the circle.

Incomplete proof: There are a few key observations that will be useful here. First, at each iteration, c(x,y+1)>c(x-1,y+1) and c^*(x,y+1)>c^*(x-1,y+1)>0. We can see these directly from the definitions, noting that r \geq x>0. (This is why we constrained the radius to be positive, treating r=0 as a separate special case.)

Also, note that the exact sum of errors e_c(x,y) is always irrational, and thus never zero. That is, we should never be indifferent to the choice of next pixel. The approximated e_b(x,y) is never zero, either, as can be seen by inspecting the source code: the value of error is initialized to an odd number, and is always incremented or decremented by even amounts. Thus, it suffices to show that e_c(x,y) and e_b(x,y) are either both positive, or both negative.

So, there are two cases. First, suppose that e_c(x,y)>0. Then c(x,y+1)>0, since




= c(x-1,y+1)c^*(x-1,y+1)+c(x,y+1)c^*(x,y+1)

> c(x-1,y+1)c^*(x-1,y+1)+c(x,y+1)c^*(x-1,y+1)

= e_c(x,y)c^*(x-1,y+1)>0.

Thus, when we are supposed to move diagonally left, we do.

The second case, showing that e_c(x,y)<0 implies e_b(x,y)<0, is why this is only conjecture and not a theorem. Following is as far as I have been able to proceed:

Similar to the first case, note that c(x-1,y+1)<0, since


Consider again the expanded form of


= c(x-1,y+1)c^*(x-1,y+1)+c(x,y+1)c^*(x,y+1)

If c(x,y+1) \leq 0, then we can see e_b(x,y)<0 directly. Otherwise, if c(x,y+1)>0 … and here is where I get stuck.


Well, that was disappointing.

My glass is half full, though; I think there are three possible resolutions of this problem, any of which would be an interesting outcome. First, and most likely, I may have simply missed something obvious. Perhaps a reader can help take the above attempt at a proof to the finish line.

Alternatively, maybe the conjecture is actually false, and there is in fact a counterexample– which given the partial proof above, would necessarily be a situation where Bresenham’s algorithm is supposed to move up (e_c(x,y)<0), but moves diagonally left instead (e_b(x,y)>0). In realistic practice, this isn’t a concern: I have verified numerically that the algorithm works correctly for every radius up to r=16384— that is, much larger than any pixel-perfect circle that would fit on the highest resolution screens currently available. But even from a theoretical perspective, if there is a counterexample, it would be interesting that a minimal one must be so surprisingly large.

Or finally, maybe the conjecture is indeed true, but the remainder of the proof of the second “negative” case is actually really hard. There is some intuition behind this possibility as well: in this part of the proof, we are trying to bound a sum of square roots, which is one of my favorite examples of “problems that seem like they have no business being as difficult as they are.” (The Euclidean traveling salesman is perhaps the most recognizable decision problem with this wrinkle: it’s often claimed to be NP-complete, but it’s “only” NP-hard; we don’t even know if it’s in NP, since it involves a similar comparison with a sum of square roots.) This potentially affects the integrity of my brute-force numerical verification above as well: it’s technically possible that I missed a counterexample by performing those thousands of sign checks with “only” 50-digit precision.

Edit 2022-03-21: Thanks to commenters Andrey Khalyavin for the initial proof of the contrapositive of the missing half above, which I didn’t fully grasp, and to Linus Hamilton for patiently following up with a more detailed version, linked here, that helped me to see how it works. Their work shows that Bresenham’s algorithm is indeed optimal, always selecting the unique next pixel that is closest to the desired circle.


  1. Iverson, Brent, Hi-Res Circle Generator, Nibble Magazine, 9(1) January 1988, 68-71
Posted in Uncategorized | 13 Comments

String art


This is a variant of a similar past problem: draw something interesting, using a sequence of joined straight line segments, without ever lifting your pen. Or in this case, with one continuous thread.

As far as I can tell, the first realization of this particular idea was in 2016, when artist Petros Vrellis [1] arranged 200 pins evenly spaced around a 28-inch-diameter circular rim, and wound kilometers of thread stretched in straight line chords from one pin to the next, in a carefully determined order, to yield an image similar to the one shown below (created by me).

My implementation of string art, with one continuous thread wound around 256 pins across 8,927 chords. Full size output image is 4096×4096 black on white, from 512×512 grayscale input (source Birsak et. al. [2]).

Vrellis has not published any details of their algorithm, but since then, numerous others have done similar work. The most common approach has been a sequential greedy algorithm, starting with a grayscale input image and a blank white output image, rendering each thread chord in the output as a pixel-approximated line segment, at each iteration selecting a next pin so that the resulting chord makes the most improvement to the result.

Most of these approaches had a quirk that didn’t appeal to me: they effectively treat the thread as if it were not actually opaque, so that overlapping thread chords increase the darkness of the corresponding pixels in the output image (or in some cases, reduce the “remaining” darkness of the pixels in the input as each new thread overlaps them). That is, dark thread on top of dark thread yields an even darker intersection.

The motivation for this post is to explore the alternative described in 2018 by Birsak et. al. [2], who instead upscaled the input image to a higher-resolution output, with each input pixel corresponding to a b \times b square block of pixels in the output. When one or more threads– each one output pixel wide, purely black-on-white– intersect a block, the fraction of those b^2 pixels that are black is a more accurate approximation of the gray level that we can compare with the corresponding single input pixel.

I thought it would be interesting to implement just this upscaling idea, but to otherwise skip the complexity of the other aspects of their algorithm (involving iterative solving of an integer programming problem, extracting an Eulerian trail from the resulting unordered set of chords, weighting more important regions of the target image, etc.), sticking to the simple sequential greedy approach of selecting consecutive chords. I was pleased with the results: less than 200 lines of C++ code (available on GitHub), that generates nice-looking images in just a few minutes.


The program takes four arguments: the input image filename, the upscaling block size, the number of pins, and the output image filename. For example, to use 16×16-pixel output blocks per input pixel, with 256 pins:

string_art input.pgm 16 256 output.pgm

Image files use the easy-to-parse Netpbm format, specifically the 8-bit grayscale binary “P5” format.

Initially, the output image “canvas” is all white. Starting at an arbitrary pin, we iteratively select the best next pin to wind the thread across. For each candidate chord, we use Bresenham’s algorithm to visit each pixel on the chord. The score for that chord is simply the average– over each of the intersected blocks that would change as a result– of the difference between the target gray level of the input pixel and the (potential) new fraction of black pixels in the block. Intuitively, we measure how important it is to continue to darken the intersected blocks of pixels.

Evaluating the score for each possible next chord, we select the one with maximum score, that is, that makes the largest average improvement to the blocks it would modify. We stop when that best average improvement is negative, that is, when we would make the image worse by continuing.


The figure below shows the results for several example images. The top row is the 512×512 grayscale input (source Birsak [2]). The middle row is the Birsak algorithm output, using a block size b=8 (and thus a 4096×4096 output resolution) and 256 pins. The bottom row is my implementation using the same settings. In each case, the label indicates the number of thread chords, and the total length of thread required (for a 630-mm diameter canvas).

String art comparison between (a) top row input images, 512×512 grayscale; (b) middle row Birsak et. al., output, 4096×4096 with 8×8 blocks and 256 pins; and (c) bottom row showing my algorithm with same parameters. Labels indicate the number of chords and total thread length in meters (on 630-mm diameter canvas).

For my own project, I used the same 256 pins, but a 384×384 input image, and a higher resolution block size b=16, for a 6144×6144 output image, which looks nice as a 24-inch, 300-dpi print, with the target input image inset as shown below.

Have you ever tried to photograph a cat?


  1. Vrellis, Petros, A new way to knit (2016). http://artof01.com/vrellis/works/knit.html
  2. Birsak, M., Rist, F., Wonka, P., & Musialski, P., String Art: Towards Computational Fabrication of String Images, Computer Graphics Forum, 37(2) 2018, 263–274, doi:10.1111/cgf.13359
Posted in Uncategorized | 4 Comments

There is no equilux

I learned a new word recently… sort of. An equilux is a day on which the durations of daylight and night are exactly equal. Contrast this with the more familiar equinox, or the time at which the center of the Sun passes through the Earth’s equatorial plane, so that the Sun is directly above the Earth’s equator. (If anyone is checking, the official definition of the equinox is slightly different than this, for technical reasons that we don’t really care about here.)

There are two equinoxes each year; for example, this coming year the first equinox of 2022 is on 20 March, at about 15:33:25 UTC, marking the unofficial start of spring in the northern hemisphere (or fall in the southern hemisphere).

We think of an equinox as also being the time at which there are roughly equal periods of daylight and darkness. The Wikipedia page linked above does a good job of explaining why this isn’t quite true: the Sun appears as not just a point but a disk, a sliver of which becomes visible at sunrise well before the center of the Sun appears above the horizon, and similarly a sliver remains visible in the evening after the center has dipped back below the horizon again.

Thus, even on the equinox, the day is slightly longer than the night. So, when is the “equilux,” where these periods of day and night are truly equal?

To see why this is tricky to answer, consider the figure below, showing the changes in length of day and night over the course of the coming year in the Washington, D.C., area:

Length of day and night in Washington, D.C., over the course of the year 2022. The inset is a zoomed-in view of the week near the spring equinox on 20 March.

This winter, the nights (shown in blue) are much longer than the days (shown in red). But over the next few months, the nights get shorter and the days get longer… until the middle of March, when we start to see more daylight hours than nighttime. This “crossing” happens around 16 March (see the zoomed-in view in the inset), when we see almost exactly 12 hours (plus a few seconds) of night beginning at sunset that evening, followed by about 12 hours– plus almost a full minute— of daylight on 17 March.

The motivation for this post is to observe that (1) this is still not exactly equal periods of daylight and darkness, still differing by nearly a minute; and that (2) such an exact equilux will almost surely never occur. An equinox is a precise instant in time– and the same instant independent of observer location on the Earth– that we can in principle measure as accurately as we wish. But an equilux, on the other hand, is two consecutive intervals of time, whose lengths are varying over the weeks surrounding the equinox by several minutes per day. Hoping that those lengths will meet in the middle exactly is hoping for a measure zero event.

Having said that, we can relax our definition somewhat, and consider when periods of daylight and darkness are “most closely equal,” minimizing the absolute difference between consecutive intervals as we did above. Things are still pretty unpleasant, for a couple of reasons. First, the times, and thus durations, of daylight and darkness depend on the location of the observer. The calculations and figure above are unique to Washington, D.C.; if we instead consider southern Virginia (or Oklahoma, or most of California), the nearest equilux is earlier, consisting of the daylight hours of 16 March followed by that night, as shown in the figure below.

Date of (spring) equilux across the globe. Equilux is earlier than the spring equinox (20 March) in the northern hemisphere, later in the southern hemisphere. Lighter colors indicate the equilux starts with daytime (sunrise to sunset) of the indicated date, followed by nighttime; darker colors indicate starting with evening of the date, followed by daytime of the following day.

I struggled a bit to make this figure clear and informative at the same time. In the northern hemisphere, the spring equilux is before the 20 March equinox, by a number of days indicated by the color. For the southern hemisphere, I “re-used” the same color palette, indicating the number of days after the equinox that the equilux occurs. In either case, the lighter color indicates starting with daytime (sunrise to sunset) of the indicated date, followed by the roughly equal nighttime; the darker color indicates that the equilux starts with evening of the indicated date, followed by daytime of the following day.

The light purple and green bands near the equator more coarsely indicate equiluxes that are more than a week before or after the equinox, respectively. Which brings us to the second problem with everything we’ve done so far: an equilux depends on how we choose to define the transition between what we consider “daylight” versus “darkness.” The sunrise and sunset used above are defined as the times when the center of the Sun’s disk is 50 arcminutes below the horizon. It’s arguably way too light at these times to call them a transition to or from “darkness.” A more reasonable transition would be at least “dawn” and “dusk,” defined as civil twilight (6 degrees below the horizon), if not nautical or even astronomical twilight (12 or 18 degrees, respectively). The figure below compares the sunrise/sunset equilux (shown in red) with the dawn/dusk civil twilight assumption (shown in blue), as a function of the observer’s latitude.

Date of equilux vs. observer latitude, assuming “day/night” transitions at sunrise/sunset (in red) or dawn/dusk twilight (in blue). The equinox on 20 March is shown in black for reference.

So, depending on how dark we think it needs to be to qualify as “not daylight,” the spring equilux in the mid-northern latitudes where I live could be as late as a few days before the equinox, or as early as mid-February. And however we choose to define it, the duration of daylight and darkness will still differ by a minute or three.

Posted in Uncategorized | Leave a comment

Squid Game expectation

Warning: This post is motivated by the Netflix series Squid Game, and contains significant spoilers. I just recently watched the show, and really enjoyed it, so if you haven’t yet seen it, you may want to skip this article.

Suppose that you are one of n=16 competitors who must cross a bridge of s=18 steps. The steps are far enough apart that you must jump from one to the next… and at each step you must choose which of two side-by-side glass squares to jump onto: one square is tempered glass that is strong enough for you to safely jump onto, while the other square is normal glass that will break if you step on it, causing you to fall to your death and out of the competition. The glass squares are otherwise indistinguishable, so that players must guess which square to jump onto at each new step.

Each player in turn attempts to cross the bridge. The first player has a tough road ahead, safely crossing with a probability of only 1/2^{18}. But each subsequent player observes the progress of all previous players, noting which squares at each “explored” step are safe and which are deadly.

Last week’s FiveThirtyEight Riddler column asked for the expected number Y of surviving players. Let’s turn this around, and instead consider the expected number X=n-Y of eliminated players, since the analysis and resulting formulas turn out a bit nicer that way.

I think this is a nice problem for several reasons. It can be an engaging problem for students as a, uh, “real-world application” of probability and expected value. It’s easy to simulate as a programming exercise; and there are several analytical approaches to solving the problem as well. The Riddler solutions describe a recurrence relation, as well as a relatively nice summation:

E[X] = n - \frac{1}{2^{s}}\sum\limits_{k=0}^n (n-k){s \choose k}

yielding 589819/65536, or 8.9999237060546875 players eliminated on average.

But the fact that the expected value is so close to 9– so close to half the number of steps– is not a coincidence. This seems like a great example application of linearity of expectation, that in this case allows us to solve the problem with just paper and pencil.

Let X_1, X_2, \ldots, X_s be indicator random variables, each equal to 1 if and only if some player is eliminated at the corresponding step, so that X=\sum X_j, and thus

E[X] = \sum\limits_{j=1}^s E[X_j] = \sum\limits_{j=1}^s P(X_j=1) .

The key observation is that “most” of these probabilities are exactly 1/2. More precisely, as long as j \leq n, someone must encounter step j, and whoever does encounter that step for the first time will be eliminated with probability P(X_j=1)=1/2.

So if there are at least as many players as steps, i.e. n \geq s, then the expected number of players eliminated is exactly E[X]=s/2.

In the actual game, however, there are two “extra” steps. As described above, the first 16 steps contribute exactly half a player each to the total expected number eliminated. But the 17th and 18th steps require more careful analysis, because it is possible that one or both of these steps may not be encountered at all.

For example, consider the worst possible outcome, where the first player is eliminated on the first step, the second player on the second step, and so on, with the 16th and final player being eliminated on the 16th step, so that the 17th (and 18th) step is never reached. This happens with probability 1/2^{16}, and so

P(X_{17}=1) = (1-\frac{1}{2^{16}})(\frac{1}{2})

We handle the 18th step similarly. In addition to the above situation where neither the 17th nor 18th step is reached, there are 16 equally likely ways for the last player to be eliminated on the 17th step (according to which one of the first 16 steps is guessed correctly by some player), so that

P(X_{18}=1) = (1-\frac{1}{2^{16}}-\frac{16}{2^{17}})(\frac{1}{2})

Adding to the 16/2=8 expected eliminations from the first 16 steps yields the desired E[X]=589819/65536.

Granted, it is fortunate that there are only two more steps than players, because the resulting general formula for the expected number of players eliminated:

E[X] = \frac{1}{2}\sum\limits_{j=1}^s (1-\frac{1}{2^{j-1}}\sum\limits_{k=n}^{j-1}{j-1 \choose k})

looks more complicated than the summation presented earlier.

Posted in Uncategorized | Leave a comment