String art


This is a variant of a similar past problem: draw something interesting, using a sequence of joined straight line segments, without ever lifting your pen. Or in this case, with one continuous thread.

As far as I can tell, the first realization of this particular idea was in 2016, when artist Petros Vrellis [1] arranged 200 pins evenly spaced around a 28-inch-diameter circular rim, and wound kilometers of thread stretched in straight line chords from one pin to the next, in a carefully determined order, to yield an image similar to the one shown below (created by me).

My implementation of string art, with one continuous thread wound around 256 pins across 8,927 chords. Full size output image is 4096×4096 black on white, from 512×512 grayscale input (source Birsak et. al. [2]).

Vrellis has not published any details of their algorithm, but since then, numerous others have done similar work. The most common approach has been a sequential greedy algorithm, starting with a grayscale input image and a blank white output image, rendering each thread chord in the output as a pixel-approximated line segment, at each iteration selecting a next pin so that the resulting chord makes the most improvement to the result.

Most of these approaches had a quirk that didn’t appeal to me: they effectively treat the thread as if it were not actually opaque, so that overlapping thread chords increase the darkness of the corresponding pixels in the output image (or in some cases, reduce the “remaining” darkness of the pixels in the input as each new thread overlaps them). That is, dark thread on top of dark thread yields an even darker intersection.

The motivation for this post is to explore the alternative described in 2018 by Birsak et. al. [2], who instead upscaled the input image to a higher-resolution output, with each input pixel corresponding to a b \times b square block of pixels in the output. When one or more threads– each one output pixel wide, purely black-on-white– intersect a block, the fraction of those b^2 pixels that are black is a more accurate approximation of the gray level that we can compare with the corresponding single input pixel.

I thought it would be interesting to implement just this upscaling idea, but to otherwise skip the complexity of the other aspects of their algorithm (involving iterative solving of an integer programming problem, extracting an Eulerian trail from the resulting unordered set of chords, weighting more important regions of the target image, etc.), sticking to the simple sequential greedy approach of selecting consecutive chords. I was pleased with the results: less than 200 lines of C++ code (available on GitHub), that generates nice-looking images in just a few minutes.


The program takes four arguments: the input image filename, the upscaling block size, the number of pins, and the output image filename. For example, to use 16×16-pixel output blocks per input pixel, with 256 pins:

string_art input.pgm 16 256 output.pgm

Image files use the easy-to-parse Netpbm format, specifically the 8-bit grayscale binary “P5” format.

Initially, the output image “canvas” is all white. Starting at an arbitrary pin, we iteratively select the best next pin to wind the thread across. For each candidate chord, we use Bresenham’s algorithm to visit each pixel on the chord. The score for that chord is simply the average– over each of the intersected blocks that would change as a result– of the difference between the target gray level of the input pixel and the (potential) new fraction of black pixels in the block. Intuitively, we measure how important it is to continue to darken the intersected blocks of pixels.

Evaluating the score for each possible next chord, we select the one with maximum score, that is, that makes the largest average improvement to the blocks it would modify. We stop when that best average improvement is negative, that is, when we would make the image worse by continuing.


The figure below shows the results for several example images. The top row is the 512×512 grayscale input (source Birsak [2]). The middle row is the Birsak algorithm output, using a block size b=8 (and thus a 4096×4096 output resolution) and 256 pins. The bottom row is my implementation using the same settings. In each case, the label indicates the number of thread chords, and the total length of thread required (for a 630-mm diameter canvas).

String art comparison between (a) top row input images, 512×512 grayscale; (b) middle row Birsak et. al., output, 4096×4096 with 8×8 blocks and 256 pins; and (c) bottom row showing my algorithm with same parameters. Labels indicate the number of chords and total thread length in meters (on 630-mm diameter canvas).

For my own project, I used the same 256 pins, but a 384×384 input image, and a higher resolution block size b=16, for a 6144×6144 output image, which looks nice as a 24-inch, 300-dpi print, with the target input image inset as shown below.

Have you ever tried to photograph a cat?


  1. Vrellis, Petros, A new way to knit (2016).
  2. Birsak, M., Rist, F., Wonka, P., & Musialski, P., String Art: Towards Computational Fabrication of String Images, Computer Graphics Forum, 37(2) 2018, 263–274, doi:10.1111/cgf.13359
Posted in Uncategorized | 2 Comments

There is no equilux

I learned a new word recently… sort of. An equilux is a day on which the durations of daylight and night are exactly equal. Contrast this with the more familiar equinox, or the time at which the center of the Sun passes through the Earth’s equatorial plane, so that the Sun is directly above the Earth’s equator. (If anyone is checking, the official definition of the equinox is slightly different than this, for technical reasons that we don’t really care about here.)

There are two equinoxes each year; for example, this coming year the first equinox of 2022 is on 20 March, at about 15:33:25 UTC, marking the unofficial start of spring in the northern hemisphere (or fall in the southern hemisphere).

We think of an equinox as also being the time at which there are roughly equal periods of daylight and darkness. The Wikipedia page linked above does a good job of explaining why this isn’t quite true: the Sun appears as not just a point but a disk, a sliver of which becomes visible at sunrise well before the center of the Sun appears above the horizon, and similarly a sliver remains visible in the evening after the center has dipped back below the horizon again.

Thus, even on the equinox, the day is slightly longer than the night. So, when is the “equilux,” where these periods of day and night are truly equal?

To see why this is tricky to answer, consider the figure below, showing the changes in length of day and night over the course of the coming year in the Washington, D.C., area:

Length of day and night in Washington, D.C., over the course of the year 2022. The inset is a zoomed-in view of the week near the spring equinox on 20 March.

This winter, the nights (shown in blue) are much longer than the days (shown in red). But over the next few months, the nights get shorter and the days get longer… until the middle of March, when we start to see more daylight hours than nighttime. This “crossing” happens around 16 March (see the zoomed-in view in the inset), when we see almost exactly 12 hours (plus a few seconds) of night beginning at sunset that evening, followed by about 12 hours– plus almost a full minute— of daylight on 17 March.

The motivation for this post is to observe that (1) this is still not exactly equal periods of daylight and darkness, still differing by nearly a minute; and that (2) such an exact equilux will almost surely never occur. An equinox is a precise instant in time– and the same instant independent of observer location on the Earth– that we can in principle measure as accurately as we wish. But an equilux, on the other hand, is two consecutive intervals of time, whose lengths are varying over the weeks surrounding the equinox by several minutes per day. Hoping that those lengths will meet in the middle exactly is hoping for a measure zero event.

Having said that, we can relax our definition somewhat, and consider when periods of daylight and darkness are “most closely equal,” minimizing the absolute difference between consecutive intervals as we did above. Things are still pretty unpleasant, for a couple of reasons. First, the times, and thus durations, of daylight and darkness depend on the location of the observer. The calculations and figure above are unique to Washington, D.C.; if we instead consider southern Virginia (or Oklahoma, or most of California), the nearest equilux is earlier, consisting of the daylight hours of 16 March followed by that night, as shown in the figure below.

Date of (spring) equilux across the globe. Equilux is earlier than the spring equinox (20 March) in the northern hemisphere, later in the southern hemisphere. Lighter colors indicate the equilux starts with daytime (sunrise to sunset) of the indicated date, followed by nighttime; darker colors indicate starting with evening of the date, followed by daytime of the following day.

I struggled a bit to make this figure clear and informative at the same time. In the northern hemisphere, the spring equilux is before the 20 March equinox, by a number of days indicated by the color. For the southern hemisphere, I “re-used” the same color palette, indicating the number of days after the equinox that the equilux occurs. In either case, the lighter color indicates starting with daytime (sunrise to sunset) of the indicated date, followed by the roughly equal nighttime; the darker color indicates that the equilux starts with evening of the indicated date, followed by daytime of the following day.

The light purple and green bands near the equator more coarsely indicate equiluxes that are more than a week before or after the equinox, respectively. Which brings us to the second problem with everything we’ve done so far: an equilux depends on how we choose to define the transition between what we consider “daylight” versus “darkness.” The sunrise and sunset used above are defined as the times when the center of the Sun’s disk is 50 arcminutes below the horizon. It’s arguably way too light at these times to call them a transition to or from “darkness.” A more reasonable transition would be at least “dawn” and “dusk,” defined as civil twilight (6 degrees below the horizon), if not nautical or even astronomical twilight (12 or 18 degrees, respectively). The figure below compares the sunrise/sunset equilux (shown in red) with the dawn/dusk civil twilight assumption (shown in blue), as a function of the observer’s latitude.

Date of equilux vs. observer latitude, assuming “day/night” transitions at sunrise/sunset (in red) or dawn/dusk twilight (in blue). The equinox on 20 March is shown in black for reference.

So, depending on how dark we think it needs to be to qualify as “not daylight,” the spring equilux in the mid-northern latitudes where I live could be as late as a few days before the equinox, or as early as mid-February. And however we choose to define it, the duration of daylight and darkness will still differ by a minute or three.

Posted in Uncategorized | Leave a comment

Squid Game expectation

Warning: This post is motivated by the Netflix series Squid Game, and contains significant spoilers. I just recently watched the show, and really enjoyed it, so if you haven’t yet seen it, you may want to skip this article.

Suppose that you are one of n=16 competitors who must cross a bridge of s=18 steps. The steps are far enough apart that you must jump from one to the next… and at each step you must choose which of two side-by-side glass squares to jump onto: one square is tempered glass that is strong enough for you to safely jump onto, while the other square is normal glass that will break if you step on it, causing you to fall to your death and out of the competition. The glass squares are otherwise indistinguishable, so that players must guess which square to jump onto at each new step.

Each player in turn attempts to cross the bridge. The first player has a tough road ahead, safely crossing with a probability of only 1/2^{18}. But each subsequent player observes the progress of all previous players, noting which squares at each “explored” step are safe and which are deadly.

Last week’s FiveThirtyEight Riddler column asked for the expected number Y of surviving players. Let’s turn this around, and instead consider the expected number X=n-Y of eliminated players, since the analysis and resulting formulas turn out a bit nicer that way.

I think this is a nice problem for several reasons. It can be an engaging problem for students as a, uh, “real-world application” of probability and expected value. It’s easy to simulate as a programming exercise; and there are several analytical approaches to solving the problem as well. The Riddler solutions describe a recurrence relation, as well as a relatively nice summation:

E[X] = n - \frac{1}{2^{s}}\sum\limits_{k=0}^n (n-k){s \choose k}

yielding 589819/65536, or 8.9999237060546875 players eliminated on average.

But the fact that the expected value is so close to 9– so close to half the number of steps– is not a coincidence. This seems like a great example application of linearity of expectation, that in this case allows us to solve the problem with just paper and pencil.

Let X_1, X_2, \ldots, X_s be indicator random variables, each equal to 1 if and only if some player is eliminated at the corresponding step, so that X=\sum X_j, and thus

E[X] = \sum\limits_{j=1}^s E[X_j] = \sum\limits_{j=1}^s P(X_j=1) .

The key observation is that “most” of these probabilities are exactly 1/2. More precisely, as long as j \leq n, someone must encounter step j, and whoever does encounter that step for the first time will be eliminated with probability P(X_j=1)=1/2.

So if there are at least as many players as steps, i.e. n \geq s, then the expected number of players eliminated is exactly E[X]=s/2.

In the actual game, however, there are two “extra” steps. As described above, the first 16 steps contribute exactly half a player each to the total expected number eliminated. But the 17th and 18th steps require more careful analysis, because it is possible that one or both of these steps may not be encountered at all.

For example, consider the worst possible outcome, where the first player is eliminated on the first step, the second player on the second step, and so on, with the 16th and final player being eliminated on the 16th step, so that the 17th (and 18th) step is never reached. This happens with probability 1/2^{16}, and so

P(X_{17}=1) = (1-\frac{1}{2^{16}})(\frac{1}{2})

We handle the 18th step similarly. In addition to the above situation where neither the 17th nor 18th step is reached, there are 16 equally likely ways for the last player to be eliminated on the 17th step (according to which one of the first 16 steps is guessed correctly by some player), so that

P(X_{18}=1) = (1-\frac{1}{2^{16}}-\frac{16}{2^{17}})(\frac{1}{2})

Adding to the 16/2=8 expected eliminations from the first 16 steps yields the desired E[X]=589819/65536.

Granted, it is fortunate that there are only two more steps than players, because the resulting general formula for the expected number of players eliminated:

E[X] = \frac{1}{2}\sum\limits_{j=1}^s (1-\frac{1}{2^{j-1}}\sum\limits_{k=n}^{j-1}{j-1 \choose k})

looks more complicated than the summation presented earlier.

Posted in Uncategorized | Leave a comment

Code name generator

Several months ago I updated the list of words and frequencies of occurrence that I’ve used in various natural language processing experiments (keyword search “ngrams”) over the years, to reflect last year’s update to the Google Books Ngrams dataset.

This past weekend I updated it again, to include the words in the WordNet parts of speech database developed by Princeton University, restricting to just those 63,745 terms consisting of single words of all lowercase letters, i.e., no capitalization, spaces, hyphens, or other punctuation.

Most– over 98%– of these words were already in the list. But including them lets us also use their categorization into parts of speech (nouns, verbs, adjectives, and adverbs), to automate the generation of random code names for your next super secret project.

Just select a random adjective, followed by a random noun, and you get Operation ALIEN WORK, or DISSENTING CITY, or one of my favorites, OCCASIONAL ENEMY. Weighting the random selections by word frequency keeps the code names reasonably simple, so that you don’t end up with Operation PARABOLOIDAL IMMIGRATION (although an unweighted sample did yield Operation YONDER KING, which sounds pretty cool).

The Python code name generator and word lists of parts of speech and corresponding frequencies are available on GitHub.


  1. Google Books Ngram Viewer Exports, English versions 20120701 and 20200217
  2. Princeton University “About WordNet.” WordNet. Princeton University. 2011.
Posted in Uncategorized | 3 Comments

Expected length of a soccer penalty shootout

The U.S. women’s soccer team recently beat the Netherlands in a penalty shootout, soccer’s version of an overtime tie-breaker. Teams alternate turns attempting a penalty kick, resulting in either a scored point or a block/miss, and after each team has had five attempts, the most points wins… with two twists:

  1. The shootout ends as soon as the outcome is guaranteed, so that the shootout may require fewer than ten kicks in total. This happened in the U.S. vs. Netherlands game: the U.S. blocked two of the first four kicks from the Netherlands, and so after making all of their first four shots, the U.S. victory was assured after just eight total shots.
  2. If the score is still tied after each team has had five shots, the shootout continues until “sudden death,” with teams alternating penalty kicks until one team scores and the other misses in the same “round.” This could go on for a while: the current world record for the longest shootout in a first class match is 48 shots (the ten initial shots, plus 19 more rounds of sudden death) in the 2005 Namibian Cup.

A recent Riddler column at FiveThirtyEight asked how long this takes on average. That is, assuming that each penalty kick by either team scores independently with the same probability p, what is the expected total number of shots taken?

This post is mostly just to capture my notes on this problem, adding to my “library” of potentially interesting future exercises for students. Following is a Python implementation that uses the symbolic mathematics library SymPy (bundled with SciPy) to recursively compute the desired expected value as a function of p:

from functools import lru_cache
import sympy as sym

def shots(score, us, them, p):
    """E[# shots] given net score and remaining shots for us/them."""
    if us + them == 0:
        return 0 if score != 0 else 2 / (2 * p * (1 - p))
    if score + us < 0 or score - them > 0:
        return 0
    return (1 + p * shots(-score - 1, them, us - 1, p) +
            (1 - p) * shots(-score, them, us - 1, p))

print(shots(0, 5, 5, sym.Symbol('p')).simplify())

The early stopping condition in line 9 is simple to describe: we stop if the next team to shoot can’t possibly win, or can’t possibly lose.

I like this problem because there are several nice subproblems buried inside. For example, with a bit of work, we could consolidate the us and them function arguments– indicating the number of possible remaining shots for the next team to kick and the defending team, respectively– into a single argument indicating the total number of shots remaining before sudden death. (It’s a fair question whether this is really any clearer or more readable, though; I think the above implementation makes it easier to see how the kicking and defending teams switch places in the recursion.)

Also, as noted in the Riddler column, it’s interesting, and a nice exercise to prove, that the expected number of shots is symmetric in p and 1-p, as shown in the figure below:

In other words, for example, if teams score penalty kicks with probability 0.8, we should expect shootouts to take just as long on average as if teams score with probability 0.2; and shootouts are shortest when each kick is a fair coin flip.

Finally, instead of just the expected number of shots, consider computing the entire probability distribution. Or even just counting outcomes (that are not in general equally likely): for example, how many different outcomes of the first n=5 “guaranteed” rounds (i.e., the first ten shots) are there that result in a tie leading to sudden death? This number is {2n \choose n}; it’s a nice problem to find a counting argument to show this (e.g., ten misses is one such outcome, while ten scored points is another, etc.).

Posted in Uncategorized | Leave a comment

Counting collisions

Here is an interesting problem that I saw recently, that involves a nice combination of physics, programming, and mathematics, with a surprising solution.

Imagine two balls on a frictionless floor near a wall, as shown in the figure below. The green ball is initially at rest between the wall and the blue ball, which is m times more massive and moving toward the green ball. Assume everything is nicely Newtonian, and collisions are perfectly elastic.

As a function of m, how many collisions are there?

For example, if m=1, so that the two balls have equal mass, then there are three collisions: the blue ball strikes the green ball, which bounces off the wall, then returns to strike the blue ball again, sending it off to the right.

However, if the blue ball is heavier, say m=100 times heavier, then the green ball will bounce back and forth many times between the wall and the blue ball, for a total of 31 collisions, before finally failing to catch up to the blue ball as it moves off to the right.

What if the blue ball is m=10,000 times heavier? Or one million times?

Following is my Python implementation of a solution.

def collisions(m):
    v1, v2 = 0, -1
    bounce = 0
    while v1 > v2:
        v1, v2 = (((1 - m)*v1 + 2*m*v2) / (1 + m),
                  (2*v1 - (1 - m)*v2) / (1 + m))
        bounce = bounce + 1
        if v1 < 0:
            v1 = -v1
            bounce = bounce + 1
    return bounce

Posted in Uncategorized | 2 Comments

Beware the “natural” quaternion


Rotation math can be confusing. But it didn’t need to be this confusing.

I think the reason that 3D rotations can be tricky to work with is that there are so many choices of convention– in interpretation, notation, and implementation– and we often do not communicate or document all of those choices clearly and completely. Are we “actively” rotating vectors, or “passively” rotating frames? If the latter, are we rotating from the “body” to “world” frame, or from world to body? Are rotations applied as we write them left to right, or right to left? Are the rotations right-handed or left-handed? Etc.

This post is motivated by lost and delayed productivity, stemming from confusion and miscommunication about one of these choices that seems not widely known: there are two different quaternions out there in the wild. The unit quaternion, familiar to most of us as a way to represent rotations in many application domains, has an evil twin. If you didn’t already know this, then hopefully this post is a helpful warning to double-check your documentation and code. But even if this is old news for you, then I will try to argue that use of this evil twin quaternion, advocated by Shuster [1] as the “natural” convention, was an unnecessary choice… and certainly not any more “natural,” “physical,” or otherwise inherently preferable to the quaternion most of us are familiar with.

Rotation matrices

To try to show more clearly how silly this situation is, let’s think like a software developer, working out how to implement rotations for some application. Let’s start with the interface first, then we will flesh out the implementation. We can view the concept of rotation as a group acting on \mathbb{R}^3, with a particular rotation represented as a group element g, that transforms a vector v \in \mathbb{R}^3 into a new rotated vector g(v). For example, in Python, we could make a Rotation object callable:

class Rotation:
    def __call__(self, v):
        """Rotate vector v."""
        return ...

g = Rotation(...)
v = (1, 0, 0)
v_rotated = g(v)

But wait– we’ve already made several choices of convention here, at least implicitly. First, from this software implementation perspective, we are bypassing the discussion of interpretation altogether. That is, for example, whether the user wants to interpret an argument vector v \in \mathbb{R}^3 as being “actively” rotated within a fixed coordinate frame, or as a fixed point in space whose coordinates are being “passively” rotated from one frame to another, does not affect the eventual lines of code implementing the arithmetic operations realizing the rotation. (Or put another way, those interpretation details get pushed into the constructor, affecting how we want to interpret the parameters specifying creation of a Rotation. More on this later.)

However, we have made a definite choice about notation: namely, the left group action convention, where we write (from left to right) the rotation group element g first, followed by the vector v being acted upon.

This is okay; no one is complaining about this choice of convention. For example, if we now consider actually implementing a rotation as a 3×3 matrix R (we’ll get to quaternions shortly), pretty much everyone agrees with and has settled on the well-established convention of multiplying the matrix R on the left by the column vector v on the right, so that the group action g(v) corresponds to the matrix multiplication Rv:

import numpy as np

class Rotation:
    def __init__(self, matrix):
        """Create rotation from matrix."""
        self.r = np.array(matrix)
    def __call__(self, v):
        """Rotate vector v."""
        return self.r @ np.array(v) # matrix multiplication

g = Rotation(((0, -1, 0), (1, 0, 0), (0, 0, 1)))
v = (1, 0, 0)
v_rotated = g(v)

(As an aside, the only reasonably common application context I can think of where it’s more common to multiply a row vector on the left by a matrix on the right is a Markov chain, where we update the probability distribution as a row vector multiplied by the transition matrix. Maybe there are others that I’m not aware of?)

Composition of rotations

Our implementation isn’t quite complete. We still need to work out how to compose multiple rotations. Fortunately, there is again community-wide agreement on how this works with rotation matrices, and it’s consistent with the same left group action convention. Namely, composition (just like the group action on vectors) is denoted and implemented as the usual left-to-right matrix multiplication:

import numpy as np

class Rotation:
    def __init__(self, matrix):
        """Create rotation from matrix."""
        self.r = np.array(matrix)
    def __call__(self, v):
        """Rotate vector v."""
        return self.r @ np.array(v)

    def __matmul__(self, r2):
        """Compose rotations (this @ r2)."""
        return Rotation(self.r @ r2.r)

r1 = Rotation(((0, -1, 0), (1, 0, 0), (0, 0, 1)))
r2 = Rotation(((0, 0, 1), (0, 1, 0), (-1, 0, 0)))
v = (1, 0, 0)

v_rotated = (r1 @ r2)(v)
v_rotated = r1(r2(v)) # same result

Note a couple of observations before we move on. First, the example above highlights a potentially confusing side effect of this left group action convention: in the composition R_1 R_2, the rightmost rotation R_2 is the first to act on the input vector v.

Second, recall that matrix multiplication is defined so that each element in the matrix product C=AB is computed as c_{i,j}=\sum_k a_{i,k}b_{k,j}. This is technically another arbitrary choice of convention; we could in principle choose to abandon this convention, and define “our” multiplication as c_{i,j}=\sum_k a_{k,i}b_{j,k}. The reader can verify that the effect of changing this convention is simply to reverse the order in which we would write a “historically conventional” well-defined matrix product.

Maybe we’ve been using the “wrong” definition of matrix multiplication for the last two centuries or so, and the above redefinition is more “natural”?

Keep the interface, change the implementation

Now that we have a working software implementation using rotation matrices, that corresponds nicely with cocktail-napkin notation, suppose that we want to change that implementation to use quaternions instead of matrices under the hood… but without changing the interface exposed to the user.

First, let’s review how unit quaternions work as rotation operators. Just as the group SO(3) of rotation matrices acts on vectors by multiplication, the group SU(2) of unit quaternions of the form q=w+x\mathbf{i}+y\mathbf{j}+z\mathbf{k} acts on vectors by conjugation, with a unit quaternion q transforming a vector v into the rotated vector qvq^{-1}=qvq^{*}, where we temporarily convert vectors in \mathbb{R}^3 to and from corresponding quaternions with zero real part, and the non-commutative quaternion multiplication is defined by

\mathbf{i}^2 = \mathbf{j}^2 = \mathbf{k}^2 = \mathbf{ijk} = -1

Note that this form of conjugation, qvq^{-1}, is also a choice of convention. We could have used q^{-1}vq instead… but here again, most everyone seems to agree the above is the more sensible choice, since it keeps us consistent with the left group action notational convention, allowing us to comfortably switch back and forth between thinking in terms of rotation matrices or quaternions as our underlying implementation.

Here is the result:

import numpy as np

def quat_product(q1, q2):
    """Return quaternion product (q1 q2)."""
    w1, xyz1 = q1
    w2, xyz2 = q2
    return (w1 * w2 -, xyz2),
            w1 * xyz2 + w2 * xyz1 + np.cross(xyz1, xyz2))

class Rotation:
    def __init__(self, quat):
        """Create rotation from quaternion."""
        self.q = (quat[0], np.array(quat[1]))

    def __call__(self, v):
        """Rotate vector v."""
        w, xyz = quat_product(quat_product(self.q, (0, np.array(v))),
                              (self.q[0], -self.q[1]))
        return xyz

    def __matmul__(self, r2):
        """Compose rotations (this @ r2)."""
        return Rotation(quat_product(self.q, r2.q))

c = np.sqrt(0.5)
r1 = Rotation((c, (0, 0, c)))
r2 = Rotation((c, (0, c, 0)))
v = (1, 0, 0)

v_rotated = (r1 @ r2)(v)
v_rotated = r1(r2(v)) # same result

Note that I’ve pulled out a helper function, quat_product, that multiplies two quaternions, using the product as defined above and originally set down by Hamilton nearly two centuries ago.

But this is yet another technically arbitrary choice of convention. There is another definition of the quaternion product, advocated by Shuster [1], and in wide use by the spacecraft dynamics community, that changes the product relations from the “Hamilton convention” of

\mathbf{i}^2 = \mathbf{j}^2 = \mathbf{k}^2 = \mathbf{ijk} = -1

to what Sommer et. al. [2] call the “Shuster convention” (sometimes referred to as the “JPL convention,” denoted here as the operator \otimes) of

\mathbf{i}^2 = \mathbf{j}^2 = \mathbf{k}^2 = -1, \mathbf{i}\otimes\mathbf{j}\otimes\mathbf{k} = +1

Mathematically speaking, there is nothing wrong with this. (Side puzzle: this doesn’t contradict the Frobenius theorem, right?) However– and finally we come to the point of this discussion– there is also nothing more inherently Right, or Natural, about it, either.

The effect, and indeed the motivation, of using Shuster’s convention is simply to reverse the order of the quaternion product. That is, given two quaternions q_1,q_2 with Hamilton product denoted by juxtaposition q_1 q_2, the reader can verify that their “Shuster product” q_1 \otimes q_2 = q_2 q_1.

In the quat_product implementation above, we could realize the Shuster product by flipping the plus sign on the cross product to a minus sign… and that’s it. Nothing else changes… except, possibly, how we might choose to interpret the parameterization of our rotations in the constructor.

Homomorphism vs. antihomomorphism

So, why mess with two centuries of notational inertia just to write products in reverse order? To see what I think is the root of the problem, note that I cheated somewhat in the quaternion implementation of the Rotation class above. I claimed not to change the interface at all, but I did: the original rotation matrix implementation’s constructor expected a matrix argument, while the quaternion version expects a quaternion argument.

One way to fix this would be to keep the original expectation of a matrix as the constructor argument, and convert it to the corresponding quaternion under the hood. Or conversely, in the matrix implementation, we would accept a quaternion as the constructor argument, and convert it to the corresponding matrix.

But what do we mean by conversion to the corresponding matrix (or quaternion)? We want the external interface to remain unchanged, so that we preserve behavior as observed by the user, no matter which internal implementation we use. In mathematical terms, we need a homomorphism: a conversion function \phi : SU(2) \to SO(3), with the property that for all unit quaternions q_1, q_2, converting them to rotation matrices preserves the behavior of composition, that is, \phi(q_1 q_2)=\phi(q_1)\phi(q_2).

So, consider the situation at this point. We have two choices of convention to make:

  1. What is the quaternion product: do we use Hamilton’s that has been around for nearly two centuries and is in common use in mathematical literature and software libraries, or Shuster’s that is mathematically equally workable but much more recent and much less widely known?
  2. What is the conversion function \phi described above?

The key observation is that, if we want our conversion function \phi to be a homomorphism, then fixing the choice of convention in either (1) or (2) determines the choice in the other. And this whole mess, that on the surface is Shuster’s “natural” quaternion product as choice (1), really stems from the prematurely fixed, equally arbitrary choice (2) for the correspondence between quaternions and matrices.

In other words, the complaint seems to be, “I’ve already fixed my assumed conversion function for some reason, and I want it to be a homomorphism, so I need the quaternion product definition to change accordingly.” But if we simply transpose Shuster’s assumed quaternion-to-matrix conversion– which he refers to in equation (4) of [1], but does not actually define until equation (61), without ever noting the implicit conventional choice in doing so– then the Hamilton product does indeed “satisfy… the natural order” of multiplication as desired.


Unfortunately, the damage is done. That is, rants like this aren’t going to fix all of the papers and lines of code that already exist that use both conventions, so we can do no better than double-check assumptions and conventions whenever we encounter a new paper, software library, etc.

There are even examples of more intermediate weirdness, so to speak, to be on the lookout for. For example, MATLAB’s Robotics Toolbox is reasonably well-behaved, consistently using the Hamilton product (robotics.utils.internal.quatMultiply) and the corresponding homomorphic conversion functions (quat2rotm and rotm2quat)… but in the Aerospace Toolbox, despite also using the Hamilton product (quatmultiply), the analogous conversion functions (quat2dcm and dcm2quat) are seemingly antihomomorphic– i.e., quat2dcm(q) returns the transpose of quat2rotm(q)— except for the added wrinkle that the provided rotation function quatrotate uses the conjugation q^{-1}vq consistent with the right group action convention (i.e., so that when composing rotations, matrix products look reversed compared to their corresponding quaternion products).


  1. Shuster, M., The nature of the quaternion, The Journal of the Astronautical Sciences, 56(3) 2008, 359–373
  2. Sommer, Gilitschenski, Bloesch, Weiss, Siegwart, Nieto, Why and How to Avoid the Flipped Quaternion Multiplication, arXiv:1801.07478
Posted in Uncategorized | 14 Comments

Analysis of Evil Hangman


This is a follow-up to the previous post that briefly mentioned the game of Hangman, in the context of developing a dictionary of words sorted based on their frequency of occurrence in the recently updated Google Books Ngram dataset.

My Python implementation of the game is on GitHub, where you are the guesser playing against the computer acting as the hangman. But this hangman is “evil:” that is, the computer doesn’t play “honestly,” selecting a secret word at the start of the game, and keeping responses to your guesses consistent with that initial selection throughout the game. Instead, the computer does not start with a secret word at all, merely a word length, and scores guesses using a greedy algorithm that maximizes the number of possible words in the dictionary that remain consistent with all scored guesses so far.

(Aside: this version is slightly more evil than the original, modified to break ties by minimizing the number of correct letters in each response, so that it will score a “miss” if possible, subject to the original constraint of maximizing the number of possible remaining words.)

It seems intuitive that Evil Hangman should be a much harder game to play than the normal honest version. But how much harder is it really? A recent paper by Barbay and Subercaseaux [1] shows that this simple children’s game hides a lot of complexity: determining a strategy that guarantees at least a specified number of missed guesses is coNP-hard, and the standard “homework assignment” greedy algorithm can be arbitrarily worse than optimal for appropriately crafted dictionaries… but what about “real” dictionaries? How much harder is Evil Hangman in practice with a typical English-language dictionary?

Modeling the guesser

To automate the analysis, we need to implement a model of a player’s sequence of guesses in each game. When I play Hangman, my initial guesses tend to be hard-coded, always starting with ETAOIN SHRDLU, until a first correct guess, only after which my strategy depends on the situation, the length of the word, which letters were correct and where, etc.

I do this knowing that it’s not optimal; there is a great Data Genetics blog post [2] that describes the problem with this strategy, and a better solution. Rather than prioritizing guesses based on frequency of occurrence in language (where even ETAOINSHRDLU isn’t quite right– a more accurate sequence is approximately ETAOINSRHLDCU), a better approach is instead to guess letters that occur in the largest number of remaining possible words in the dictionary.

The following table shows what this strategy looks like, indicating the sequence of guesses you should make for each possible word length… at least until you score the first correct guess, after which the subsequent strategy is too complex to fit in a simple table, since it depends on not only the word length, but which letter was correct and where, etc.

Word lengthGuesses until first correct
Table 1. Recommended guessing strategy leading to first correct guess, using NWL2018 Scrabble dictionary.

Note that this table differs slightly from the Data Genetics article, since the strategy depends on the dictionary. Here and throughout I am using the current North American (NWL2018) Scrabble dictionary of 192,111 words.


The following figure shows a comparison of performance of this “optimal” guessing strategy against various hangman strategies.

Figure 1. Number of misses (i.e., failed guesses) required vs. overall size of dictionary, for each possible word length. Left: typical “Honest” Hangman, where a word is selected uniformly at random from the dictionary. Middle: “Evil” Hangman. Right: Honest Hangman, but with the word selected to maximize number of misses against the implemented guesser.

There is a lot to unpack here. First, we consider a range of dictionary sizes on the x-axis, ranging from just the most commonly occurring words to the entire NWL2018 Scrabble dictionary. Second, each curve corresponds to a specific word length. This is perhaps the most interesting and unintuitive observation: the shortest words, with just 2-5 letters, are consistently the hardest ones to guess.

The leftmost plot shows the “normal” game, where an honest hangman selects a fixed secret word uniformly at random, with the y-axis indicating the expected number of misses (conditioned on that word length) before guessing the word.

The middle plot shows the performance against Evil Hangman. Interestingly, short words are not only harder to guess, they are much harder, with words of 3-5 letters requiring nearly twice as many misses as words with 6 letters. What are these “hardest” words? Recall that Evil Hangman does not select a specific word to guess; but when finally forced into a corner by this “optimal” guessing strategy, examples of the eventual lone remaining valid dictionary word are (from most to least common): hip, rib, tung, buzz, puff, jig, pugh, fil, jib, wuz, cuz, yuck, and guck.

However, as discussed in the paper motivating this post, this “greedily” Evil Hangman does not necessarily yield a maximum number of failed guesses. The rightmost plot shows the worst case performance against a normal honest hangman, i.e., where the secret word is selected to maximize the number of misses against this “optimal” guessing strategy. It is again perhaps unintuitive that merely crafting a particular selected word– but a fixed word– can yield so many more misses than the Evil Hangman, who actively avoids giving out information.

This behavior can be more clearly seen with a simpler example. Consider a dictionary with just five words: from, have, that, this, with. We can verify that our guessing strategy will win with zero misses against Evil Hangman, by choosing H as the first letter; but an honest hangman can guarantee that first guess will miss, by selecting the word from.

So what are these hardest words that an honest hangman could select from the actual Scrabble dictionary? Some of the more recognizable ones are junked (12 misses), jazzed and faffed (13 misses each)… and it only gets weirder from there; to require more than the 16 failed guesses squeezed out by the Evil Hangman, one could select words like: hin, rill, zin, zill, zax, yill, yins, zills, yills. I don’t recognize any of these, and I imagine that selecting one of them as a hangman secret word would probably start a fight.

Open problems

Of course, these examples highlight one of a couple of problems with this analysis: first, the strategy for both the guesser and the hangman, even in the evil case, are completely deterministic, and thus predictably repeatable. For example, having observed that the “optimal” guesser requires 12 misses to eventually guess the word pop, as hangman we could select pop again and again, and score 12 misses every single time.

Second, we aren’t even really playing the right game. In Hangman, the guesser does not necessarily continue until finally guessing the entire word. Instead, a maximum number of failed guesses is designated ahead of time, and the guesser loses as soon as that maximum is reached. Rather than minimizing the number of misses– which we haven’t even been guaranteeing anyway, hence my quotes around “optimal” strategy– the real objective is to maximize the probability of winning, i.e., of not reaching the maximum number of misses.

This is a very hard problem, with a mixed (that is, randomized) strategy Nash equilibrium that would be incredibly complex to even specify, let alone actually compute. Jon McLoone presents an interesting Wolfram blog post [3] that attempts to address these issues, albeit in a similarly simplified and not necessarily optimal way. (Many of the above words, including buzz and its inflected forms, jazzed, faffed, etc., also appear high on his list of “best” words for a hangman to choose.)


  1. Barbay, J. and Subercaseaux, B., The Computational Complexity of Evil Hangman, arXiv:2003.10000
  2. Berry, N., A Better Strategy for Hangman, Data Genetics (blog),
  3. McLoone, J., 25 Best Hangman Words, Wolfram News, Views, & Insights (blog),
Posted in Uncategorized | Leave a comment

An updated Google Books word frequency list


I think I nerd-sniped myself. This started with the objective of writing a simple program to play Hangman, as a demonstration of a potential programming exercise for students. (Don’t assign the problem if you don’t know the solution.) But in the process of creating the word list for the game, I found that last year Google released an updated export of its Google Books Ngrams data, which I’ve used and discussed here several times before. So I thought it would be interesting to revisit that data reduction exercise, and see what’s changed since the last release.

First, let’s get the Hangman game out of the way. The Python code is on GitHub; it’s only about 30 lines (plus a list of 10,000 words)… but I think it’s an interesting 30 lines. This could be a great problem for students, either to implement the game themselves, or even just to play this version, and inspect the code to figure out what’s “evil” about this particular computer player. I think it’s an interesting problem to implement even this simple approach efficiently, let alone to significantly improve on the computer player’s performance (see Reference 3 below).

The word list for the game is selected from a longer list– also on GitHub— of 458,343 words and their corresponding frequency of occurrence in the Google Books dataset. I built this longer list in two steps, as described below.

Google Books 1-grams

First, I downloaded the entire list of Google Books 1-grams (roughly, whitespace-delimited tokens) from the new English version 20200217 release, and aggregated the total number of occurrences of each 1-gram containing only the case-insensitive letters ‘A’ through ‘Z’, with no other special characters, but otherwise without any restrictions on word length or frequency of occurrence.

(Aside: This is the same filtering approach that I used for the previous 20120701 release, although the file organization and format changed in this new version. Peter Norvig did a similar analysis of the earlier data, but I was unable to reproduce his results; both I and at least one commenter on his site observed identical frequency counts that are, interestingly, almost-but-not-quite exactly half of his values.)

The result is 14,808,229 tokens and corresponding frequency counts. This is roughly triple the 4,999,714 tokens from the 2012 release, although it’s interesting that this new data set is not a proper superset of the old: there are 57,754 tokens missing in the new release, three of which are valid Collins Scrabble words (more on this later): alcaicerias (a Spanish bazaar), initiatrices (female initiators), and nouritures (nourishment).

More interesting are the new words that have been added in the last decade or so since the 2012 release. Scanning the 250 most frequently occurring new tokens yields a technological trip down memory lane: instagram, blockchain, bitcoin, hadoop, brexit, icloud, crowdfunding, pinterest, wikileaks, obamacare, gamification, hashtag, github, selfie, airbnb, kinect, tumblr, crispr, sexting, whatsapp, snapchat, spotify, microservices, cryptocurrency, tensorflow, emoji, cisgender.

An updated word frequency list

Armed with this list of nearly 15 million tokens and corresponding frequencies, the second step was to reduce the list of tokens to a more manageable “dictionary” of “words.” To do this, I used the union of the following five word lists:

  • The ENABLE2k word list, containing 173,528 words.
  • The North American Scrabble Players Association (NASPA) Word List 2018 (NWL2018), used in Scrabble tournaments in the United States and Canada, containing 192,111 words.
  • The Collins Scrabble Words 2019 (CSW19) list, used in Scrabble tournaments pretty much everywhere else, containing 279,496 words.
  • The Spell Checker Oriented Word List (SCOWL) by Kevin Atkinson, containing 430,590 words. (See the repository for details on the configurable parameters of this word list.)
  • [Edit 2021-09-11 to add] the Princeton WordNet Lexical Database for English version 3.1, containing 63,745 words constrained to all lowercase letters without any spaces or punctuation.

The SCOWL is included as a sort of intentional overkill, a compromise between the size of the dataset and the hope that it will contain as a subset whatever dictionary you might want to use for your application. Note that this comes at a cost of including tokens that are definitely not words in any reasonable dictionary; for example, all 26 single-letter tokens are present, not just the two words a and I.

The result is a single tab-separated text file with 458,343 rows, one for each word, and three columns: the word, followed by the number of occurrences in the 20120701 and 20200217 Google Books datasets, respectively. The entire list is sorted in decreasing order of frequency in the latest 20200217 dataset.


  1. Google Books Ngram Viewer Exports, English versions 20120701 and 20200217
  2. Atkinson, K., Spell Checker Oriented Word List (SCOWL) Copyright 2000-2018 by Kevin Atkinson
  3. Barbay, J. and Subercaseaux, B., The Computational Complexity of Evil Hangman, arXiv:2003.10000
  4. Princeton University “About WordNet.” WordNet. Princeton University. 2011.
Posted in Uncategorized | 3 Comments

Balanced clock hands

Given an analog clock with sweeping hour, minute, and second hands, at what times are the hands most evenly “balanced”– that is, most equally separated in angle?

It’s a standard high school algebra problem to compute all of the times at which just the hour and minute hands are aligned on top of each other. And it’s relatively straightforward to show that all three hands, including the second hand, are only exactly aligned twice per day, at 12 o’clock. But this problem is different: here, we would like the three angles made by the three hands to be as equal– that is, as close to 120 degrees each– as possible.

As we will see shortly, there is no time at which all three angles are exactly 120 degrees. (This is another nice exercise to prove just using pencil and paper.) So, how close can we get? This post was an admittedly somewhat frivolous excuse to experiment with SymPy, the Python symbolic mathematics library bundled with SciPy. My approach was pretty brute-force, letting SymPy do all of the heavy lifting with very few lines of code, including evaluating multiple different metrics defining what we mean by “balanced.” (All of the code discussed here is available on GitHub.)

The problem is complicated by the floor functions that would be buried in the cost function if we were to use a single parameter to represent the continuously varying time over the entire 12-hour domain. Instead, let’s divide the domain into 12×60=720 one-minute intervals, each specified by a fixed integer hour and minute, and for each, only let the second hand sweep through the single revolution of that one minute of time, computing the resulting three angles between consecutive hands:

def hand_positions(hour, minute, second):
    """Return positions in [0, 360) of clock hands."""
    r_60 = sym.Rational(60, 1)
    return (360 * (hour + minute / r_60 + second / (r_60 * r_60)) / 12,
            360 * (minute + second / r_60) / r_60,
            360 * second / r_60)

def hand_angles(hands):
    """Return angles between clock hands."""
    x, y, z = sorted(hands)
    return (y - x, z - y, x - z + 360)

At any given time, what yardstick should we use to measure how “balanced” the clock hands are? There are at least a couple of reasonable alternatives, even if we restrict our attention to (piecewise) linear cost functions. The first that occurred to me was to “maximize the minimum” angle: by the pigeonhole principle, at least one of the angles is at most 120 degrees, so let’s try to make that smallest angle as large as possible, measuring the deficit:

def max_min(*time):
    """(120 minus) minimum angle between clock hands."""
    return 120 - min(hand_angles(hand_positions(*time)))

Rather than maximizing the minimum angle, another option is to “minimize the maximum” absolute deviation of each angle from the desired optimal 120 degrees. This cost is implemented below. (I had initially also implemented the Manhattan distance, i.e., the sum (or equivalently, the average) of absolute deviations from 120 degrees. But it’s another nice problem to show that this is unnecessary: the sum of absolute deviations yields the same ordering as the maximum of absolute deviations… but this would not generally be true if our clocks somehow had more than just three hands (why?).)

def min_max(*time):
    """Max. deviation from 120 deg. of angles between clock hands."""
    return max(abs(a - 120) for a in hand_angles(hand_positions(*time)))

At this point, the key observation is that the only possible candidate times at which these cost functions are optimized are at their critical points (or at endpoints of the domain). And because these costs are piecewise linear, the critical points are easy to enumerate: either two of the angles are equal (when maximizing the minimum angle), or one of the angles is exactly 120 degrees (when minimizing the absolute deviation). The following function lumps all of these possibilities together into one list:

h, m, s = [sym.Symbol(v) for v in 'hms']

def critical_points():
    """Generate possible critical positions of sweeping second hand."""
    yield 0
    for x, y, z in permutations(hand_positions(h, m, s)):
        a, b, c = (y - x, z - y, x - z + 360)
        for lhs, rhs in ((a, b), (a, c), (b, c), (a, 120), (b, 120), (c, 120)):
            yield from sym.solve(lhs - rhs, [s])

The resulting “most balanced” times are shown below. There are a couple of interesting observations. First, a solution won’t be unique; times come in “mirror image” pairs with the same cost. Second, although the two cost functions considered here do yield slightly different optimal times, they differ by less than 3 hundredths of a second, and the same four or five best mirror pairs are all clustered within about 8 hundredths of a second of each other– all at approximately 2:54:34, along with its mirror time 9:05:25.

The two “mirror” times with the same minimum maximum absolute deviation of clock hand angles from 120 degrees: 2:54:34 394/719 (at left) and 9:05:25 325/719 (at right).

Finally, what if the second hand doesn’t sweep, but ticks from one integer second to the next? (When I was a kid, this was the magic method to distinguish between the real and fake Rolex watch, neither of which any of us had actually ever seen.) In this case, the most balanced times– that also appear within about a tenth of a second of the nearby overall top ten times– are at 5:49:09, along with its mirror time 6:10:51, as shown below.

The two “mirror” times that are most balanced if we constrain the second hand to “tick” at integer numbers of seconds: 5:49:09 (at left) and 6:10:51 (at right).
Posted in Uncategorized | Leave a comment