The original puzzle wasn’t exactly the one above; the edge labels were different, but the basic idea was the same. What snagged my interest here, decades later, was not *solving* this puzzle, but *counting* them. That is, if hexagonal pieces are distinguishable only by their pattern (up to rotation) of edge labels 0 through 5, then how many different possible puzzles– sets of seven such pieces packaged and sold as a product– are there?

I think this question is not “nice” mathematically– or at least, I was unable to make much progress toward a reasonably concise solution– but it was interesting computationally, because the numbers involved are small enough to be tractable, but large enough to require some thought in design and implementation of even a “brute force” approach.

(My Python solution is on GitHub. What I learned from this exercise: I had planned to implement a lazy k-way merge using the priority queue in the `heapq` module, but I found that it was already built in, as `heapq.merge`.)

There are several variants of the question that we can ask. First and easiest, let’s ignore solvability. There are $5! = 120$ different individual hexagonal pieces, and so there are $\binom{120+7-1}{7} = \binom{126}{7}$, or 84,431,259,000, distinguishable sets of seven such pieces.
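This multiset count is easy to check directly (a quick sketch, not from the original post):

```python
from math import comb

# Number of multisets of size 7 drawn from 120 distinct pieces:
# the "stars and bars" count C(120 + 7 - 1, 7).
pieces = 120
puzzles = comb(pieces + 7 - 1, 7)
print(puzzles)  # 84431259000
```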

However, most of these puzzles do not have a solution. It turns out there are 4,967,864,520 different *solvable* puzzles… but there are at least a couple of ways that we might reasonably reduce this number further. For example, over a billion of these solvable puzzles have multiple solutions– 1800 of which have *twenty* different solutions each. If we constrain a “marketable” puzzle to have a *unique* solution, then there are… well, still 3,899,636,160 different possible puzzles.

Of course, many of these puzzles are only cosmetically different, so to speak. For example, the puzzle shown above has four identical pieces with the same 0-through-5 counterclockwise labeling. If we arbitrarily distinguish this “identity” piece, then although some puzzles have *none* of these pieces, they are not really “different” in a useful way, since we could simply relabel all of the edges appropriately so that they *do* contain at least one identity piece. There are only 281,528,111 different puzzles containing at least one identity piece, of which 221,013,350 have a unique solution.

This is an interesting problem, in part because it is easy to get wrong. The standard, all-the-cool-kids-know-it response is the Fisher-Yates shuffle, consisting of a sequence of carefully specified random transpositions, with the following basic implementation in Python:

```python
import random

def fisher_yates_shuffle(a):
    """Shuffle list a[0..n-1] of n elements."""
    for i in range(len(a) - 1, 0, -1):  # i from n-1 downto 1
        j = random.randint(0, i)        # inclusive
        a[i], a[j] = a[j], a[i]
```

Note that the loop index `i` *decreases* from $n-1$ down to 1. Everywhere I have looked, this is how the algorithm is always presented. The motivation for this post is to wonder aloud why the following variant– which seems simpler, at least to me– is not the “standard” approach, with the only difference being that the loop runs “forward” instead of backward:

```python
def forward_shuffle(a):
    """Shuffle list a[0..n-1] of n elements."""
    for i in range(1, len(a)):    # i from 1 to n-1
        j = random.randint(0, i)  # inclusive
        a[i], a[j] = a[j], a[i]
```

It’s worth emphasizing that this is different from what seems to be the usual “forward” version of the algorithm (e.g., this “equivalent version”), which seems to consistently insist on also “mirroring” the ranges of the random draws, so that *they* are decreasing with each loop iteration instead of the loop index:

```python
def mirror_shuffle(a):
    """Shuffle list a[0..n-1] of n elements."""
    for i in range(0, len(a) - 1):         # i from 0 to n-2
        j = random.randint(i, len(a) - 1)  # inclusive
        a[i], a[j] = a[j], a[i]
```

There are a couple of ways to see and/or prove that `forward_shuffle` does indeed yield a uniform distribution on all possible permutations. One is by induction– the rather nice loop invariant is that, after each iteration `i`, the sublist `a[0..i]` is a uniformly random permutation of the original sublist `a[0..i]`. (Contrast this with the normal Fisher-Yates shuffle, where after each iteration indexed by `i`, the “suffix” sublist `a[i..n-1]` is essentially a uniformly permuted reservoir-sampled subset of the entire original list.)

Another way to see that `forward_shuffle` works as desired is to relate its behavior to that of the original `fisher_yates_shuffle`, which has already been proved correct. Consider the set of independent discrete random variables $J_1, J_2, \ldots, J_{n-1}$, with each $J_i$ distributed uniformly between 0 and $i$, inclusive. These are the random draws returned from `random.randint(0, i)`.

Imagine generating the entire set of those independent random draws up front, *then* applying the sequence of corresponding transpositions $(i \; J_i)$. The original Fisher-Yates shuffle applies those transpositions in order of *decreasing* $i$, while `forward_shuffle` applies the *same* set of random transpositions, but in reverse order. Thus, the permutations resulting from `fisher_yates_shuffle` and `forward_shuffle` are inverses of each other… and if a random permutation is uniformly distributed, then so is its inverse.
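We can also check the uniformity claim exhaustively for small lists, by enumerating every possible sequence of random draws (a standalone sketch, not from the original post):

```python
from itertools import product
from collections import Counter

def apply_forward(n, draws):
    """Apply forward_shuffle's transpositions (i, j_i) for i = 1..n-1."""
    a = list(range(n))
    for i in range(1, n):
        j = draws[i - 1]
        a[i], a[j] = a[j], a[i]
    return tuple(a)

n = 4
# All equally likely draw sequences (j_1, ..., j_{n-1}) with j_i in 0..i.
counts = Counter(apply_forward(n, draws)
                 for draws in product(*(range(i + 1) for i in range(1, n))))

# There are 2*3*4 = 4! = 24 draw sequences, and each of the 24
# permutations is produced by exactly one of them.
print(len(counts), set(counts.values()))  # 24 {1}
```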

There is nothing special here– indeed, this `forward_shuffle` is really just a less dressed-up implementation of what is usually referred to as the “inside-out” version of Fisher-Yates, which for some reason seems to be presented as only appropriate when shuffling a list generated from an external source (possibly of unknown length):

```python
def forward_shuffle(source):
    """Return shuffled list from external source."""
    a = []
    for i, x in enumerate(source):
        a.append(x)
        j = random.randint(0, i)  # inclusive
        a[i], a[j] = a[j], a[i]
    return a
```

I say “less dressed-up” because I’ve skipped what seems to be the usual special case `j == i` comparison that would eliminate the swap. The above seems simpler to me, and I would be curious to know if these (branchless) swaps are really less efficient in practice.

The following figure shows a spectrogram of the audio clip, with time on the *x*-axis, and each vertical slice showing the Fourier transform of a short (roughly 50 ms) sliding window of the signal centered at the corresponding time. We can clearly see the “dots” and “dashes” at around 1 kHz, with the corresponding translation overlaid in yellow.

Now that we have the Morse code extracted from the audio (which, for reference if you want to copy-paste and play with this problem, is “`.-..--...-.---...-..-...`“), we just need to decode it, right? The problem is that the dots and dashes are all uniformly spaced, without the required longer gaps between *letters*, let alone the still longer gaps that would be expected between *words*. Without knowing the intended locations of those gaps, the code is ambiguous: for example, the first dot could indicate the letter E, or the first dot *and dash together* could indicate an A, etc.

That turns out to be a big problem. The following figure shows the decoding trie for Morse code letters and digits; starting at the root, move to the left child vertex for each dot, or to the right child vertex for each dash. A red vertex indicates either an invalid code or other punctuation.

If we ignore the digits in the lowest level of the trie, we see that not only are Morse code letters ambiguous (i.e., not prefix-free), they are nearly “*maximally* ambiguous,” in the sense that the trie of letters is nearly complete. That is, for almost any prefix of four dots and dashes we may encounter, the gap indicating the end of the first letter could be after *any* of those first four symbols.

This would make a nice programming exercise for students, to show that this particular sequence of 24 symbols may be decoded into a sequence of letters in exactly 3,457,592 possible ways. Granted, most of these decodings result in nonsense, like AEABKGEAEAEEE. But a more interesting and challenging problem is to efficiently search for *reasonable *decodings, that is, messages consisting of actual (English?) words, perhaps additionally constrained by grammatical connections *between *words.
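A dynamic-programming sketch of that exercise (my own, not from the original post), counting the ways to split the symbol string into valid letter codes:

```python
from functools import lru_cache

MORSE = {
    'A': '.-',   'B': '-...', 'C': '-.-.', 'D': '-..',  'E': '.',
    'F': '..-.', 'G': '--.',  'H': '....', 'I': '..',   'J': '.---',
    'K': '-.-',  'L': '.-..', 'M': '--',   'N': '-.',   'O': '---',
    'P': '.--.', 'Q': '--.-', 'R': '.-.',  'S': '...',  'T': '-',
    'U': '..-',  'V': '...-', 'W': '.--',  'X': '-..-', 'Y': '-.--',
    'Z': '--..',
}
CODES = set(MORSE.values())

def decodings(s):
    """Count ways to split dot-dash string s into a sequence of letters."""
    @lru_cache(maxsize=None)
    def count(i):
        if i == len(s):
            return 1
        # Letter codes are 1 to 4 symbols long.
        return sum(count(i + n) for n in range(1, 5)
                   if s[i:i + n] in CODES)
    return count(0)

print(decodings('.-..--...-.---...-..-...'))  # 3457592
```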

Of course, it’s also possible– probable?– that this audio clip is simply made up, a random sequence of dots and dashes meant to *sound* like “real” Morse code. And even if it’s not, *we might not be able to tell the difference*. Which is the interesting question that motivated this post: if we generate a completely random, and thus *intentionally* unintelligible, sequence of 24 dots and dashes, what is the probability that it still yields a “reasonable” possible decoding, for sufficiently large values of “reasonable”?

Suppose that you are the owner of a new hotel chain, and that you want to implement a mechanical key card locking system on all of the hotel room doors. Each key card will have a unique pattern of holes in it, so that when a card is inserted into the corresponding room’s door lock, a system of LEDs and detectors inside the lock will only recognize that unique pattern of holes as an indication to unlock the door.

(I have vague childhood memories of family vacations and my parents letting me use just such an exotic gadget to unlock our hotel room door.)

When you meet with a lock manufacturer, he shows you some examples of his innovative square key card design, with the “feature” that a key card may be safely inserted into the slot in a door lock in any of its eight possible orientations: any of the four edges of the square key card may be inserted first, with either side of the key card facing up. Each key card has a pattern of up to 36 holes aligned with a 6×6 grid of sensors in the lock that may “scan” the key card in any orientation.

The lock manufacturer agrees to provide locks and corresponding key cards for each room, with the following requirements:

- A manufacturer-provided key card will only open its assigned manufacturer-provided lock and no other; and
- A manufacturer-provided key card will open its assigned manufacturer-provided lock when inserted into the slot in *any* orientation.

How many distinct safely-locked rooms can the manufacturer support?

**A simpler lock is a harder problem**

The problem as stated above is a relatively straightforward application of Pólya counting, using the cycle index of the dihedral group of symmetries of the key card acting on (2-colorings of) the $n \times n$ grid of possible holes in the card. When $n$ is even, the cycle index (recently worked out in a similar problem here) is

$$\frac{1}{8}\left(x_1^{n^2} + 2x_4^{n^2/4} + 3x_2^{n^2/2} + 2x_1^n x_2^{(n^2-n)/2}\right)$$

Evaluating at $n = 6$ and $x_i = 2$ yields a total of 8,590,557,312 distinct key cards– and corresponding hotel room door locks– that the manufacturer can provide.
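Plugging in directly confirms the arithmetic (with the cycle index terms grouped by symmetry type):

```python
n, k = 6, 2  # 6x6 grid of holes, 2 "colors" (hole or no hole)
count = (k ** (n * n)              # identity
         + 2 * k ** (n * n // 4)   # rotations by 90 and 270 degrees
         + 3 * k ** (n * n // 2)   # rotation by 180, plus 2 edge-axis reflections
         + 2 * k ** (n + (n * n - n) // 2)  # 2 diagonal reflections
         ) // 8
print(count)  # 8590557312
```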

However, these locks are expensive: the second requirement above means that each lock must contain not only the sensing *hardware* to scan the pattern of holes in a key card, but also the *software* to compare that detected pattern against the eight possibly distinct rotations and reflections of the pattern that unlocks the door. (For example, the key card on the left in the figure above “looks the same” to the sensor in any orientation; the key card in the middle, however, may present any of four distinct patterns of scanned holes; and the key card on the right “looks different” in each of its eight possible rotated or flipped orientations.)

Which leads to the problem that motivated this post: to reduce cost, let’s modify the second requirement above– but still retaining the first requirement– so that a manufacturer-provided key card will only open its assigned manufacturer-provided lock when inserted into the slot in a *single* correct orientation labeled on the key card. This way, the sensing hardware in the lock only needs to “look for” a single pattern of holes.

Now how many distinct key cards and corresponding room locks are possible?

**Counting regular orbits**

The idea is that, referring again to the figure above, key cards may only have patterns of holes like the example on the far right, without any rotation or reflection symmetries. In other words, given the (dihedral) group $G$ of symmetries acting on colorings of the set $X$ of possible key card hole positions, we are counting only *regular* orbits of this action– i.e., those orbits whose colorings are “fully asymmetric,” having a trivial stabilizer.

So how can we do this? My approach was to use inclusion-exclusion, counting those colorings fixed by *none* of the non-identity symmetries in $G$. To start, we represent each element of $G$ as a list of lists of elements of $X = \{0, 1, \ldots, n-1\}$, corresponding to the disjoint cycles in the permutation of $X$. For a given subset $S \subseteq G \setminus \{e\}$ in the inclusion-exclusion summation, consider the equivalence relation on $X$ relating two key card hole positions if we can move one to the other by a sequence of symmetries in $S$. Then the desired number of $k$-colorings fixed by $S$ is $k^m$, where $m$ is the number of equivalence classes.

We can compute this equivalence relation using union-find to incrementally “merge” the sets of disjoint cycles in each permutation in (all of the code discussed here is available on GitHub):

```python
from functools import reduce

def merge(s, p):
    """Merge union-find s with permutation p (as cycles)."""
    def find(x):
        while s[x] != x:
            x = s[x]
        return x
    def union(x, y):
        x = find(x)
        s[find(y)] = x
        return x
    for cycle in p:
        reduce(union, cycle)
    for x in range(len(s)):
        s[x] = find(x)
    return s
```

It remains to compute the inclusion-exclusion alternating sum of these $k^m$ over all subsets $S \subseteq G \setminus \{e\}$.

```python
from collections import Counter
from itertools import chain, combinations

def cycle_index_term(s, k=2):
    """Convert union-find s to cycle index monomial at x[i]=k."""
    #return prod(x[i]**j for i, j in Counter(Counter(s).values()).items())
    return k ** sum(Counter(Counter(s).values()).values())

def asymmetric_colorings(group, k=2):
    """Number of k-colorings with no symmetries in the given group."""
    # Group G acts on (colorings of) X = {0, 1, 2, ..., n-1}.
    G = list(group)
    n = sum(len(cycle) for cycle in G[0])
    # Compute inclusion-exclusion sum over subsets of G-e.
    G = [g for g in G if len(g) < n]
    return sum((-1) ** len(subset) *
               cycle_index_term(reduce(merge, subset, list(range(n))), k)
               for subset in chain.from_iterable(
                   combinations(G, r) for r in range(len(G) + 1)))
```

Evaluating the result– and dividing by the size $|G| = 8$ of each regular orbit– yields 8,589,313,152 possible “fully asymmetric” key cards satisfying our requirements.
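As a standalone sanity check of the inclusion-exclusion approach (my own sketch, independent of the code above), we can compare it against brute-force enumeration on a smaller 4×4 grid, where all $2^{16}$ colorings can be examined directly:

```python
from itertools import combinations, product

def grid_maps(n):
    """The 8 dihedral symmetries of an n-by-n grid, as position maps."""
    fs = [lambda r, c: (r, c),                  # identity
          lambda r, c: (c, n - 1 - r),          # rotate 90
          lambda r, c: (n - 1 - r, n - 1 - c),  # rotate 180
          lambda r, c: (n - 1 - c, r),          # rotate 270
          lambda r, c: (r, n - 1 - c),          # reflect (vertical axis)
          lambda r, c: (n - 1 - r, c),          # reflect (horizontal axis)
          lambda r, c: (c, r),                  # reflect (diagonal)
          lambda r, c: (n - 1 - c, n - 1 - r)]  # reflect (anti-diagonal)
    return [[f(r, c)[0] * n + f(r, c)[1]
             for r in range(n) for c in range(n)] for f in fs]

def fixed_colorings(perms, size):
    """2^(number of components) colorings constant on orbits of perms."""
    parent = list(range(size))
    def find(x):
        while parent[x] != x:
            x = parent[x]
        return x
    for p in perms:
        for x in range(size):
            rx, ry = find(x), find(p[x])
            if rx != ry:
                parent[ry] = rx
    return 2 ** len({find(x) for x in range(size)})

n = 4
maps = grid_maps(n)[1:]  # the 7 non-identity symmetries
size = n * n

# Inclusion-exclusion: colorings fixed by none of the 7 symmetries.
ie = sum((-1) ** len(s) * fixed_colorings(s, size)
         for r in range(len(maps) + 1) for s in combinations(maps, r))

# Brute force over all 2^16 colorings.
bf = sum(all(tuple(c[p[x]] for x in range(size)) != c for p in maps)
         for c in product((0, 1), repeat=size))

print(ie == bf, ie % 8 == 0)  # True True
```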

**Questions**

At first glance, this seems like a nice solution, with a concise implementation, that doesn’t require much detailed knowledge about the structure of the symmetry group involved in the action… but we get a bit lucky here. The time to compute the inclusion-exclusion summation is exponential in the order of the group, which just happens to be small in this case.

For a more complex example, imagine coloring each face of a fair die red or blue; how many of these colorings are “orientable,” so that if the die rests on a table, and we pick it up, put it in a cup, shake, and roll it to a random unknown orientation, we can inspect the face colors to unambiguously determine the die’s original resting orientation? We can use the same code above to answer this question for a cube or tetrahedron (0 ways) or an octahedron (120/24 = 5 ways)… but the dodecahedron and icosahedron are beyond reach, with rotational symmetry groups of order 60.

Of course, in those *particular* cases, we can lean on additional knowledge about the structure of the subgroup inclusion partial order to solve the problem with fewer than the roughly $2^{|G|}$ operations required here. But is there a way to improve the efficiency of this algorithm in a way that is still generally applicable to arbitrary group actions?

A few days ago a friend of mine referred me to an interesting podcast discussing card shuffling, framed as a friendly argument-turned-wager between a couple about how many times you should shuffle a deck of cards. A woman claims that the “rule” is that you riffle shuffle *three* times, then quit messing around and get to dealing. Her partner, on the other hand, feels like at least “four or more” riffle shuffles are needed for the cards to be sufficiently random.

A mathematician is brought into the discussion, who mentions the popular result that *seven* shuffles are needed… at least according to specific, but perhaps not necessarily practical, mathematical criteria for “randomness.” (There is some interesting preamble about the need to define exactly what is meant by “random,” which I was disappointed to hear defined as, “any card is equally likely to be in any position in the deck.” This isn’t really even close to good enough. For example, start with a brand new deck of cards in a known order, and simply *cut* the deck at a uniformly random position. Now each and every card is equally likely to be in any position in the deck, but the resulting *arrangement* of cards can hardly be called sufficiently random.)
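The cut example is easy to make concrete (a standalone sketch): across all equally likely cuts, every card lands in every position with equal probability, yet the arrangement is far from random, since the cyclic order of the deck is untouched.

```python
from collections import Counter

n = 52
deck = list(range(n))
# All n possible cuts of a new deck, each equally likely.
cuts = [deck[k:] + deck[:k] for k in range(n)]

# Every card appears in every position exactly once across the n cuts...
for card in range(n):
    positions = Counter(c.index(card) for c in cuts)
    assert len(positions) == n and set(positions.values()) == {1}

# ...yet every cut preserves the cyclic order: card i is always
# followed (cyclically) by card i+1 mod n.
for c in cuts:
    assert all(c[(c.index(i) + 1) % n] == (i + 1) % n for i in range(n))
```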

A win for the man, right? But the woman’s side is vindicated in the end, by noting that even in casinos– where presumably this has been given a lot of thought– a standard poker deck is typically only shuffled *three* times. Several dealers are interviewed, each describing the process with the chant, “riffle, riffle, box, riffle, cut.”

**The wash**

A couple of observations occurred to me after listening to this discussion. First, it’s true that casino dealers don’t shuffle seven times… but they also don’t *just* shuffle *three* times. Particularly when presented with a brand new pack, before any riffle shuffling, they often start with a “wash,” consisting of spreading the cards haphazardly around the table, eventually collecting them back into a squared-up deck to begin the riffle-and-cut sequence.

Depending on how thorough it is, that initial wash *alone* is arguably sufficient to randomize the deck. If we think of a single riffle shuffle as applying a random selection of one of “only” roughly $2^{52}$ possible permutations in a *generating set*, then the wash is roughly akin to making a single initial selection from a generating set of *all* 52! possible arrangements. If the wash is thorough enough that this selection is approximately uniform, then after that, any additional shuffling, riffle or otherwise, is just gravy.

**When does it really matter?**

The second observation is one made by a dealer interviewed in the podcast, who asks what I think is the critical practical question:

The real question is, what’s the goal of the shuffle? Is it to completely randomize the cards, or is it to make it so that it’s a fair game?

In other words, if we are going to argue that three, or any other number of shuffles, is *not* sufficient, then the burden is on us to show that this limited number of shuffles provides a *practical advantage that we can actually exploit* in whatever game we happen to be playing.

We have discussed some examples of this here before. For example, this wonderful card trick due to Charles Jordan involves finding a spectator’s secretly selected card buried in a thrice-shuffled deck. And even *seven* shuffles is insufficient to eliminate a huge advantage in the so-called New Age Solitaire wager.

But it’s an interesting question to consider whether there are “real” card games– not magic tricks or contrived wagers– where advantage may be gained by too few shuffles.

I struggled to think of such a practical example, and the following is the best I can come up with: let’s play a simplified version of the card game War (also discussed here recently). Start with a “brand new” deck of cards in the following order:

Riffle shuffle the deck three times, and cut the deck. In fact, go ahead and cut the deck after *each* riffle shuffle. Then I will deal the cards into two equal piles of 26 cards, one for each of us. At each turn, we will simultaneously turn over the top card from our piles, and the higher card wins the “trick.” Let’s simplify the game by just playing through the deck one time, and instead of a “war” between cards of the same rank, let’s just discard the trick as a push. At the end of the game, whoever has taken the most tricks wins a dollar from the other player.

If three shuffles is really sufficient to make this a “fair” game, then the expected return for each player should be zero. Instead, I as the dealer will win over two out of three games, taking about 42 cents from you per game on average!

Of course, this is still contrived. Even the initial deck order above is cheating, since it isn’t the typical “new deck order” in most packs manufactured in the United States. And if we play the game repeatedly (with three shuffle-cuts in between), the advantage returns to near zero for reasonable methods of collecting the played cards back into the deck.

So, I wonder if there are better real, practical examples of this kind of exploitable advantage from too few shuffles? And can this advantage *persist* across multiple games, with the same too-few shuffles in between? It’s interesting to consider what types of games involve methods of collecting the played cards back into the deck to shuffle for the next round, that might retain some useful ordering; rummy-style games come to mind, for example, where we end up with “clumps” of cards of the same rank, or of consecutive ranks, etc.

If we strip off the complexities of the multiple players, limited number of re-rolls, and various other scoring combinations (e.g., straights, full houses, etc.), there is a nice mathematical puzzle buried underneath:

Roll $n$ dice, each with $d$ sides, and repeatedly re-roll any subset of the dice– you can “keep” any or none of your previous rolls, and you can re-roll dice you have previously kept– until all dice show the same value (e.g., all 1s, or all 2s, etc.). Using an optimal strategy, what is the (minimum) expected number of rolls required? In particular, can we solve this problem for “Giant Yahtzee,” played with a much larger number of dice?

**Edit 2020-10-05**: Following are my notes on this problem. Given that we (re)roll $n-k$ of the dice– setting aside the remaining $k$ already identical dice– let the random variable $K$ indicate the resulting new number of identical dice. The distribution of $K$ yields the transition matrix for the absorbing Markov chain– with state space indicating the current number of identical dice– which we can use to compute the desired expected number of rolls. See the comments for a nice closed form solution for the cumulative distribution function of the number of rolls in a special case.
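As a minimal sketch of the Markov chain computation– under the simplifying (and not necessarily optimal) strategy of always keeping the current matching dice, so that the number of new matches among the $n-k$ re-rolled dice is binomial– the expected number of additional rolls from each state can be computed by back-substitution:

```python
from math import comb

def expected_rolls(n=5, d=6):
    """Expected additional rolls from state k = number of identical dice,
    always re-rolling the n-k non-matching dice (keep-current strategy)."""
    E = {n: 0.0}
    for k in range(n - 1, 0, -1):
        m = n - k  # dice to re-roll
        # P(j new matches) = C(m, j) (1/d)^j ((d-1)/d)^(m-j)
        stay = (1 - 1 / d) ** m  # j = 0: no progress
        rest = sum(comb(m, j) * (1 / d) ** j * (1 - 1 / d) ** (m - j) * E[k + j]
                   for j in range(1, m + 1))
        # E[k] = 1 + stay*E[k] + rest, solved for E[k].
        E[k] = (1 + rest) / (1 - stay)
    return E

E = expected_rolls()
print(E[1])  # expected rolls starting from a single kept die
```

Note that from state $k = 4$ with $d = 6$, each roll succeeds with probability 1/6, so the expected number of remaining rolls is 6, a useful check on the recurrence.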

The MATLAB colon operator is surprisingly complicated, given that its job *seems* pretty simple to describe: generate a vector of regularly-spaced values, with a specified starting point, step size, and endpoint. For example, to create the vector $(0, 0.1, 0.2, \ldots, 1.2)$:

x = 0:0.1:1.2;

At least some complexity is understandable, since as in this example, the “intended” step size and/or the endpoints may not be represented exactly in double floating-point precision. But in MATLAB’s usual habit of trying to “helpfully” account for this, things get messier than they need to be. The motivation for this post is to describe two *different* behaviors of the colon operator: it behaves in one special way in for loops, and in a different way– well, everywhere else.

**Creating vectors with colon syntax**

First, the “everywhere else” case: as the documentation suggests,

The vector elements are roughly equal to

`[start, start+step, start+2*step, ...]`

… however, if [the step size] is not an integer, then floating point arithmetic plays a role in determining whether `colon` includes the endpoint in the vector.

That is, continuing the above example, note that `ismember(1.2, x)`, despite the fact that `0+12*0.1 > 1.2`. But the actual implementation is even more complex than just computing the “intended” endpoint. The output vector is effectively constructed in two halves, *adding* multiples of the step size to the starting point in the first half, and *subtracting* multiples of the step size from the (computed) *endpoint* in the second half.
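That overshoot is easy to reproduce; the underlying IEEE double arithmetic is the same in Python:

```python
# 0.1 is not exactly representable in binary; 12 multiples of the
# rounded value overshoot the (also rounded) endpoint 1.2.
print(12 * 0.1)        # 1.2000000000000002
print(12 * 0.1 > 1.2)  # True
```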

So far, this seems reasonably well known, despite the broad strokes documentation. There is a good description of the details of how this works on Stack Overflow. Let’s not worry about those details here, though; instead, what seems less well known is that the *same* colon expression, such as in the example above, behaves *differently* when it appears in a for loop.

**For loops with (and without) colon syntax**

First, it’s worth noting that MATLAB for loops don’t *have* to use the colon operator at all. With not-quite-full-fledged iterator-ish semantics, you can iterate over the columns of an arbitrary array expression. For some examples:

```matlab
for index = [2, 3, 5, 7]
    disp(index); % 4 iterations
end

for index = x
    disp(index); % 13 iterations
end
```

(Technically, iteration is over first-column “slices” of the possibly multi-dimensional array. This can cause some non-intuitive behavior. For example, how many iterations would you expect over `ones(2,0,3)`? What about `ones(0,2,3)`?)

But here is where things get weird. Consider the following example:

```matlab
x = 0:0.1:1.2;
for index = 0:0.1:1.2
    disp(find(x == index));
end
```

This loop only “finds” 7 of the 13 elements of the original vector above, which was created using exactly the same colon operator expression!

So what’s going on? First, while the colon operator documentation was perhaps merely incomplete, the for loop documentation is downright misleading, suggesting that the behavior is to “increment `index` by the `step` on each iteration.” That sounds to me like repeatedly adding the step size to the value at the *previous* iteration, which would be even worse in terms of error accumulation, and is fortunately not what’s happening here.

Instead, experiments suggest that what *is* happening is essentially the overly-simplified description in the *colon operator* (*not* the for loop) documentation: the statement `for index = start:step:stop` iterates over values of the form `start+k*step`– i.e., adding *multiples* of the step size to the *starting* point– with the added detail that the number of iterations (i.e., the stopping point) seems to be computed in the same way as the “normal” colon operator. That is, the documentation is also wrong in that it’s not as simple as incrementing “until `index` is greater than `stop`” (witness the example above, where the last value is allowed to slightly overshoot the given endpoint). I have been unable to find an example of a colon expression whose *size* is different depending on whether it’s in a for loop.

**Conclusion**

What I find most interesting about this is how *hard* MathWorks has to work– *and is still working*— to make this confusing. That is, the colon syntax in a for statement is a special case in the parser: there are necessarily extra lines of code to (1) detect the colon syntax in a for loop, and (2) do something different than they could have done by simply always evaluating whatever *arbitrary* array expression– colon or otherwise– is given to the right of the equals sign.

And this isn’t just old legacy behavior that no one is paying attention to anymore. Prior to R2019b, you could “trick” the parser into skipping the special case behavior in a for loop by wrapping the colon expression in redundant array brackets:

```matlab
for index = [0:0.1:1.2]
    disp(find(x == index)); % finds all 13 values
end
```

However, as of R2019b, this no longer “works;” short of using the explicit function notation `colon(0,0.1,1.2)`, it now takes more sophisticated obfuscation on the order of `[0:0.1:1.2, []]` or similar nonsense to say, “No, really, use the colon version, not the for loop version.”

Given two vectors in three dimensions, what is the most accurate way to compute the angle between them? I have seen several different approaches to this problem recently in the wild, and although I knew some of them had potential issues, I wasn’t sure just how bad things might get in practice, nor which alternative was best as a replacement.

To make the setup more precise, let’s assume that we are given two non-zero input vectors $u, v \in \mathbb{R}^3$, represented *exactly* by their double-precision coordinates, and we desire a function that returns a double-precision value that most closely approximates the angle $\theta$ between the vectors, with all intermediate computation also done in double precision.

**Kahan’s Mangled Angles**

William Kahan discusses three formulas in the “Mangled Angles” section of the paper linked below. The first is the “usual” dot product formula:

$$\theta = \cos^{-1}\frac{u \cdot v}{\|u\|\,\|v\|}$$

with the following C++ implementation, which as Kahan points out requires clamping the double-precision argument to the interval $[-1, 1]$ to avoid a NaN result for some vectors that are nearly parallel:

```cpp
double angle(const Vector& u, const Vector& v)
{
    return std::acos(std::min(1.0, std::max(-1.0,
        dot(u, v) / (norm(u) * norm(v)))));
}
```

Kahan subsequently describes another formula using the cross product:

$$\theta = \begin{cases} \sin^{-1}\dfrac{\|u \times v\|}{\|u\|\,\|v\|} & \text{if } u \cdot v \geq 0 \\[1ex] \pi - \sin^{-1}\dfrac{\|u \times v\|}{\|u\|\,\|v\|} & \text{otherwise} \end{cases}$$

with the following implementation:

```cpp
double angle(const Vector& u, const Vector& v)
{
    double angle = std::asin(std::min(1.0,
        norm(cross(u, v)) / (norm(u) * norm(v))));
    if (dot(u, v) < 0)
    {
        angle = 3.141592653589793 - angle;
    }
    return angle;
}
```

Interestingly, Kahan does *not* mention that this formula *also* requires clamping the `asin` argument to the interval $[-1, 1]$; it is possible to construct explicit inputs for which the unclamped argument slightly exceeds 1, demonstrating the potential problem.

Finally, despite referring to the above formula as “the best known in three dimensions,” Kahan finishes with the following “better formula less well known than it deserves”:

$$\theta = 2\tan^{-1}\frac{\bigl\|\,\|v\|\,u - \|u\|\,v\,\bigr\|}{\bigl\|\,\|v\|\,u + \|u\|\,v\,\bigr\|}$$

with the following implementation:

```cpp
double angle(const Vector& u, const Vector& v)
{
    double nu = norm(u);
    double nv = norm(v);
    return 2 * std::atan2(norm(nv * u - nu * v),
                          norm(nv * u + nu * v));
}
```

That’s a lot of square roots. I didn’t focus on performance here, but it would be an interesting follow-on analysis to compare the speed of each of these formulas.

Other approaches are possible; following is the formula that I *thought* was the most accurate, before reading Kahan’s paper, using the two-argument arctangent:

$$\theta = \operatorname{atan2}(\|u \times v\|,\; u \cdot v)$$

```cpp
double angle(const Vector& u, const Vector& v)
{
    return std::atan2(norm(cross(u, v)), dot(u, v));
}
```

This has the added benefit of involving just a single square root. This is the formula that I used to compute the “true” angle between vectors to compare errors, using arbitrary-precision rational arithmetic to compute the square root (actually, a reciprocal square root, which is slightly easier) and arctangent. All of the source code is available on GitHub.

And finally, the approach that I saw most recently that motivated this post, using the Law of cosines:

```cpp
double angle(const Vector& u, const Vector& v)
{
    double u2 = u.x * u.x + u.y * u.y + u.z * u.z;
    double v2 = v.x * v.x + v.y * v.y + v.z * v.z;
    Vector d = u - v;
    return std::acos(std::min(1.0, std::max(-1.0,
        (u2 + v2 - (d.x * d.x + d.y * d.y + d.z * d.z)) /
        (2 * std::sqrt(u2) * std::sqrt(v2)))));
}
```
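To see the qualitative difference for nearly parallel vectors, here is a quick Python analogue of the arccosine and arctangent formulas (with made-up example vectors, not those from the paper): for two vectors about $10^{-8}$ radians apart, the clamped arccosine formula collapses to exactly zero, while the arctangent formula does not.

```python
import math

def acos_angle(u, v):
    """The "usual" clamped dot product formula."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(a * a for a in v))
    return math.acos(min(1.0, max(-1.0, dot / (nu * nv))))

def atan2_angle(u, v):
    """The two-argument arctangent formula."""
    cx = u[1] * v[2] - u[2] * v[1]
    cy = u[2] * v[0] - u[0] * v[2]
    cz = u[0] * v[1] - u[1] * v[0]
    cross = math.sqrt(cx * cx + cy * cy + cz * cz)
    dot = sum(a * b for a, b in zip(u, v))
    return math.atan2(cross, dot)

u, v = (1.0, 0.0, 0.0), (1.0, 1e-8, 0.0)
print(acos_angle(u, v))   # 0.0 (the tiny angle is lost entirely)
print(atan2_angle(u, v))  # ~1e-8
```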

**Results**

The relative accuracy of each formula depends on the magnitude of the angle between the input vectors. The following figure shows this comparison for angles near 0 (i.e., nearly parallel vectors), near an intermediate angle, near $\pi/2$ (i.e., nearly orthogonal vectors), and near $\pi$ (i.e., nearly “anti-parallel” vectors).

The *x*-axis indicates an offset from the “true” angle between the input vectors, computed to roughly 200-bit accuracy. The *y*-axis indicates the error in the double-precision output, compared against the true angle *also rounded to double-precision*. The points hugging the bottom of each figure are my poor man’s attempt at indicating *zero* error (note that these are on a log-log scale), i.e., the 64-bit double-precision output matched the corresponding value rounded from the 200-bit true angle. (In many ways this figure feels like a failure of visual display of quantitative information, and I’m not sure how best to improve it.)

So what’s the takeaway? If you don’t care about absolute errors smaller than a few dozen nanoradians, then it doesn’t really matter which formula you use. And if you *do* care about errors– and angles– smaller than that, then be sure that your *inputs* are accurate in the first place. For example, did you normalize your “real” input vectors to unit length first, and if so, how much error did you unintentionally incur as a result? We can construct very small angles between vectors if we restrict to “nice” two-dimensional inputs. But it’s an interesting exercise to see how difficult it is to construct vectors “in general position” (e.g., randomly rotated) with a prescribed small angle between them.

As expected, the two arccosine formulas behave poorly for nearly parallel/anti-parallel vectors, and as Kahan describes, the arcsine formula behaves poorly for nearly orthogonal vectors. The two arctangent formulas are the most consistently accurate, and when the one mentioned by Kahan *is* better, it’s typically *much* better.
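For reference, one way to write the arctangent formula mentioned by Kahan is $2\arctan2(\lVert bu - av \rVert, \lVert bu + av \rVert)$ with $a = \lVert u \rVert$ and $b = \lVert v \rVert$. A Python sketch (my own transcription and naming, not the C++ used for the experiments above):

```python
import math

def norm(u):
    return math.sqrt(sum(x * x for x in u))

def angle_atan2(u, v):
    """Angle between vectors u and v via 2*atan2(||bu - av||, ||bu + av||),
    where a = ||u|| and b = ||v||; well-behaved for all angle magnitudes."""
    a, b = norm(u), norm(v)
    diff = norm([b * x - a * y for x, y in zip(u, v)])
    summ = norm([b * x + a * y for x, y in zip(u, v)])
    return 2.0 * math.atan2(diff, summ)
```

Note that unlike the arccosine formulas, no clamping is needed: `atan2` is happy with any non-negative pair of arguments, including the degenerate parallel and anti-parallel cases.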

**Reference:**

- Kahan, W., How Futile are Mindless Assessments of Roundoff in Floating-Point Computation? [PDF]

I am not interested in arguing about government policies, or even epidemiological models here. Frankly, this video is too easy a target. The error made in this video is a mathematical one– an error so simple, and yet so critical to the presenter’s argument, that it’s not worth bothering with the remainder of the presentation. Instead, I’d like to use this video as an excuse to rant about mathematical notation.

The problem starts at about 3:38 in the video, where the presenter attempts to analyze the COVID-19 outbreak on the aircraft carrier USS *Theodore Roosevelt* as a realization of the so-called “final size equation,” a model of the end-game, steady state extent of an epidemic in a closed system (since the sailors were isolated onboard the ship for a significant period of time). The final size equation is

$1 - s = e^{-R_0 s}$

where $s$ is the “final size” of the pandemic, or the fraction of the population that is eventually infected, and $R_0$ is the basic reproduction number, essentially the average number of *additional* people infected through contact with a person already infected, in the situation where everyone in the population is initially susceptible to infection.

As the presenter explains, there is a critical difference between a reproduction number *less* than one, resulting in “extinction” of the disease, and a value *greater* than one, resulting in an epidemic. Using the fact that 856 of the 4954 sailors onboard the *Roosevelt* eventually tested positive for COVID-19, corresponding to $s = 856/4954 \approx 0.173$, we can estimate $R_0$ by solving for it in the final size equation, yielding

$R_0 = -\frac{\ln(1 - s)}{s}$

It’s a simple exercise to verify that the resulting estimate of $R_0$ is about 1.1. It’s also a relatively simple exercise to verify that this estimation technique *cannot possibly* yield an estimate of $R_0$ that is *less* than one (since $-\ln(1-s) \geq s$ for $0 \leq s < 1$).

Despite this, the presenter manages– conveniently for her argument that the contagiousness of the virus is overblown– to compute a value of $R_0$ of about 0.48… by computing the base 10 logarithm instead of the natural logarithm in the formula above.
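To see the discrepancy concretely, here is a quick check (a sketch; the variable names are mine):

```python
import math

s = 856 / 4954                        # observed "final size" on the Roosevelt
r0_natural = -math.log(1 - s) / s     # correct: natural logarithm
r0_base10 = -math.log10(1 - s) / s    # the video's mistake: base 10 logarithm

print(round(r0_natural, 2))  # about 1.1
print(round(r0_base10, 2))   # about 0.48
```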

It’s interesting to try to guess how the presenter managed to make this mistake. My guess is that she did this in an Excel spreadsheet; that is the only environment I know of where `log(x)` computes the base 10 logarithm. In any other programming environment I can think of, `log(x)` is the natural logarithm, and you have to work at it, so to speak, via `log10(x)`, or `log(x)/log(10)`, to compute the base 10 logarithm.

The mathematical notation situation is a bit of a mess as well. Sometimes I’m a mathematician, where $\log x$ means the natural logarithm, and any other base is usually specified explicitly as $\log_b x$. But sometimes I am an engineer, where $\log x$ usually means base 10, but sometimes in a communications context it might mean base 2. Other times I am a computer scientist, where $\lg x$ is a common shorthand for base 2, and $\log x$ can mean pretty much anything, including “I don’t care about the base.”

Jacob Brazeal describes the following interesting puzzle in a recent MAA article (see reference below): starting with four rooks in the four corner squares of a chessboard, as shown in the figure below, move the rooks into the four *center* squares… where each single move is constrained to sliding a single rook, either horizontally along its rank or vertically along its file, *as far as possible*, “blocked” only by another rook or the edge of the board.

Note that going in the other direction is easy– we can move the rooks from the center out to the corners in just 8 moves. But this problem is harder; it’s a nice programming exercise to determine the minimum number of moves required. The motivation for this post is to describe a slightly different approach to the problem than presented in the article, as well as a variant of the problem using queens instead of rooks that also has some interesting mathematical structure.

All of the code is available on GitHub.

**Breadth-first search**

We can view this problem as a directed graph, with a vertex for each possible state of the board, and a directed edge (*u*, *v*) if we can move a single rook in state *u* to obtain state *v*. The goal is to find a minimum-length path from the starting vertex with the rooks at the corners to the goal vertex with the rooks in the center of the board.

It’s an interesting question whether there is a convenient admissible heuristic estimate of the number of moves required from a given board state, that would allow a more efficient informed search. I couldn’t come up with one; fortunately, simple breadth-first search turns out to be acceptably efficient for this problem:

```python
from collections import deque

def bfs(neighbors, root):
    """Breadth-first search.

    Given a graph neighbors:V->V* and a root vertex, returns (p, d),
    where p[v] is the predecessor of v on the path from the root, and
    d[v] is the distance to v from the root.
    """
    queue = deque([root])
    parent = {root: None}
    distance = {root: 0}
    while queue:
        vertex = queue.popleft()
        for neighbor in neighbors(vertex):
            if neighbor not in parent:
                parent[neighbor] = vertex
                distance[neighbor] = distance[vertex] + 1
                queue.append(neighbor)
    return (parent, distance)
```
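To apply this to the puzzle, we need a `neighbors` function generating the maximal rook slides. A sketch (my own naming, and not necessarily the representation used in the GitHub code; it assumes an 8×8 board with a state represented as a sorted tuple of `(x, y)` rook coordinates):

```python
N = 8  # board size

def rook_moves(state):
    """Yield states reachable by sliding one rook as far as possible,
    blocked only by another rook or the edge of the board."""
    occupied = set(state)
    for i, (x, y) in enumerate(state):
        for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nx, ny = x, y
            # slide until the next square is off the board or occupied
            while (0 <= nx + dx < N and 0 <= ny + dy < N
                   and (nx + dx, ny + dy) not in occupied):
                nx, ny = nx + dx, ny + dy
            if (nx, ny) != (x, y):
                yield tuple(sorted(state[:i] + ((nx, ny),) + state[i + 1:]))
```

Then `bfs(rook_moves, ((0, 0), (0, 7), (7, 0), (7, 7)))` explores the entire reachable component from the rooks-in-the-corners state.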

It turns out that a minimum of 25 moves are required to solve the puzzle. That’s a lot– too many, really, so that this would probably not be very fun to explore by hand with an actual chess board (more on this shortly). And there are other configurations that are even more difficult to reach. The board that is “farthest” from the initial rooks-in-the-corners state is shown below, requiring 32 moves to reach:

**Symmetry group action**

How large is the directed graph that we need to explore? The referenced article describes a graph with $\binom{64}{4} = 635{,}376$ vertices, one for each possible subset of four squares in which to place the rooks. This graph has some interesting structure, with one really large strongly connected component explored by the above search algorithm, containing 218,412– over one-third– of all possible board states. The remainder is made up of a large number of *much* smaller unreachable components: the next largest component contains just 278 vertices!

However, these numbers count configurations of rooks that are not *usefully* distinct. For example, the figure above shows just one of eight “different” vertices, all of which require 32 moves to reach from the initial vertex… but the other seven board states are merely rotations and/or mirror reflections of the board shown in the figure, and thus are reachable by correspondingly rotated and/or reflected versions of the same sequence of 32 moves.

In other words, let’s consider the dihedral group of symmetries of the board acting on the set of possible board states, and construct the (smaller) directed graph with a vertex for each orbit of that group action.

A standard trick for implementing this approach is to represent each orbit by one of its elements, chosen in some natural and consistent way; and a standard trick for making that choice is to impose some convenient total order on the set, and choose the least element of each orbit as its representative. In the case of this problem, as we encounter each board state `v` during the search, we “rename” it as `min(orbit(v))`, the lexicographically least tuple of rotated and/or reflected coordinates of the rook positions:

```python
def orbit(pieces):
    """Orbit of dihedral group action on rooks on a chess board."""
    for k in range(4):
        yield pieces
        yield tuple(sorted((n - 1 - x, y) for (x, y) in pieces))  # reflect
        pieces = tuple(sorted((n - 1 - y, x) for (x, y) in pieces))  # rotate
```
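For concreteness, the canonical-representative trick then looks like this (a self-contained sketch with the board size fixed at 8; `canonical` is my name for it):

```python
n = 8  # board size

def orbit(pieces):
    """Images of a board state under the 8 symmetries of the board."""
    for k in range(4):
        yield pieces
        yield tuple(sorted((n - 1 - x, y) for (x, y) in pieces))  # reflect
        pieces = tuple(sorted((n - 1 - y, x) for (x, y) in pieces))  # rotate

def canonical(pieces):
    """Lexicographically least element of the orbit."""
    return min(orbit(pieces))
```

Two board states are the “same” up to symmetry exactly when their canonical forms are equal; for example, the initial corner configuration is fixed by every symmetry, so its orbit is a single state.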

This search space is almost– but not quite– eight times smaller. From the initial rooks-in-the-corners board state, we can reach 27,467 configurations unique up to rotations and reflections, out of a total of 79,920 possible configurations. We can compute the latter number without actually enumerating all possible board states: the cycle index of the dihedral group acting on the squares of an $n \times n$ board (assuming $n$ is even) is

$Z = \frac{1}{8}\left(a_1^{n^2} + 2 a_4^{n^2/4} + 3 a_2^{n^2/2} + 2 a_1^{n} a_2^{(n^2-n)/2}\right)$

and the number of possible board states with $k$ rooks is the coefficient of $x^k$ in $Z$ after substituting $a_i = 1 + x^i$, which for $n = 8$ and $k = 4$ yields 79,920.
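As a sanity check, we can also compute this count directly with Burnside’s lemma, averaging over the eight board symmetries the number of 4-subsets of squares fixed by each. A sketch (my own code, not the original):

```python
from itertools import product

n, k = 8, 4

def symmetries():
    """The 8 symmetries of the square board as coordinate maps."""
    fns = []
    for rot in range(4):
        for refl in (False, True):
            def f(p, rot=rot, refl=refl):
                x, y = p
                if refl:
                    x = n - 1 - x
                for _ in range(rot):
                    x, y = n - 1 - y, x
                return (x, y)
            fns.append(f)
    return fns

def num_configurations():
    """Number of k-subsets of squares, up to board symmetry (Burnside)."""
    squares = list(product(range(n), repeat=2))
    total = 0
    for f in symmetries():
        # A subset fixed by f is a union of whole cycles of f, so count
        # the coefficient of x^k in the product of (1 + x^len) over cycles.
        seen = set()
        coeffs = [1] + [0] * k
        for s in squares:
            if s in seen:
                continue
            length, t = 0, s
            while t not in seen:
                seen.add(t)
                t = f(t)
                length += 1
            if length <= k:
                for j in range(k, length - 1, -1):
                    coeffs[j] += coeffs[j - length]
        total += coeffs[k]
    return total // 8

print(num_configurations())  # 79920
```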

**Sliding queens instead of rooks**

Finally, I think perhaps a more “fun” variant of this problem is to consider four *queens* in the corners, and try to move them to the four center squares as before, using the same “maximal” moves, but allowing diagonal moves as well as horizontal and vertical. This is more tractable to solve by hand, requiring only 12 moves to complete.
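The change to the move generation is small: simply allow the four diagonal directions as well. A sketch, mirroring the rook version with my own naming:

```python
N = 8  # board size

def queen_moves(state):
    """Yield states reachable by sliding one queen as far as possible,
    horizontally, vertically, or diagonally."""
    occupied = set(state)
    directions = [(dx, dy) for dx in (-1, 0, 1) for dy in (-1, 0, 1)
                  if (dx, dy) != (0, 0)]
    for i, (x, y) in enumerate(state):
        for dx, dy in directions:
            nx, ny = x, y
            # slide until the next square is off the board or occupied
            while (0 <= nx + dx < N and 0 <= ny + dy < N
                   and (nx + dx, ny + dy) not in occupied):
                nx, ny = nx + dx, ny + dy
            if (nx, ny) != (x, y):
                yield tuple(sorted(state[:i] + ((nx, ny),) + state[i + 1:]))
```

From the corners, each queen now has three maximal moves (two straight and one diagonal, the latter blocked by the opposite queen), for twelve neighbors in total.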

And the structure of the corresponding graph is also rather interesting: the large connected component is even larger, so that we can now reach 77,766 of the 79,920 possible configurations of four queens… but the remaining 2,154 configurations are all *singleton* components! That is, from any one of these 2,154 “lone” configurations, we can move *into* the large component with just a single move, and from there reach any of those 77,766 configurations… but we can’t get *back*, nor can we reach any of the *other* 2,153 lone unreachable configurations!

This was interesting enough that I wondered if it was true in general for other board sizes. It’s trivially true for 2×2 and 4×4 (since there are no unreachable board states), as well as 6×6, 8×8, and even 10×10… but unfortunately the pattern does not continue; the 12×12 board has larger-than-singleton connected components not reachable from the initial queens-in-the-corners state.

**Reference:**

- Brazeal, J., Slides on a Chessboard, *Math Horizons*, **27**(4), April 2020, p. 24-27 [link]