Rube Goldberg Monte Carlo

Introduction

“This shouldn’t take long,” Alice says. “It’s only flipping a coin, after all.”

Alice and Bob are talking in the hallway at work. Alice has been tasked with analyzing the behavior of a simple black box. When a button on the side of the box is pressed, a mechanism inside the box flips a biased coin, and displays either a green light when the coin comes up heads– with some fixed but unknown probability p– or a red light when it comes up tails.

“As we discussed in the meeting, I will press the button to flip the coin 200 times, observe the number of green and red light indications, and report the resulting 95% confidence interval bounds for p– to two digits, as usual*.”

“That’s great for you,” Bob grumbles. “I’m the one stuck with analyzing the Rube Goldberg Variation.”

Bob’s task is more involved. The Rube Goldberg Variation is a much larger black box, taking up almost an entire room. Instead of just a button on the outside of the box, there is a dial as well, that may be set to a positive integer n specifying the “mode” in which the box is configured.

The mechanism inside is much more complex as well: pressing the button on the side of the box starts a convoluted sequence of interacting mechanisms inside, many of which exhibit some randomness similar to Alice’s coin flip. This process takes several minutes… but eventually, the box displays either a green light indicating “success”– with some fixed but unknown probability q(n) that depends on the selected mode– or a red light indicating “failure.”

“I suppose the good news is that at least we’re only focusing on the one mode n=10,” Bob says. “But this analysis is still going to take days to complete. Not only does each single experiment– from pressing the button to observing the light– take longer, but I will also need more sample outcomes of the experiment to estimate q(10) with the same degree of certainty.”

Alice looks confused. “Wait, why do you need more samples? Shouldn’t the usual 200 be enough?”

Bob replies, “If it were just a coin flip, sure. But the Rube Goldberg Variation has a lot more moving parts, with a lot more opportunities to introduce randomness into the process. I need more samples to cover all of the possible combinations of corner-case interactions between all of those components.”

Monte Carlo simulation

This post is motivated by Bob. Recently, Bob was an undergraduate student… but I’ve also encountered discussions like the one above when Bob has been a computer scientist with a Ph.D. So, this is my attempt to capture my thoughts on this subject, to hopefully highlight the specific areas where there are potential pitfalls and opportunities for confusion.

In these past discussions, there were not any actual black boxes. Instead, there were computer simulations of the processes inside hypothetical black boxes. For example, the simulation of Alice’s black box might look like this (in Python):

import random

def alice():
    return 1 if random.random() < 0.75 else 0

where we can estimate the probability p of success by evaluating alice() many times and averaging the results. Of course, that would be a bit ridiculous if we were allowed to inspect the source code, since we could just read directly that p=0.75.
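
For concreteness, here is a minimal sketch of that estimate, reusing the alice() function above (the 200-sample count is from the dialogue; a full report would also include the confidence interval discussed in the notes at the end of this post):

samples = [alice() for _ in range(200)]
p_hat = sum(samples) / len(samples)  # sample mean; should be near p = 0.75
print(p_hat)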

But now consider Bob’s more complicated simulation:

def bob(n):
    position = 1
    while True:
        while True:
            move = random.randint(-1, 1)
            if 1 <= position + move <= n:
                break
        position = position + move
        if move == 0:
            break
    return 1 if position == 1 else 0

(In reality, Bob’s simulation should be much more complicated than this, with so many lines of code, and so many random components, that analysis by inspection is practically impossible. But I can’t resist inserting a tractable puzzle even when it’s not the point: inside this black box, there are n=10 slots arranged in a row, with a marble initially in the left-most slot. When Bob presses the button, the marble makes a series of random moves, with each move selected uniformly from {left, stop, right}. An attempt to move left beyond the left-most slot, or right beyond the right-most slot, is ignored. The marble moves randomly back and forth along the row of slots, until a “stop” move, resulting in display of a “successful” green light if the marble stops in its original left-most slot, or a “failure” red light otherwise.)

It’s at least not as immediately clear from inspection alone what is the probability q(n) that bob(n) returns 1… indeed, with those nested while True loops in there, it might not be obvious whether this function is guaranteed to return at all (fortunately, it is– more on this later).

So, to level the playing field, so to speak, let’s suppose that neither Alice nor Bob is able to inspect their simulation source code at all. In other words, neither is allowed to open their black box and peek inside; all either can do is inspect their black box’s output, observed from repeatedly pressing the button on the side of the box. (Remember, Bob was the one complaining; so we’re trying to remove any unfair advantage that Alice might have.)

Given this setup, we are considering the following question: because Bob’s simulation is more complex, does he require more sample executions of his simulation to estimate q(10) than Alice requires to estimate p with the same degree of certainty?

A simulation is a function

I think it’s helpful to think of a simulation like those above as a function, in the mathematical sense; that is, a simulation is a function

f:X_1 \times X_2 \times \cdots \times X_m \times U^{\omega} \to \{0,1\}, U=[0,1)

that maps its input to a binary output y \in \{0,1\}, that is,

y = f(x_1,x_2,\ldots,x_m,(u_1,u_2,\ldots))

where the output y=1 means “success” and y=0 means “failure.”

The input parameters x_1,x_2,\ldots,x_m specify the configurable “mode” of the simulation. In Alice’s case, m=0 since her simulation isn’t configurable. In Bob’s case, m=1 and X_1=\mathbb{Z}^+, where he is interested in the specific case x_1=10.

The inputs u_1,u_2,\ldots specify a single realized sample of a pseudo-random number stream, where each u_k is drawn independently and uniformly from the unit half-open interval U. This is enough expressive power to represent effects of the random processes in any computable simulation: in the context of the Python implementations above, each u_k is the result of the k-th evaluation of random.random(), which we can think of as the “subroutine” in terms of which all other pseudo-random functions, like random.randint(a, b), or random.gauss(mu, sigma), etc., may be implemented.
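
For example, here is one way such a function might be expressed in terms of a single uniform draw (my_randint is a hypothetical illustration, not the standard library’s actual implementation, which consumes random bits differently):

def my_randint(a, b, u):
    """Map one uniform draw u in [0, 1) to an integer in {a, a+1, ..., b}."""
    return a + int(u * (b - a + 1))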

(Pseudo) randomness

I think viewing the “random” behavior of a computer simulation in this mathematical way is useful, since it forces us to consider some important implementation details. Note that by specifying our simulation as a function, we require that the function be well-defined. That is, every possible input must map to a unique output.

Input must map to a unique output: in other words, our simulation must be repeatable. Given the same input values– specifying both the configuration parameters x_1,x_2,\ldots,x_m as well as the sampled random number stream u_1,u_2,\ldots– the simulation must always return the same output value, either always 0 or always 1. Output cannot vary depending on “hidden” additional factors outside the domain of the function, such as the system clock’s indication of the current time, etc.

Every input must map to an output: here we need to be careful about guaranteeing that our simulation will terminate. That is, in any particular practical execution of the simulations above, we never need the entire sequence of “available” random numbers (since otherwise it would run forever). For example, Alice’s simulation only ever depends on the first random draw u_1. Bob’s simulation, on the other hand, depends on an unbounded but finite number of draws that varies from one run to the next. That is, there is a positive probability that running the simulation could take an hour, or a week, or a year, or any specified amount of time, no matter how long… but there is zero probability that it will run forever.

But there are input sequences in the domain of the simulation function that correspond to this “run forever” behavior. For example, in Bob’s simulation, what if the marble selects the “left” move at every turn? Or alternates “left” and “right,” without ever selecting “stop”? For our function to be well-defined, we still need to assign some output to any and all of these infinite-length inputs in the domain, even if they have zero probability of occurring.

The good news is that we can assign any output we want… because it doesn’t matter! That is, no matter what questions we might ask about the statistical behavior of the output of a (terminating) simulation, the answers do not depend on our choice of assignment of outputs to those “impossible” outcomes.

More precisely, we can change our perspective slightly, and view our simulation function as a random variable Y on the sample space U^{\omega} with its natural probability measure, parameterized by the simulation mode:

Y_{x_1,x_2,\ldots,x_m}:U^{\omega} \to \{0,1\}

where we don’t care about the output from those “impossible” outcomes because they have measure zero. Note that this is essentially just a notational change, not a conceptual one: a random variable is neither random, nor a variable; it’s simply a function, just like we started with. This random variable has a Bernoulli distribution, whose parameter P(Y=1) we are trying to estimate… which we can do, no matter the “complexity” of the simulation realizing the function Y.

Conclusion

So, the answer to the question posed here– does Bob need more random samples than Alice to do his job– is no. The black boxes are black; we can’t peek inside them. They are effectively indistinguishable by anything other than their behavior. Similarly, a function implementing a simulation is characterized not by its “complexity” of “implementation,” but solely by its mapping of inputs to outputs.

I’m pretty sure I’ve already beaten this horse to death. But there are several other interesting wrinkles not discussed here. For example, we often want to actually no-kidding repeat exactly the same past simulation behavior (when debugging, for example); this capability is managed by using and recording seeds and/or substream indices as “shorthand” for the infinite sequences u_1,u_2,\ldots of input random numbers. Also, the input configuration parameters x_1,x_2,\ldots,x_m have so far been mostly just additional notational noise. But that configurability can lead to additional analysis pitfalls, when combining Monte Carlo observations of simulation runs across different values of those configuration parameters.
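
To illustrate that first wrinkle, recording a seed in Python looks like this (a minimal sketch; the seed value is arbitrary, and bob() is as defined above):

import random

random.seed(20220408)  # record this seed as shorthand for u_1, u_2, ...
print(bob(10))         # re-running with the same seed reproduces this output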

Notes

(*) When reporting on results of the analysis, why the frequentist Clopper-Pearson confidence interval instead of a more fashionable Bayesian approach of reporting, say, the (narrowest) highest density credible interval with, say, a uniform prior? Since those decisions weren’t really the point here, I chose the experimental setup carefully, so that it doesn’t matter: the resulting intervals, reported to two digits of precision, will be the same either way. If there is any safe place to be a frequentist, it’s on a computing cluster.


NCAA tournaments are getting measurably wilder

Last night, the #1 seed Kansas Jayhawks won the NCAA men’s basketball tournament. Maybe an unsurprising outcome… but they defeated #8 seed North Carolina in the championship game. Only four #8 seeds have made it that far in the nearly four decades of the current format.

This year’s tournament also saw #15 seed Saint Peter’s advance to their regional final, the first #15 seed ever to do so.

And there were plenty of other unlikely upsets as well. Considering the tournament as a whole, that is, all 63 games (ignoring the abomination of the four play-in games) and their seeding match-ups, how “wild” was this year’s outcome compared to past tournaments?

Or, put another way, what was the prior probability of picking a “perfect bracket,” guessing every game outcome correctly, and how does that probability compare with past years?

This question has been discussed here before; I won’t rehash the methodology for modeling game probabilities, which is described in detail in this past article. This post is really just bringing that analysis up to date to reflect recent tournaments (all source historical tournament data is available on GitHub). Here are the results:

Probability of a perfect bracket, 1985-2022.

The constant black line at the bottom reflects the 1 in 2^{63}, or 1 in 9.2 quintillion probability of guessing any year’s outcome correctly, if you simply flip a fair coin to determine the winner of each game. The constant blue and red lines at the top are for comparison with the actual outcomes, indicating the probability of a “chalk” bracket outcome, always picking the higher seeded team to win each match-up. (The blue and red colors reflect two different models of game probabilities as a function of difference in “strength” of each team; as discussed in the previous article, blue indicates a strength following a normal distribution density, and red indicates a linear strength function.)

By either the normal or linear models, three of the last four tournaments (2018, 2021, and 2022– 2019 was pretty by the book, and 2020 was, well, 2020) were among the top five “wildest,” most unlikely outcomes in the last four decades.


Bresenham’s circles are (always?) optimal

Introduction

In my recent experiment with string art, I needed Bresenham’s line algorithm to rasterize a line, that is, to compute the set of pixels that most closely approximate a straight line segment. Suppose instead that we want to rasterize a circle, as shown in the figure below.

Bresenham’s algorithm used to rasterize a circle with radius 10. Integer lattice points are assumed to be at the center of each pixel.

There is a similarly named Bresenham’s circle algorithm to do this, that is arguably even simpler than that for drawing a line (code is available on GitHub):

// Call plot(_, _) to draw pixels on circle at origin with given radius.
void draw_circle(int radius, void(*plot)(int, int))
{
    int x = radius;
    int y = 0;
    int error = 3 - 2 * radius;  // the sum of signed errors e_b(r, 0), see below
    while (x >= y)
    {
        // Plot (x, y) and its reflections in the other seven octants.
        plot(x, y); plot(x, -y); plot(-x, y); plot(-x, -y);
        plot(y, x); plot(y, -x); plot(-y, x); plot(-y, -x);
        if (error > 0)
        {
            error -= 4 * (--x);  // move (diagonally) left one pixel
        }
        error += 4 * (++y) + 2;  // always move up one pixel
    }
}

I have a fond memory of my first encounter with this algorithm as a kid reading Nibble Magazine (“Seven ready-to-type Apple II programs!”). In his article, Brent Iverson [1] had “discovered a neat little circle-generating algorithm in a graphics textbook” … but neglected to mention which textbook, or the name Bresenham, or any other details about how this mysterious snippet of code (which was in BASIC, and in slightly different but equivalent form) worked, despite using only integer addition and bit shift operations. In 1988, without search engines or internet forums, I was on my own.

Today, there are derivations of this algorithm all over the web. However, every one that I have found glosses over what I think is a beautiful property of the algorithm: its fast integer error function is only an approximation of the actual error function, which seems like it could sacrifice some accuracy in choosing which pixels best approximate the desired circle. But it doesn’t; the motivation for this post is to show that (1) this is an issue worth thinking about, since the algorithm’s error function is not monotonic in actual distance from the circle, but (2) despite this, in practice Bresenham’s circle algorithm is still optimal in the sense of always selecting the next pixel that is indeed closest to the circle.

At least, I think. More on this later.

The algorithm

Let’s start by describing how the algorithm works, as a means of introducing some useful notation. Fixing a positive integer input radius r (the reader can verify that the algorithm also works for r=0), we can focus on rendering only those pixels in the first octant 0 \leq y \leq x, using the eightfold dihedral symmetries of the circle to plot the remaining pixels.

Starting at (x,y)=(r,0), at each iteration we always move up one pixel, and conditionally move (diagonally) left one pixel, choosing from the two corresponding lattice points (x-1,y+1) or (x,y+1) depending on which is closest to the circle. (We always move up, since in the first octant the slope of the tangent to the circle is always at most -1.)

An iteration of Bresenham’s circle algorithm. We have just plotted the pixel at (x,y), and need to choose which of (x-1,y+1) or (x,y+1) to plot next, ideally minimizing the actual distance from the circle (shown in red).

Ideally, we would choose the next pixel that minimizes the actual, exact distance from the circle |c(h,k)|, where

c(h,k)=\sqrt{h^2+k^2}-r

as indicated by the red line segments in the above figure. However, to eliminate those unpleasant square roots, Bresenham’s algorithm instead seeks to minimize |b(h,k)|, where

b(h,k)=h^2+k^2-r^2=c(h,k)c^*(h,k)

c^*(h,k)=\sqrt{h^2+k^2}+r

(To foreshadow the key point, note that b(h,k) is not simply the square of c(h,k).) In either case, it is convenient to define the sum of these signed errors– that is, without the absolute value– for the two candidate pixels as

e_b(x,y)=b(x-1,y+1)+b(x,y+1)

e_c(x,y)=c(x-1,y+1)+c(x,y+1)

where in the source code above, the error variable maintains the value of e_b(x,y), which is initialized to e_b(r,0)=3-2r, and at each iteration, depending on the choice of next pixel to render (more on this shortly), can be updated accordingly with simple integer increment, bit shift, and addition operations as shown.

Choosing the closest pixel

To decide which pixel to move to at each iteration, note that both the integer-valued b(h,k) and the exact c(h,k) have the property that they are positive for points (h,k) outside the circle, and negative for points inside the circle. Thus, for example, in the extreme (and in fact impossible) case that both of the candidate next points (x-1,y+1) and (x,y+1) were outside the circle, then e_b(x,y) and e_c(x,y) are positive, indicating that we should move diagonally left to (x-1,y+1). Similarly, if both candidate points were inside the circle, then the sum of errors is negative, indicating that we should move up to (x,y+1).

The important question is what happens in between, i.e., in the typical situation shown in the above figure, where the two candidate points straddle the circle, so to speak. Again, ideally we would use the sign of the exact sum of errors e_c(x,y), in the same way as described above, moving diagonally left if e_c(x,y)>0, or up if e_c(x,y)<0. Does Bresenham’s algorithm, using the sign of the approximated integer-valued e_b(x,y) instead, yield the same behavior?

To see why this is a question worth asking, suppose for example that we want to draw a circle with radius r=4, and that for some reason we are asked which of the points (2,2) or (4,3) is closer to the circle, as shown in the following figure.

Example choosing which of points (2,2) or (4,3) is closer to the circle with radius 4.

The reader can verify that (4,3) is closer, with |c(4,3)|<|c(2,2)| and thus c(2,2)+c(4,3)<0 … but |b(4,3)|>|b(2,2)| and thus b(2,2)+b(4,3)>0. That is, the integer error approximation used in Bresenham’s algorithm is not monotonic in the actual distance of points from the circle.
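
A quick numeric check of this example (a throwaway sketch, with b and c transcribed directly from the definitions above):

import math

r = 4

def c(h, k):
    return math.hypot(h, k) - r

def b(h, k):
    return h * h + k * k - r * r

print(abs(c(4, 3)) < abs(c(2, 2)))  # True: (4,3) is closer to the circle
print(abs(b(4, 3)) < abs(b(2, 2)))  # False: but b ranks (2,2) as closer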

The interesting good news is that this seemingly potential flaw, isn’t. That is, we are now in a position to conjecture the following:

Conjecture: At each iteration of Bresenham’s circle algorithm, the sum-of-error functions e_c(x,y) and e_b(x,y) always have the same sign. That is, the algorithm always makes the correct choice of which of the two candidate next pixels is closest to the circle.

Incomplete proof: There are a few key observations that will be useful here. First, at each iteration, c(x,y+1)>c(x-1,y+1) and c^*(x,y+1)>c^*(x-1,y+1)>0. We can see these directly from the definitions, noting that r \geq x>0. (This is why we constrained the radius to be positive, treating r=0 as a separate special case.)

Also, note that the exact sum of errors e_c(x,y) is always irrational, and thus never zero. That is, we should never be indifferent to the choice of next pixel. The approximated e_b(x,y) is never zero, either, as can be seen by inspecting the source code: the value of error is initialized to an odd number, and is always incremented or decremented by even amounts. Thus, it suffices to show that e_c(x,y) and e_b(x,y) are either both positive, or both negative.

So, there are two cases. First, suppose that e_c(x,y)>0. Then c(x,y+1)>0, since

2c(x,y+1)>c(x,y+1)+c(x-1,y+1)=e_c(x,y)>0

Thus,

e_b(x,y)=b(x-1,y+1)+b(x,y+1)

= c(x-1,y+1)c^*(x-1,y+1)+c(x,y+1)c^*(x,y+1)

> c(x-1,y+1)c^*(x-1,y+1)+c(x,y+1)c^*(x-1,y+1)

= e_c(x,y)c^*(x-1,y+1)>0.

Thus, when we are supposed to move diagonally left, we do.

The second case, showing that e_c(x,y)<0 implies e_b(x,y)<0, is why this is only a conjecture and not a theorem. Following is as far as I have been able to proceed:

Similar to the first case, note that c(x-1,y+1)<0, since

2c(x-1,y+1)<c(x,y+1)+c(x-1,y+1)=e_c(x,y)<0

Consider again the expanded form of

e_b(x,y)=b(x-1,y+1)+b(x,y+1)

= c(x-1,y+1)c^*(x-1,y+1)+c(x,y+1)c^*(x,y+1)

If c(x,y+1) \leq 0, then we can see e_b(x,y)<0 directly. Otherwise, if c(x,y+1)>0 … and here is where I get stuck.

Conclusion

Well, that was disappointing.

My glass is half full, though; I think there are three possible resolutions of this problem, any of which would be an interesting outcome. First, and most likely, I may have simply missed something obvious. Perhaps a reader can help take the above attempt at a proof to the finish line.

Alternatively, maybe the conjecture is actually false, and there is in fact a counterexample– which given the partial proof above, would necessarily be a situation where Bresenham’s algorithm is supposed to move up (e_c(x,y)<0), but moves diagonally left instead (e_b(x,y)>0). In realistic practice, this isn’t a concern: I have verified numerically that the algorithm works correctly for every radius up to r=16384– that is, much larger than any pixel-perfect circle that would fit on the highest resolution screens currently available. But even from a theoretical perspective, if there is a counterexample, it would be interesting that a minimal one must be so surprisingly large.
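
That brute-force verification is straightforward: step through the algorithm for each radius, and check that the exact and integer sums of errors agree in sign at every iteration. Here is a sketch, shown with ordinary floating point and a smaller radius bound for brevity (the actual check ran to r=16384 with 50-digit precision, as noted below):

import math

def signs_agree(r):
    """Return True if e_b and e_c have the same sign at every iteration."""
    x, y = r, 0
    while x >= y:
        e_b = ((x - 1)**2 + (y + 1)**2 - r**2) + (x**2 + (y + 1)**2 - r**2)
        e_c = (math.hypot(x - 1, y + 1) - r) + (math.hypot(x, y + 1) - r)
        if (e_b > 0) != (e_c > 0):
            return False
        if e_b > 0:
            x -= 1  # move diagonally left
        y += 1      # always move up
    return True

assert all(signs_agree(r) for r in range(1, 1025))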

Or finally, maybe the conjecture is indeed true, but the remainder of the proof of the second “negative” case is actually really hard. There is some intuition behind this possibility as well: in this part of the proof, we are trying to bound a sum of square roots, which is one of my favorite examples of “problems that seem like they have no business being as difficult as they are.” (The Euclidean traveling salesman is perhaps the most recognizable decision problem with this wrinkle: it’s often claimed to be NP-complete, but it’s “only” NP-hard; we don’t even know if it’s in NP, since it involves a similar comparison with a sum of square roots.) This potentially affects the integrity of my brute-force numerical verification above as well: it’s technically possible that I missed a counterexample by performing those thousands of sign checks with “only” 50-digit precision.

Edit 2022-03-21: Thanks to commenter Andrey Khalyavin for the initial proof of the contrapositive of the missing half above, which I didn’t fully grasp, and to commenter Linus Hamilton for patiently following up with a more detailed version, linked here, that helped me to see how it works. Their work shows that Bresenham’s algorithm is indeed optimal, always selecting the unique next pixel that is closest to the desired circle.

Reference:

  1. Iverson, Brent, Hi-Res Circle Generator, Nibble Magazine, 9(1) January 1988, 68-71

String art

Introduction

This is a variant of a similar past problem: draw something interesting, using a sequence of joined straight line segments, without ever lifting your pen. Or in this case, with one continuous thread.

As far as I can tell, the first realization of this particular idea was in 2016, when artist Petros Vrellis [1] arranged 200 pins evenly spaced around a 28-inch-diameter circular rim, and wound kilometers of thread stretched in straight line chords from one pin to the next, in a carefully determined order, to yield an image similar to the one shown below (created by me).

My implementation of string art, with one continuous thread wound around 256 pins across 8,927 chords. Full size output image is 4096×4096 black on white, from 512×512 grayscale input (source Birsak et al. [2]).

Vrellis has not published any details of their algorithm, but since then, numerous others have done similar work. The most common approach has been a sequential greedy algorithm, starting with a grayscale input image and a blank white output image, rendering each thread chord in the output as a pixel-approximated line segment, at each iteration selecting a next pin so that the resulting chord makes the most improvement to the result.

Most of these approaches had a quirk that didn’t appeal to me: they effectively treat the thread as if it were not actually opaque, so that overlapping thread chords increase the darkness of the corresponding pixels in the output image (or in some cases, reduce the “remaining” darkness of the pixels in the input as each new thread overlaps them). That is, dark thread on top of dark thread yields an even darker intersection.

The motivation for this post is to explore the alternative described in 2018 by Birsak et al. [2], who instead upscaled the input image to a higher-resolution output, with each input pixel corresponding to a b \times b square block of pixels in the output. When one or more threads– each one output pixel wide, purely black-on-white– intersect a block, the fraction of those b^2 pixels that are black is a more accurate approximation of the gray level that we can compare with the corresponding single input pixel.

I thought it would be interesting to implement just this upscaling idea, but to otherwise skip the complexity of the other aspects of their algorithm (involving iterative solving of an integer programming problem, extracting an Eulerian trail from the resulting unordered set of chords, weighting more important regions of the target image, etc.), sticking to the simple sequential greedy approach of selecting consecutive chords. I was pleased with the results: less than 200 lines of C++ code (available on GitHub), that generates nice-looking images in just a few minutes.

Implementation

The program takes four arguments: the input image filename, the upscaling block size, the number of pins, and the output image filename. For example, to use 16×16-pixel output blocks per input pixel, with 256 pins:

string_art input.pgm 16 256 output.pgm

Image files use the easy-to-parse Netpbm format, specifically the 8-bit grayscale binary “P5” format.

Initially, the output image “canvas” is all white. Starting at an arbitrary pin, we iteratively select the best next pin to wind the thread across. For each candidate chord, we use Bresenham’s algorithm to visit each pixel on the chord. The score for that chord is simply the average– over each of the intersected blocks that would change as a result– of the difference between the target gray level of the input pixel and the (potential) new fraction of black pixels in the block. Intuitively, we measure how important it is to continue to darken the intersected blocks of pixels.

Evaluating the score for each possible next chord, we select the one with maximum score, that is, that makes the largest average improvement to the blocks it would modify. We stop when that best average improvement is negative, that is, when we would make the image worse by continuing.
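
Here is a rough sketch of that scoring step in Python, under some assumed bookkeeping (all of the names here are hypothetical; the actual implementation is the C++ code linked above). Suppose target[i][j] is the desired darkness of input pixel (i, j) as a fraction in [0, 1], black[i][j] counts the black output pixels already in block (i, j), is_black holds the current output canvas, and chord_pixels is the list of output pixels visited by Bresenham’s algorithm for the candidate chord:

def chord_score(chord_pixels, is_black, target, black, b):
    # Count the new black pixels this chord would add to each b-by-b block;
    # pixels already blackened by an earlier thread would not change.
    added = {}
    for (x, y) in chord_pixels:
        if not is_black[x][y]:
            block = (x // b, y // b)
            added[block] = added.get(block, 0) + 1
    if not added:
        return float('-inf')  # chord would change nothing
    # Average, over the blocks that would change, of how far each block's
    # new fraction of black pixels would still fall short of its target.
    return sum(target[i][j] - (black[i][j] + n) / (b * b)
               for (i, j), n in added.items()) / len(added)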

Results

The figure below shows the results for several example images. The top row is the 512×512 grayscale input (source Birsak [2]). The middle row is the Birsak algorithm output, using a block size b=8 (and thus a 4096×4096 output resolution) and 256 pins. The bottom row is my implementation using the same settings. In each case, the label indicates the number of thread chords, and the total length of thread required (for a 630-mm diameter canvas).

String art comparison between (a) top row input images, 512×512 grayscale; (b) middle row Birsak et al. output, 4096×4096 with 8×8 blocks and 256 pins; and (c) bottom row showing my algorithm with same parameters. Labels indicate the number of chords and total thread length in meters (on 630-mm diameter canvas).

For my own project, I used the same 256 pins, but a 384×384 input image, and a higher resolution block size b=16, for a 6144×6144 output image, which looks nice as a 24-inch, 300-dpi print, with the target input image inset as shown below.

Have you ever tried to photograph a cat?

References:

  1. Vrellis, Petros, A new way to knit (2016). http://artof01.com/vrellis/works/knit.html
  2. Birsak, M., Rist, F., Wonka, P., & Musialski, P., String Art: Towards Computational Fabrication of String Images, Computer Graphics Forum, 37(2) 2018, 263–274, doi:10.1111/cgf.13359

There is no equilux

I learned a new word recently… sort of. An equilux is a day on which the durations of daylight and night are exactly equal. Contrast this with the more familiar equinox, or the time at which the center of the Sun passes through the Earth’s equatorial plane, so that the Sun is directly above the Earth’s equator. (If anyone is checking, the official definition of the equinox is slightly different from this, for technical reasons that we don’t really care about here.)

There are two equinoxes each year; for example, the first equinox of 2022 is on 20 March, at about 15:33:25 UTC, marking the unofficial start of spring in the northern hemisphere (or fall in the southern hemisphere).

We think of an equinox as also being the time at which there are roughly equal periods of daylight and darkness. The Wikipedia page linked above does a good job of explaining why this isn’t quite true: the Sun appears as not just a point but a disk, a sliver of which becomes visible at sunrise well before the center of the Sun appears above the horizon, and similarly a sliver remains visible in the evening after the center has dipped back below the horizon again.

Thus, even on the equinox, the day is slightly longer than the night. So, when is the “equilux,” where these periods of day and night are truly equal?

To see why this is tricky to answer, consider the figure below, showing the changes in length of day and night over the course of the coming year in the Washington, D.C., area:

Length of day and night in Washington, D.C., over the course of the year 2022. The inset is a zoomed-in view of the week near the spring equinox on 20 March.

This winter, the nights (shown in blue) are much longer than the days (shown in red). But over the next few months, the nights get shorter and the days get longer… until the middle of March, when we start to see more daylight hours than nighttime. This “crossing” happens around 16 March (see the zoomed-in view in the inset), when we see almost exactly 12 hours (plus a few seconds) of night beginning at sunset that evening, followed by about 12 hours– plus almost a full minute– of daylight on 17 March.

The motivation for this post is to observe that (1) this is still not exactly equal periods of daylight and darkness, still differing by nearly a minute; and that (2) such an exact equilux will almost surely never occur. An equinox is a precise instant in time– and the same instant independent of observer location on the Earth– that we can in principle measure as accurately as we wish. But an equilux, on the other hand, is two consecutive intervals of time, whose lengths are varying over the weeks surrounding the equinox by several minutes per day. Hoping that those lengths will meet in the middle exactly is hoping for a measure zero event.

Having said that, we can relax our definition somewhat, and consider when periods of daylight and darkness are “most closely equal,” minimizing the absolute difference between consecutive intervals as we did above. Things are still pretty unpleasant, for a couple of reasons. First, the times, and thus durations, of daylight and darkness depend on the location of the observer. The calculations and figure above are unique to Washington, D.C.; if we instead consider southern Virginia (or Oklahoma, or most of California), the nearest equilux is earlier, consisting of the daylight hours of 16 March followed by that night, as shown in the figure below.

Date of (spring) equilux across the globe. Equilux is earlier than the spring equinox (20 March) in the northern hemisphere, later in the southern hemisphere. Lighter colors indicate the equilux starts with daytime (sunrise to sunset) of the indicated date, followed by nighttime; darker colors indicate starting with evening of the date, followed by daytime of the following day.

I struggled a bit to make this figure clear and informative at the same time. In the northern hemisphere, the spring equilux is before the 20 March equinox, by a number of days indicated by the color. For the southern hemisphere, I “re-used” the same color palette, indicating the number of days after the equinox that the equilux occurs. In either case, the lighter color indicates starting with daytime (sunrise to sunset) of the indicated date, followed by the roughly equal nighttime; the darker color indicates that the equilux starts with evening of the indicated date, followed by daytime of the following day.

The light purple and green bands near the equator more coarsely indicate equiluxes that are more than a week before or after the equinox, respectively. Which brings us to the second problem with everything we’ve done so far: an equilux depends on how we choose to define the transition between what we consider “daylight” versus “darkness.” The sunrise and sunset used above are defined as the times when the center of the Sun’s disk is 50 arcminutes below the horizon. It’s arguably way too light at these times to call them a transition to or from “darkness.” A more reasonable transition would be at least “dawn” and “dusk,” defined as civil twilight (6 degrees below the horizon), if not nautical or even astronomical twilight (12 or 18 degrees, respectively). The figure below compares the sunrise/sunset equilux (shown in red) with the dawn/dusk civil twilight assumption (shown in blue), as a function of the observer’s latitude.

Date of equilux vs. observer latitude, assuming “day/night” transitions at sunrise/sunset (in red) or dawn/dusk twilight (in blue). The equinox on 20 March is shown in black for reference.

So, depending on how dark we think it needs to be to qualify as “not daylight,” the spring equilux in the mid-northern latitudes where I live could be as late as a few days before the equinox, or as early as mid-February. And however we choose to define it, the duration of daylight and darkness will still differ by a minute or three.


Squid Game expectation

Warning: This post is motivated by the Netflix series Squid Game, and contains significant spoilers. I just recently watched the show, and really enjoyed it, so if you haven’t yet seen it, you may want to skip this article.

Suppose that you are one of n=16 competitors who must cross a bridge of s=18 steps. The steps are far enough apart that you must jump from one to the next… and at each step you must choose which of two side-by-side glass squares to jump onto: one square is tempered glass that is strong enough for you to safely jump onto, while the other square is normal glass that will break if you step on it, causing you to fall to your death and out of the competition. The glass squares are otherwise indistinguishable, so that players must guess which square to jump onto at each new step.

Each player in turn attempts to cross the bridge. The first player has a tough road ahead, safely crossing with a probability of only 1/2^{18}. But each subsequent player observes the progress of all previous players, noting which squares at each “explored” step are safe and which are deadly.

Last week’s FiveThirtyEight Riddler column asked for the expected number Y of surviving players. Let’s turn this around, and instead consider the expected number X=n-Y of eliminated players, since the analysis and resulting formulas turn out a bit nicer that way.

I think this is a nice problem for several reasons. It can be an engaging problem for students as a, uh, “real-world application” of probability and expected value. It’s easy to simulate as a programming exercise; and there are several analytical approaches to solving the problem as well. The Riddler solutions describe a recurrence relation, as well as a relatively nice summation:

E[X] = n - \frac{1}{2^{s}}\sum\limits_{k=0}^n (n-k){s \choose k}

yielding 589819/65536, or 8.9999237060546875 players eliminated on average.

But the fact that the expected value is so close to 9– so close to half the number of steps– is not a coincidence. This seems like a great example application of linearity of expectation, that in this case allows us to solve the problem with just paper and pencil.

Let X_1, X_2, \ldots, X_s be indicator random variables, each equal to 1 if and only if some player is eliminated at the corresponding step, so that X=\sum X_j, and thus

E[X] = \sum\limits_{j=1}^s E[X_j] = \sum\limits_{j=1}^s P(X_j=1) .

The key observation is that “most” of these probabilities are exactly 1/2. More precisely, as long as j \leq n, someone must encounter step j, and whoever does encounter that step for the first time will be eliminated with probability P(X_j=1)=1/2.

So if there are at least as many players as steps, i.e. n \geq s, then the expected number of players eliminated is exactly E[X]=s/2.

In the actual game, however, there are two “extra” steps. As described above, the first 16 steps contribute exactly half a player each to the total expected number eliminated. But the 17th and 18th steps require more careful analysis, because it is possible that one or both of these steps may not be encountered at all.

For example, consider the worst possible outcome, where the first player is eliminated on the first step, the second player on the second step, and so on, with the 16th and final player being eliminated on the 16th step, so that the 17th (and 18th) step is never reached. This happens with probability 1/2^{16}, and so

P(X_{17}=1) = (1-\frac{1}{2^{16}})(\frac{1}{2})

We handle the 18th step similarly. In addition to the above situation where neither the 17th nor 18th step is reached, there are 16 equally likely ways for the last player to be eliminated on the 17th step (according to which one of the first 16 steps is guessed correctly by some player), so that

P(X_{18}=1) = (1-\frac{1}{2^{16}}-\frac{16}{2^{17}})(\frac{1}{2})

Adding to the 16/2=8 expected eliminations from the first 16 steps yields the desired E[X]=589819/65536.

Granted, it is fortunate that there are only two more steps than players, because the resulting general formula for the expected number of players eliminated:

E[X] = \frac{1}{2}\sum\limits_{j=1}^s (1-\frac{1}{2^{j-1}}\sum\limits_{k=n}^{j-1}{j-1 \choose k})

looks more complicated than the summation presented earlier.
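
As a sanity check, this general formula is easy to evaluate exactly (a short sketch using rational arithmetic):

from fractions import Fraction
from math import comb

def expected_eliminated(n, s):
    """E[X] for n players and s steps, via linearity of expectation."""
    total = Fraction(0)
    for j in range(1, s + 1):
        # P(step j is reached) = P(fewer than n eliminations in steps 1..j-1).
        p_reached = 1 - Fraction(sum(comb(j - 1, k) for k in range(n, j)),
                                 2**(j - 1))
        total += p_reached / 2
    return total

print(expected_eliminated(16, 18))  # 589819/65536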


Code name generator

Several months ago I updated the list of words and frequencies of occurrence that I’ve used in various natural language processing experiments (keyword search “ngrams”) over the years, to reflect last year’s update to the Google Books Ngrams dataset.

This past weekend I updated it again, to include the words in the WordNet parts of speech database developed by Princeton University, restricting to just those 63,745 terms consisting of single words of all lowercase letters, i.e., no capitalization, spaces, hyphens, or other punctuation.

Most– over 98%– of these words were already in the list. But including them lets us also use their categorization into parts of speech (nouns, verbs, adjectives, and adverbs), to automate the generation of random code names for your next super secret project.

Just select a random adjective, followed by a random noun, and you get Operation ALIEN WORK, or DISSENTING CITY, or one of my favorites, OCCASIONAL ENEMY. Weighting the random selections by word frequency keeps the code names reasonably simple, so that you don’t end up with Operation PARABOLOIDAL IMMIGRATION (although an unweighted sample did yield Operation YONDER KING, which sounds pretty cool).
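
The selection itself is only a few lines of Python (a minimal sketch, assuming the adjective and noun lists have been loaded as (word, frequency) pairs; the actual generator is linked below):

import random

def code_name(adjectives, nouns):
    """Operation <random adjective> <random noun>, weighted by frequency."""
    pick = lambda words: random.choices([w for w, f in words],
                                        weights=[f for w, f in words])[0]
    return (pick(adjectives) + ' ' + pick(nouns)).upper()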

The Python code name generator and word lists of parts of speech and corresponding frequencies are available on GitHub.

References:

  1. Google Books Ngram Viewer Exports, English versions 20120701 and 20200217
  2. Princeton University “About WordNet.” WordNet. Princeton University. 2011.

Expected length of a soccer penalty shootout

The U.S. women’s soccer team recently beat the Netherlands in a penalty shootout, soccer’s version of an overtime tie-breaker. Teams alternate turns attempting a penalty kick, resulting in either a scored point or a block/miss, and after each team has had five attempts, the most points wins… with two twists:

  1. The shootout ends as soon as the outcome is guaranteed, so that the shootout may require fewer than ten kicks in total. This happened in the U.S. vs. Netherlands game: the U.S. blocked two of the first four kicks from the Netherlands, and so after making all of their first four shots, the U.S. victory was assured after just eight total shots.
  2. If the score is still tied after each team has had five shots, the shootout continues until “sudden death,” with teams alternating penalty kicks until one team scores and the other misses in the same “round.” This could go on for a while: the current world record for the longest shootout in a first class match is 48 shots (the ten initial shots, plus 19 more rounds of sudden death) in the 2005 Namibian Cup.

A recent Riddler column at FiveThirtyEight asked how long this takes on average. That is, assuming that each penalty kick by either team scores independently with the same probability p, what is the expected total number of shots taken?

This post is mostly just to capture my notes on this problem, adding to my “library” of potentially interesting future exercises for students. Following is a Python implementation that uses the symbolic mathematics library SymPy (part of the SciPy ecosystem) to recursively compute the desired expected value as a function of p:

from functools import lru_cache
import sympy as sym

@lru_cache(maxsize=None)
def shots(score, us, them, p):
    """E[# shots] given net score and remaining shots for us/them."""
    if us + them == 0:
        return 0 if score != 0 else 2 / (2 * p * (1 - p))
    if score + us < 0 or score - them > 0:
        return 0
    return (1 + p * shots(-score - 1, them, us - 1, p) +
            (1 - p) * shots(-score, them, us - 1, p))

print(shots(0, 5, 5, sym.Symbol('p')).simplify())

The early stopping condition– the test score + us < 0 or score - them > 0– is simple to describe: we stop if the next team to shoot can’t possibly win, or can’t possibly lose.

I like this problem because there are several nice subproblems buried inside. For example, with a bit of work, we could consolidate the us and them function arguments– indicating the number of possible remaining shots for the next team to kick and the defending team, respectively– into a single argument indicating the total number of shots remaining before sudden death. (It’s a fair question whether this is really any clearer or more readable, though; I think the above implementation makes it easier to see how the kicking and defending teams switch places in the recursion.)

Also, as noted in the Riddler column, it’s interesting, and a nice exercise to prove, that the expected number of shots is symmetric in p and 1-p, as shown in the figure below:

In other words, for example, if teams score penalty kicks with probability 0.8, we should expect shootouts to take just as long on average as if teams score with probability 0.2; and shootouts are shortest when each kick is a fair coin flip.
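
With the implementation above, this symmetry is easy to check symbolically:

p = sym.Symbol('p')
expr = shots(0, 5, 5, p)
print(sym.simplify(expr - expr.subs(p, 1 - p)))  # 0: symmetric in p and 1-p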

Finally, instead of just the expected number of shots, consider computing the entire probability distribution. Or even just counting outcomes (that are not in general equally likely): for example, how many different outcomes of the first n=5 “guaranteed” rounds (i.e., the first ten shots) are there that result in a tie leading to sudden death? This number is {2n \choose n}; it’s a nice problem to find a counting argument to show this (e.g., ten misses is one such outcome, while ten scored points is another, etc.).


Counting collisions

Here is an interesting problem that I saw recently, that involves a nice combination of physics, programming, and mathematics, with a surprising solution.

Imagine two balls on a frictionless floor near a wall, as shown in the figure below. The green ball is initially at rest between the wall and the blue ball, which is m times more massive and moving toward the green ball. Assume everything is nicely Newtonian, and collisions are perfectly elastic.

As a function of m, how many collisions are there?

For example, if m=1, so that the two balls have equal mass, then there are three collisions: the blue ball strikes the green ball, which bounces off the wall, then returns to strike the blue ball again, sending it off to the right.

However, if the blue ball is heavier, say m=100 times heavier, then the green ball will bounce back and forth many times between the wall and the blue ball, for a total of 31 collisions, before finally failing to catch up to the blue ball as it moves off to the right.

What if the blue ball is m=10,000 times heavier? Or one million times?

Following is my Python implementation of a solution.

def collisions(m):
    # v1 is the velocity of the green ball (unit mass), initially at rest;
    # v2 is the velocity of the blue ball (mass m), moving toward it.
    v1, v2 = 0, -1
    bounce = 0
    while v1 > v2:  # the balls are still approaching each other
        # Perfectly elastic collision between masses 1 and m.
        v1, v2 = (((1 - m)*v1 + 2*m*v2) / (1 + m),
                  (2*v1 - (1 - m)*v2) / (1 + m))
        bounce = bounce + 1
        if v1 < 0:    # the green ball is moving toward the wall...
            v1 = -v1  # ...and bounces off it
            bounce = bounce + 1
    return bounce

print(collisions(10000))

Beware the “natural” quaternion

Introduction

Rotation math can be confusing. But it didn’t need to be this confusing.

I think the reason that 3D rotations can be tricky to work with is that there are so many choices of convention– in interpretation, notation, and implementation– and we often do not communicate or document all of those choices clearly and completely. Are we “actively” rotating vectors, or “passively” rotating frames? If the latter, are we rotating from the “body” to “world” frame, or from world to body? Are rotations applied as we write them left to right, or right to left? Are the rotations right-handed or left-handed? Etc.

This post is motivated by lost and delayed productivity, stemming from confusion and miscommunication about one of these choices that seems not widely known: there are two different quaternions out there in the wild. The unit quaternion, familiar to most of us as a way to represent rotations in many application domains, has an evil twin. If you didn’t already know this, then hopefully this post is a helpful warning to double-check your documentation and code. But even if this is old news for you, then I will try to argue that use of this evil twin quaternion, advocated by Shuster [1] as the “natural” convention, was an unnecessary choice… and certainly not any more “natural,” “physical,” or otherwise inherently preferable to the quaternion most of us are familiar with.

Rotation matrices

To try to show more clearly how silly this situation is, let’s think like a software developer, working out how to implement rotations for some application. Let’s start with the interface first, then we will flesh out the implementation. We can view the concept of rotation as a group acting on \mathbb{R}^3, with a particular rotation represented as a group element g, that transforms a vector v \in \mathbb{R}^3 into a new rotated vector g(v). For example, in Python, we could make a Rotation object callable:

class Rotation:
    ...
    
    def __call__(self, v):
        """Rotate vector v."""
        return ...

g = Rotation(...)
v = (1, 0, 0)
v_rotated = g(v)

But wait– we’ve already made several choices of convention here, at least implicitly. First, from this software implementation perspective, we are bypassing the discussion of interpretation altogether. That is, for example, whether the user wants to interpret an argument vector v \in \mathbb{R}^3 as being “actively” rotated within a fixed coordinate frame, or as a fixed point in space whose coordinates are being “passively” rotated from one frame to another, does not affect the eventual lines of code implementing the arithmetic operations realizing the rotation. (Or put another way, those interpretation details get pushed into the constructor, affecting how we want to interpret the parameters specifying creation of a Rotation. More on this later.)

However, we have made a definite choice about notation: namely, the left group action convention, where we write (from left to right) the rotation group element g first, followed by the vector v being acted upon.

This is okay; no one is complaining about this choice of convention. For example, if we now consider actually implementing a rotation as a 3×3 matrix R (we’ll get to quaternions shortly), pretty much everyone agrees with and has settled on the well-established convention of multiplying the matrix R on the left by the column vector v on the right, so that the group action g(v) corresponds to the matrix multiplication Rv:

import numpy as np

class Rotation:
    def __init__(self, matrix):
        """Create rotation from matrix."""
        self.r = np.array(matrix)
    
    def __call__(self, v):
        """Rotate vector v."""
        return self.r @ np.array(v) # matrix multiplication

g = Rotation(((0, -1, 0), (1, 0, 0), (0, 0, 1)))
v = (1, 0, 0)
v_rotated = g(v)

(As an aside, the only reasonably common application context I can think of where it’s more common to multiply a row vector on the left by a matrix on the right is a Markov chain, where we update the probability distribution as a row vector multiplied by the transition matrix. Maybe there are others that I’m not aware of?)

Composition of rotations

Our implementation isn’t quite complete. We still need to work out how to compose multiple rotations. Fortunately, there is again community-wide agreement on how this works with rotation matrices, and it’s consistent with the same left group action convention. Namely, composition (just like the group action on vectors) is denoted and implemented as the usual left-to-right matrix multiplication:

import numpy as np

class Rotation:
    def __init__(self, matrix):
        """Create rotation from matrix."""
        self.r = np.array(matrix)
    
    def __call__(self, v):
        """Rotate vector v."""
        return self.r @ np.array(v)

    def __matmul__(self, r2):
        """Compose rotations (this @ r2)."""
        return Rotation(self.r @ r2.r)

r1 = Rotation(((0, -1, 0), (1, 0, 0), (0, 0, 1)))
r2 = Rotation(((0, 0, 1), (0, 1, 0), (-1, 0, 0)))
v = (1, 0, 0)

v_rotated = (r1 @ r2)(v)
v_rotated = r1(r2(v)) # same result

Note a couple of observations before we move on. First, the example above highlights a potentially confusing side effect of this left group action convention: in the composition R_1 R_2, the rightmost rotation R_2 is the first to act on the input vector v.

Second, recall that matrix multiplication is defined so that each element in the matrix product C=AB is computed as c_{i,j}=\sum_k a_{i,k}b_{k,j}. This is technically another arbitrary choice of convention; we could in principle choose to abandon this convention, and define “our” multiplication as c_{i,j}=\sum_k a_{k,j}b_{i,k}. The reader can verify that the effect of changing this convention is simply to reverse the order in which we would write a “historically conventional” well-defined matrix product.

Maybe we’ve been using the “wrong” definition of matrix multiplication for the last two centuries or so, and the above redefinition is more “natural”?

Keep the interface, change the implementation

Now that we have a working software implementation using rotation matrices, that corresponds nicely with cocktail-napkin notation, suppose that we want to change that implementation to use quaternions instead of matrices under the hood… but without changing the interface exposed to the user.

First, let’s review how unit quaternions work as rotation operators. Just as the group SO(3) of rotation matrices acts on vectors by multiplication, the group SU(2) of unit quaternions of the form q=w+x\mathbf{i}+y\mathbf{j}+z\mathbf{k} acts on vectors by conjugation, with a unit quaternion q transforming a vector v into the rotated vector qvq^{-1}=qvq^{*}, where we temporarily convert vectors in \mathbb{R}^3 to and from corresponding quaternions with zero real part, and the non-commutative quaternion multiplication is defined by

\mathbf{i}^2 = \mathbf{j}^2 = \mathbf{k}^2 = \mathbf{ijk} = -1

Note that this form of conjugation, qvq^{-1}, is also a choice of convention. We could have used q^{-1}vq instead… but here again, most everyone seems to agree the above is the more sensible choice, since it keeps us consistent with the left group action notational convention, allowing us to comfortably switch back and forth between thinking in terms of rotation matrices or quaternions as our underlying implementation.

Here is the result:

import numpy as np

def quat_product(q1, q2):
    """Return quaternion product (q1 q2)."""
    w1, xyz1 = q1
    w2, xyz2 = q2
    return (w1 * w2 - np.dot(xyz1, xyz2),
            w1 * xyz2 + w2 * xyz1 + np.cross(xyz1, xyz2))

class Rotation:
    def __init__(self, quat):
        """Create rotation from quaternion."""
        self.q = (quat[0], np.array(quat[1]))

    def __call__(self, v):
        """Rotate vector v."""
        w, xyz = quat_product(quat_product(self.q, (0, np.array(v))),
                              (self.q[0], -self.q[1]))
        return xyz

    def __matmul__(self, r2):
        """Compose rotations (this @ r2)."""
        return Rotation(quat_product(self.q, r2.q))

c = np.sqrt(0.5)
r1 = Rotation((c, (0, 0, c)))
r2 = Rotation((c, (0, c, 0)))
v = (1, 0, 0)

v_rotated = (r1 @ r2)(v)
v_rotated = r1(r2(v)) # same result

Note that I’ve pulled out a helper function, quat_product, that multiplies two quaternions, using the product as defined above and originally set down by Hamilton nearly two centuries ago.

But this is yet another technically arbitrary choice of convention. There is another definition of the quaternion product, advocated by Shuster [1], and in wide use by the spacecraft dynamics community, that changes the product relations from the “Hamilton convention” of

\mathbf{i}^2 = \mathbf{j}^2 = \mathbf{k}^2 = \mathbf{ijk} = -1

to what Sommer et al. [2] call the “Shuster convention” (sometimes referred to as the “JPL convention,” denoted here as the operator \otimes) of

\mathbf{i}^2 = \mathbf{j}^2 = \mathbf{k}^2 = -1, \mathbf{i}\otimes\mathbf{j}\otimes\mathbf{k} = +1

Mathematically speaking, there is nothing wrong with this. (Side puzzle: this doesn’t contradict the Frobenius theorem, right?) However– and finally we come to the point of this discussion– there is also nothing more inherently Right, or Natural, about it, either.

The effect, and indeed the motivation, of using Shuster’s convention is simply to reverse the order of the quaternion product. That is, given two quaternions q_1,q_2 with Hamilton product denoted by juxtaposition q_1 q_2, the reader can verify that their “Shuster product” q_1 \otimes q_2 = q_2 q_1.

In the quat_product implementation above, we could realize the Shuster product by flipping the plus sign on the cross product to a minus sign… and that’s it. Nothing else changes… except, possibly, how we might choose to interpret the parameterization of our rotations in the constructor.
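
In code, assuming the quat_product helper above (the reader can verify that this negated cross product reproduces the \otimes relations, and that quat_product_shuster(q1, q2) equals quat_product(q2, q1)):

def quat_product_shuster(q1, q2):
    """Return the 'Shuster product' (q1 (x) q2)."""
    w1, xyz1 = q1
    w2, xyz2 = q2
    return (w1 * w2 - np.dot(xyz1, xyz2),
            w1 * xyz2 + w2 * xyz1 - np.cross(xyz1, xyz2))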

Homomorphism vs. antihomomorphism

So, why mess with two centuries of notational inertia just to write products in reverse order? To see what I think is the root of the problem, note that I cheated somewhat in the quaternion implementation of the Rotation class above. I claimed not to change the interface at all, but I did: the original rotation matrix implementation’s constructor expected a matrix argument, while the quaternion version expects a quaternion argument.

One way to fix this would be to keep the original expectation of a matrix as the constructor argument, and convert it to the corresponding quaternion under the hood. Or conversely, in the matrix implementation, we would accept a quaternion as the constructor argument, and convert it to the corresponding matrix.

But what do we mean by conversion to the corresponding matrix (or quaternion)? We want the external interface to remain unchanged, so that we preserve behavior as observed by the user, no matter which internal implementation we use. In mathematical terms, we need a homomorphism: a conversion function \phi : SU(2) \to SO(3), with the property that for all unit quaternions q_1, q_2, converting them to rotation matrices preserves the behavior of composition, that is, \phi(q_1 q_2)=\phi(q_1)\phi(q_2).

So, consider the situation at this point. We have two choices of convention to make:

  1. What is the quaternion product: do we use Hamilton’s that has been around for nearly two centuries and is in common use in mathematical literature and software libraries, or Shuster’s that is mathematically equally workable but much more recent and much less widely known?
  2. What is the conversion function \phi described above?

The key observation is that, if we want our conversion function \phi to be a homomorphism, then fixing the choice of convention in either (1) or (2) determines the choice in the other. And this whole mess, that on the surface is Shuster’s “natural” quaternion product as choice (1), really stems from the prematurely fixed, equally arbitrary choice (2) for the correspondence between quaternions and matrices.

In other words, the complaint seems to be, “I’ve already fixed my assumed conversion function for some reason, and I want it to be a homomorphism, so I need the quaternion product definition to change accordingly.” But if we simply transpose Shuster’s assumed quaternion-to-matrix conversion– which he refers to in equation (4) of [1], but does not actually define until equation (61), without ever noting the implicit conventional choice in doing so– then the Hamilton product does indeed “satisfy… the natural order” of multiplication as desired.

Conclusion

Unfortunately, the damage is done. That is, rants like this aren’t going to fix all of the papers and lines of code that already exist that use both conventions, so we can do no better than double-check assumptions and conventions whenever we encounter a new paper, software library, etc.

There are even examples of more intermediate weirdness, so to speak, to be on the lookout for. For example, MATLAB’s Robotics Toolbox is reasonably well-behaved, consistently using the Hamilton product (robotics.utils.internal.quatMultiply) and the corresponding homomorphic conversion functions (quat2rotm and rotm2quat)… but in the Aerospace Toolbox, despite also using the Hamilton product (quatmultiply), the analogous conversion functions (quat2dcm and dcm2quat) are seemingly antihomomorphic– i.e., quat2dcm(q) returns the transpose of quat2rotm(q)– except for the added wrinkle that the provided rotation function quatrotate uses the conjugation q^{-1}vq consistent with the right group action convention (i.e., so that when composing rotations, matrix products look reversed compared to their corresponding quaternion products).

References:

  1. Shuster, M., The nature of the quaternion, The Journal of the Astronautical Sciences, 56(3) 2008, 359–373
  2. Sommer, Gilitschenski, Bloesch, Weiss, Siegwart, Nieto, Why and How to Avoid the Flipped Quaternion Multiplication, arXiv:1801.07478