## allRGB: Hilbert curves and random spanning trees

### Introduction

Consider the following problem: create a visually appealing digital image with exactly one pixel for each of the 16.8 million possible colors in the 24-bit RGB color space.  Not a single color missing, and no color appearing more than once.

Check out the web site allRGB.com, which hosts dozens of such contributed images, illustrating all sorts of interesting approaches.  Most of them involve coloring the pixels in some systematic way, although a few start with an existing image or photograph, and transform it so that it contains all possible colors, while still “looking like” the original image.  I’ll come back to this latter idea later.

### Hilbert’s curve

The motivation for this post is two algorithms that lend themselves naturally to this problem.  The first is the generation of Hilbert’s space-filling curve.  The idea fascinated me when I first encountered it, in the May 1985 issue of the short-lived Enter magazine.  How could the following code, that was so short, if indecipherable (by me, anyway), produce something so beautifully intricate?

```basic
 10 PI = 3.14159265: HCOLOR= 3
 20  HOME : VTAB 21
 30  PRINT "HILBERT'S CURVE."
 40  INPUT "WHAT SIZE? ";F1
 50  INPUT "WHAT LEVEL? ";L
 60  HGR
 70 PX = 0:PY =  - 60
 80 H = 0:P = 1
 90  GOSUB 110
 100  GOTO 20
 110  IF L = 0 THEN  RETURN
 120 L = L - 1:L1 = P * 90
 130 H = H + L1:P = ( - P)
 140  GOSUB 110
 150 P = ( - P)
 160  GOSUB 280
 170 R1 = P * 90:H = H - R1
 180  GOSUB 110
 190  GOSUB 280
 200  GOSUB 110
 210 R1 = P * 90:H = H - R1
 220  GOSUB 280
 230 P = ( - P)
 240  GOSUB 110
 250 P = ( - P):L1 = P * 90
 260 H = H + L1:L = L + 1
 270  RETURN
 280 HX =  COS (H * PI / 180)
 290 HY =  SIN (H * PI / 180)
 300 NX = PX + HX * F1
 310 NY = PY + HY * F1
 320  IF NX > 139 OR NY > 79 THEN GOTO 370
 330  HPLOT 140 + PX,80 - PY TO 140 + NX,80 - NY
 340 PX = NX:PY = NY
 350 PY = NY
 360  RETURN
 370  PRINT "DESIGN TOO LARGE"
 380  FOR I = 1 TO 1000: NEXT I
 390  GOTO 20
```


Nearly 30 years later, my first thought when encountering the allRGB problem was to use the Hilbert curve, but use it twice in parallel: traverse the pixels of the image via a 2-dimensional (order 12) Hilbert curve, while at the same time traversing the RGB color cube via a 3-dimensional (order 8) Hilbert curve, assigning each pixel in turn the corresponding color.  The result looks like this (or see this video showing the image as it is constructed pixel-by-pixel):

allRGB image created by traversing image in 2D Hilbert curve order, assigning colors from the RGB cube in 3D Hilbert curve order.

(Aside: all of the images and videos shown here are reduced in size, using all 262,144 18-bit colors, for a more manageable 512×512-pixel image size.)

As usual and as expected, this was not a new idea.  Aldo Cortesi provides an excellent detailed write-up and source code for creating a similar image, with a full-size submission on the allRGB site.

However, the mathematician in me prefers a slightly different implementation.  Cortesi’s code is based on algorithms in the paper by Chris Hamilton and Andrew Rau-Chaplin (see Reference (2) below), in which the computation of a point on a Hilbert curve depends on both the dimension n (i.e., filling a square, cube, etc.) and the order (i.e., depth of recursion) of the curve.  But we can remove this dependence on the order of the curve; it is possible to orient each successively larger-order curve so that it contains the smaller-order curves, effectively realizing a bijection between the non-negative integers $\mathbb{N}$ and the non-negative integer lattice $\mathbb{N}^n$.  Furthermore, it seems natural that the first $2^n$ points on the curve (i.e., the order 1 curve) should be visited in the “canonical” reflected Gray code order.

The result is the following Python code, with methods Hilbert.encode() and Hilbert.decode() for converting indices to and from coordinates of points on the n-dimensional Hilbert curve:

```python
class Hilbert:
    """Multi-dimensional Hilbert space-filling curve.
    """

    def __init__(self, n):
        """Create an n-dimensional Hilbert space-filling curve.
        """
        self.n = n
        self.mask = (1 << n) - 1

    def encode(self, index):
        """Convert index to coordinates of a point on the Hilbert curve.
        """

        # Compute base-2^n digits of index.
        digits = []
        while True:
            index, digit = divmod(index, self.mask + 1)
            digits.append(digit)
            if index == 0:
                break

        # Start with the orientation of smaller order curves.
        vertex, edge = (0, -len(digits) % self.n)

        # Visit each base-2^n digit of index, most significant first.
        coords = [0] * self.n
        for digit in reversed(digits):

            # Compute position in current hypercube, distributing the n
            # bits across n coordinates.
            bits = self.subcube_encode(digit, vertex, edge)
            for bit in range(self.n):
                coords[bit] = (coords[bit] << 1) | (bits & 1)
                bits = bits >> 1

            # Compute orientation of next sub-cube.
            vertex, edge = self.rotate(digit, vertex, edge)
        return tuple(coords)

    def decode(self, coords):
        """Convert coordinates to index of a point on the Hilbert curve.
        """

        # Convert n m-bit coordinates to m base-2^n digits.
        coords = list(coords)
        m = self.log2(max(coords)) + 1
        digits = []
        for i in range(m):
            digit = 0
            for bit in range(self.n - 1, -1, -1):
                digit = (digit << 1) | (coords[bit] & 1)
                coords[bit] = coords[bit] >> 1
            digits.append(digit)

        # Start with the orientation of smaller order curves.
        vertex, edge = (0, -m % self.n)

        # Visit each base-2^n digit, most significant first.
        index = 0
        for digit in reversed(digits):

            # Compute index of position in current hypercube.
            bits = self.subcube_decode(digit, vertex, edge)
            index = (index << self.n) | bits

            # Compute orientation of next sub-cube.
            vertex, edge = self.rotate(bits, vertex, edge)
        return index

    def subcube_encode(self, index, vertex, edge):
        h = self.gray_encode(index)
        h = (h << (edge + 1)) | (h >> (self.n - edge - 1))
        return (h & self.mask) ^ vertex

    def subcube_decode(self, code, vertex, edge):
        k = code ^ vertex
        k = (k >> (edge + 1)) | (k << (self.n - edge - 1))
        return self.gray_decode(k & self.mask)

    def rotate(self, index, vertex, edge):
        v = self.subcube_encode(max((index - 1) & ~1, 0), vertex, edge)
        w = self.subcube_encode(min((index + 1) | 1, self.mask), vertex, edge)
        return (v, self.log2(v ^ w))

    def gray_encode(self, index):
        return index ^ (index >> 1)

    def gray_decode(self, code):
        index = code
        while code > 0:
            code = code >> 1
            index = index ^ code
        return index

    def log2(self, x):
        y = 0
        while x > 1:
            x = x >> 1
            y = y + 1
        return y
```
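The “canonical” reflected Gray code order of the first $2^n$ points is easy to check in isolation.  Here is a small standalone sketch (the helper names are mine, not part of the class above):

```python
def gray_encode(index):
    # Reflected binary Gray code: consecutive codewords differ in one bit.
    return index ^ (index >> 1)

def vertex(code, n):
    # Unpack the low n bits of a codeword into n 0/1 coordinates.
    return tuple((code >> bit) & 1 for bit in range(n))

# The first 2^3 = 8 points of the curve in 3 dimensions visit the unit
# cube's vertices so that each step changes exactly one coordinate.
points = [vertex(gray_encode(i), 3) for i in range(8)]
for p, q in zip(points, points[1:]):
    assert sum(a != b for a, b in zip(p, q)) == 1
```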


[Edit: The FXT algorithm library contains an implementation of the multi-dimensional Hilbert curve, with an accompanying write-up whose example output suggests that it uses exactly this same convention.]

### Random spanning trees

At this point, a key observation is that the Hilbert curve image above is really just a special case of a more general approach: we are taking a “walk” through the pixels of the image, and another “walk” through the colors of the RGB cube, in parallel, assigning colors to pixels accordingly.  Those “walks” do not necessarily need to be along a Hilbert curve.  In general, any pair of “reasonably continuous” paths might yield an interesting image.

A Hilbert curve is an example of a Hamiltonian path through the corresponding grid graph, where the pixels are vertices, with edges between pairs of pixels that are adjacent horizontally or vertically.  In general, Hamiltonian paths are hard to find, but in the case of the two- and three-dimensional grid graphs we are considering here, they are easier to come by.  The arguably simplest example is a “serpentine” traversal of the pixels or colors (see Cortesi’s illustration referred to as “zigzag” order here), but the visual results are not very interesting.  So, what else?

We’ll come back to Hamiltonian paths shortly.  But first, consider a “less continuous” traversal of the pixels and/or colors: construct a random spanning tree of the grid graph, and traverse the vertices in… well, any of several natural orders, such as breadth-first, depth-first, etc.

So how to construct a random spanning tree of a graph?  This is one of those algorithms that goes in the drawer labeled, “So cool because at first glance it seems like it has no business working as well as it does.”

First, pick a random starting vertex (this will be the root), and imagine taking a random walk on the graph, at each step choosing uniformly from among the current vertex’s neighbors.  Whenever you move to a new vertex for the first time, “mark” or add the traversed edge to the tree.  Once all vertices have been visited, the resulting marked edges form a spanning tree.  That’s it.  The cool part is that the resulting spanning tree has a uniform distribution among all possible trees.
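This first procedure is known as the Aldous-Broder algorithm.  A minimal standalone sketch, using a dictionary of adjacency lists (the names are mine):

```python
import random

def aldous_broder(graph):
    """Uniform random spanning tree via a simple random walk: record the
    edge by which each vertex is entered for the first time."""
    vertices = list(graph)
    v = random.choice(vertices)
    visited = {v}
    edges = []
    while len(visited) < len(vertices):
        w = random.choice(graph[v])
        if w not in visited:
            visited.add(w)
            edges.append((v, w))
        v = w
    return edges

# A 4-cycle; any spanning tree of it has exactly 3 edges.
cycle = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
tree = aldous_broder(cycle)
```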

But we can do even better.  David Wilson (see Reference (3) below) describes the following improved algorithm that has the same uniform distribution on its output, but typically runs much faster: instead of taking a single meandering random walk to eventually “cover” all of the vertices, consider looping over the vertices, in any order, and for each vertex (that is not already in the tree) take a loop-erased random walk from that vertex until reaching the first vertex already in the tree.  Add the vertices and edges in that walk to the tree, and repeat.  Despite this “backward” growing of the tree, the resulting distribution is still uniform.

Here is the implementation in Python, where a graph is represented as a dictionary mapping vertices to lists of adjacent vertices:

```python
import random

def random_spanning_tree(graph):
    """Return uniform random spanning tree of undirected graph.
    """
    root = random.choice(list(graph))
    parent = {root: None}
    tree = set([root])
    for vertex in graph:

        # Take random walk from a vertex to the tree.
        v = vertex
        while v not in tree:
            neighbor = random.choice(graph[v])
            parent[v] = neighbor
            v = neighbor

        # Erase any loops in the random walk, adding the remaining
        # loop-erased path to the tree.
        v = vertex
        while v not in tree:
            tree.add(v)
            v = parent[v]
    return parent
```

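For concreteness, here is one way (a hypothetical helper, not the author’s code) to build that dictionary representation for the 4-connected grid graph of a width × height image:

```python
def grid_graph(width, height):
    """Build the 4-connected grid graph of an image's pixels, as a
    dictionary mapping each (x, y) vertex to its list of neighbors."""
    graph = {}
    for x in range(width):
        for y in range(height):
            graph[(x, y)] = [(x + dx, y + dy)
                             for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1))
                             if 0 <= x + dx < width and 0 <= y + dy < height]
    return graph

g = grid_graph(3, 3)
```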

The following image shows one example of using this algorithm, visiting the image pixels in a breadth-first traversal of a random spanning tree, assigning colors according to a corresponding Hilbert curve traversal of the RGB color cube.

Breadth-first traversal of random spanning tree of pixels, assigning colors in Hilbert curve order.

My wife pointed out, correctly, I think, that watching the construction of these images is as interesting as the final result; this video shows another example of this same approach.  Note how the animation appears to slow down, as the number of pixels at a given depth in the tree increases.  A depth-first traversal, on the other hand, has a more constant speed, and a “snakier” look to it.
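The breadth-first traversal of the resulting child-to-parent map can be sketched as follows (the helper name and toy tree are mine):

```python
from collections import defaultdict, deque

def bfs_order(parent):
    """Breadth-first vertex order of a tree given as a child -> parent
    dictionary, with the root mapped to None."""
    children = defaultdict(list)
    root = None
    for v, p in parent.items():
        if p is None:
            root = v
        else:
            children[p].append(v)
    order, queue = [], deque([root])
    while queue:
        v = queue.popleft()
        order.append(v)
        queue.extend(children[v])
    return order

# Toy tree: root 0 with children 1 and 2; 3 hangs below 1.
print(bfs_order({0: None, 1: 0, 2: 0, 3: 1}))  # [0, 1, 2, 3]
```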

Expanding on this general idea, below is the result of all combinations of choices of four different types of paths through the image pixels and/or paths through the colors: Hilbert curve, serpentine, breadth-first through a random spanning tree, or depth-first.

Images resulting from different choices of paths through image pixels and colors.

Complete source code for generating larger versions of these images is available at the usual location here, in both Python, where I started, and C++, where I ended up, to manage the speed and space needed to create the full-size images.

### Open questions

Let’s come back now to the earlier mention of (a) transforming an existing image to contain all possible colors, and (b) using Hamiltonian paths in some way.  Reference (1) below describes a method of constructing a Hamiltonian cycle through the pixels of an image, using a minimum spanning tree of a “contraction” of the associated grid graph.  The idea is to group the pixels into 2×2 square blocks of “mini-Hamiltonian-cycles,” resulting in a quarter-size grid graph, with edge weights between blocks corresponding to how “close” the pixel colors of the neighboring blocks are in the image.

Then, given a minimum spanning tree (or really, any spanning tree) in this smaller graph, we can construct a corresponding Hamiltonian cycle in the larger grid graph in a natural way, by connecting the 2×2 “mini-cycles” that are adjacent in the tree together into a larger cycle.  The resulting Hamiltonian cycle tends to have pixels that are close together on the cycle be “close” in color as well.

Now consider the following allRGB image: start with an existing image, and construct a Hamiltonian cycle from the minimum-weight “block” spanning tree.  Visit the pixels in cycle order, assigning colors according to some other corresponding “smooth” order, e.g. using a Hilbert curve.  How much of the original image would still be apparent in the result?

References:

1. Dafner, R., Cohen-Or, D. and Matias, Y., Context-Based Space-Filling Curves, Eurographics 2000, 19:3 (2000) [PDF]

2. Hamilton, C. and Rau-Chaplin, A., Compact Hilbert Indices for Multi-Dimensional Data, Proceedings of the First International Conference on Complex, Intelligent and Software Intensive Systems, April 2007, 139-146 [PDF]

3. Wilson, D., Generating Random Spanning Trees More Quickly than the Cover Time, Proceedings of the 28th Annual ACM Symposium on the Theory of Computing, ACM (1996), 296-303 [PDF]

## Pig puzzle solution

This is a long-delayed follow-up to the previous post about a variant on the dice game Pig.  In this “single turn” version of the game, a player repeatedly rolls a single die, accumulating points according to the numbers rolled, until either:

• Deciding to stop, with a score equal to the total of all rolls; or
• Rolling a 1, resulting in a “pig-out” score of zero, i.e., loss of all previous rolls.

If two players take their turn simultaneously without knowledge of the other player’s outcome, where the highest score wins, what is the optimal strategy for this game?

Before describing the solution, compare this game with the original version analyzed by Neller and Presser (see Reference (1) below), where the players alternate turns repeatedly, and the first player to reach 100 points wins.  This turns out to be significantly more complicated, for two reasons.  First, game states are repeatable (e.g., player 1 rolls a 1, then player 2 also rolls a 1, and we’re right back where we started), so a straightforward, “work backward from the end game” dynamic programming approach won’t work.

The second reason is that a player’s optimal strategy on a given turn cannot always be expressed as simply as “roll until you reach a total of h or greater.”  For example, if you currently have a (banked) score of 41 and your opponent has 49, then optimal strategy is to roll until reaching a turn score of 22 or greater… unless you happen to roll a 6 reaching a score of 27, in which case you should risk rolling just one more time!

Our simultaneous, single-turn version of the game eliminates these complications, while still providing some interesting results.

Solution:

The analysis approach consists of (1) parameterizing all possible strategies, (2) computing the probability distribution of outcomes for a given strategy, and (3) solving the linear programming problem whose solution yields an optimal strategy for the game.

First, we need only consider strategies of the form, “hold at h,” i.e., roll until reaching a total of h or greater (why?).  Given such a strategy, what is the probability distribution of the resulting score?

Johnson (2) describes a recursive algorithm for computing this distribution.  However, I think generating functions can help a lot with the housekeeping here.  Define $r(x)$ to be the generating function for the outcome of a single “successful” roll (i.e., excluding a roll of 1):

$r(x) = \frac{1}{6} x^2 + \frac{1}{6} x^3 + \frac{1}{6} x^4 + \frac{1}{6} x^5 + \frac{1}{6} x^6$

or in Python (using Numpy with VPython):

```python
from visual import *

r = array([0, 0, 1./6, 1./6, 1./6, 1./6, 1./6])
```


(In all that follows, we could experiment with other variants, such as rolling two dice instead of one, simply by using the corresponding polynomial $r(x)^2$ in place of $r(x)$.)

Then we can compute the total probability of scoring exactly n in any number of rolls as the coefficient of $x^n$ in the generating function $g(x)$ given by

$g(x) = \frac{1}{1 - r(x)} = 1 + r(x) + r(x)^2 + r(x)^3 + ...$

or in Python:

```python
def pig_pgf(r, n_max=100):
    """Given single-roll pgf r(x), return g(x) = 1/(1-r(x)), where
    [x^n]g(x) is the total probability of scoring exactly n <= n_max in
    any number of rolls without pigging out.
    """
    p = array([1])         # p(x) <- 1
    p.resize(n_max + 1)
    g = array(p)           # g(x) <- 0
    g[0] = 0
    for n in range(n_max): # collect g(x) = 1 + r(x) + r(x)^2 + ...
        g = g + p          # g(x) <- g(x) + p(x)
        p = convolve(p, r) # p(x) <- p(x) * r(x)
        p = p[:n_max + 1]
    return g
```


(Note that $r(x)$ and $g(x)$ are not, strictly speaking, probability generating functions, since their coefficients do not sum to 1.  That’s okay; we are using them to collect the probability of all possible successful outcomes, after which we can subtract from 1 to find the probability of “pigging out,” or rolling 1 at some point.)

At this point, for a given “hold at h” strategy, we can find the probability of ending a turn with a score of exactly n as the coefficient of $x^n$ in the generating function

$g(x)r_{h,n}(x) = \frac{\sum\limits_{n-h < k \leq n} ([x^k]r(x)) x^k}{1 - r(x)}$

The idea is that $r_{h,n}(x)$ captures the possible values k of the last successful roll, which must have a value greater than $n-h$ (since otherwise we would keep rolling), and at most $n$.  The following Python function computes this distribution as a dictionary mapping scores to probabilities:

```python
def pig_dist(h, r, g):
    """Return distribution p[n] = probability of score n using "hold at
    h" strategy, for the game with single-roll and total pgf r(x) and
    g(x), resp.
    """
    r_max = len(r) - 1
    p = {0: 1}
    for n in range(h, h + r_max):
        p[n] = 0
        for k in range(1, r_max + 1):
            if n - h < k <= n:
                p[n] = p[n] + r[k] * g[n - k]
        p[0] = p[0] - p[n]
    return p
```


For example, using the “hold at h=20” strategy, which corresponds to rolling as long as the expected net increase in score is positive (at turn total t, one more roll changes the score by $\frac{2+3+4+5+6}{6} - \frac{t}{6} = \frac{20-t}{6}$ in expectation, which is positive exactly when $t < 20$), yields the following distribution of outcomes:

• P(0) = 0.624541
• P(20) = 0.099713
• P(21) = 0.094991
• P(22) = 0.074189
• P(23) = 0.054196
• P(24) = 0.035198
• P(25) = 0.017173
Armed with this function, if player 1 uses a “hold at $h_1$” strategy, and player 2 uses a “hold at $h_2$” strategy, we can compute the corresponding distributions of scores, and thus the overall distribution of outcomes.  If we roll up that distribution into an expected zero-sum payoff, then the final step is to compute the resulting optimal mixed strategy (which by symmetry should be the same for both players).
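Rolling a pair of score distributions up into an expected zero-sum payoff can be sketched as follows (my own helper; I assume a tie, including both players pigging out, scores zero for each):

```python
def payoff(p1, p2):
    """Expected payoff to player 1 (+1 win, -1 loss, 0 tie), given the two
    players' score distributions as dictionaries mapping score -> probability."""
    total = 0.0
    for n1, q1 in p1.items():
        for n2, q2 in p2.items():
            if n1 > n2:
                total += q1 * q2
            elif n1 < n2:
                total -= q1 * q2
    return total

# Toy check: a coin-flip between 0 and 2 breaks even against a certain 1.
assert payoff({0: 0.5, 2: 0.5}, {1: 1.0}) == 0.0
```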

The following plot shows the result.  Optimal strategy is to roll to 21 approximately one-third of the time… but with a random smear of lower hold totals as well, including the possibility of just a single roll.

Optimal mixed strategy for single-turn Pig.

It’s interesting just how important the randomness of this mixed strategy is: against any pure “always hold at a fixed h” strategy, an optimal player can gain at least an 8% advantage, and even against the reasonable-sounding “hold at 20” strategy, an optimal player has a nearly 15% advantage… just by rolling a single time!

References:

1. T. Neller and C. Presser, “Optimal Play of the Dice Game Pig,” The UMAP Journal, 25:1 (2004) 25–47 [PDF]
2. R. Johnson, “A Simple ‘Pig’ Game,” Teaching Statistics, 30:1 (2008) 14-16 [HTML]

## Puzzle: A simple dice game

A programming student was recently interested in simulating and analyzing the dice game Pig.  While working out some of the relevant probabilities and expected values for the original game, I came up with the following twist, that I think is even simpler to describe, but still presents some interesting complexity in game play and analysis:

As in Pig, you and another player each take a turn rolling a single die, repeatedly as many times as you wish, until either:

• You decide to stop, at which point your score is the sum of all of your rolls in the turn; or
• You roll a 1, at which point your turn ends with a score of zero (i.e., you lose all of your rolls in the turn).

Rather than alternating turns with the winner being the first to reach 100 points, in this variant you each take just a single turn, simultaneously and separately, i.e., without knowledge of the other player’s outcome.  The highest score wins.

What is your optimal strategy for playing this game?  And for any given strategy, what is the probability distribution of outcomes (turn totals)?


## When approximate is better than exact

The subject of this post is a computational problem that is solved thousands of times each day (not just as part of my day job, but in almost every GPS receiver in the world).  It is a problem with an interesting history, and dozens of different solutions… but the arguably “best” solution seems to be relatively unknown.

The problem is that of converting Earth-centered Cartesian (x,y,z) coordinates to geodetic latitude, longitude, and height.  The transformation from geodetic to Cartesian coordinates is straightforward, but the reverse transformation is significantly more involved, and as a result has been addressed repeatedly in the literature.  Featherstone (see Reference (4) at the end of this post) provides a nearly comprehensive list of over 70 papers describing solutions or comparisons of solutions, spanning five decades of study.
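For reference, the straightforward forward transformation looks like this (a sketch of my own using WGS-84 parameters, not code from any of the referenced papers):

```python
import math

# WGS-84 ellipsoid
a = 6378137.0              # semi-major axis (meters)
f = 1 / 298.257223563      # flattening
e2 = f * (2 - f)           # squared (first) eccentricity

def lla_to_ecef(lat, lon, h):
    """Geodetic latitude, longitude (radians) and height (meters) to
    Earth-centered Cartesian (x, y, z) in meters."""
    n = a / math.sqrt(1 - e2 * math.sin(lat) ** 2)  # prime vertical radius
    x = (n + h) * math.cos(lat) * math.cos(lon)
    y = (n + h) * math.cos(lat) * math.sin(lon)
    z = (n * (1 - e2) + h) * math.sin(lat)
    return (x, y, z)

# A point on the equator at zero longitude sits one semi-major axis out.
x, y, z = lla_to_ecef(0, 0, 0)
```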

I think this problem is interesting for three reasons.  First, it still seems to be not universally known that closed-form solutions even exist, usually obtained by finding the roots of an appropriately transformed quartic equation.  Even the first non-Wikipedia entry in a quick Google search claims that “there is no closed-form solution… if the altitude is not zero.”

Second, although all of these various closed-form solutions are “equal,” some are more equal than others.  That is, the accuracies of many of these “exact” solutions suffer when executed on a computer with limited precision.  Or worse, many papers only provide equations, with actual implementations requiring additional details not made explicit, such as handling singularities at the poles, or taking square roots of expressions that “should” be non-negative, but in double precision may yield negative values.  Borkowski’s formula in Reference (1), for example, breaks down for a spherical earth (i.e., zero eccentricity), when the transformation is in fact simplest.  And Bowring’s formula in Reference (2), which I see most often in others’ code, has a usually-neglected but easily-remedied danger of a singularity at the poles.

Which brings us to the third interesting feature of this problem: there is a solution provided by Olson in Reference (6), that, mathematically, is actually an approximation (essentially involving a truncated series expansion), but when implemented in code, and executed with limited double-precision, is not only more accurate than all of the “exact” closed-form solutions that I have evaluated in the references, but is also faster.

(It is worth emphasizing the distinction between this approximation, which although inexact is still essentially “closed-form,” and a whole class of other solutions that involve iterative approximation (e.g., Newton-Raphson), where execution time is more difficult to predict.)

Olson also actually provides working C code as a bonus; following is a port of his implementation to Python:

```python
import math

# WGS-84 ellipsoid parameters
a = 6378137
f = 1 / 298.257223563

# Derived parameters
e2 = f * (2 - f)
a1 = a * e2
a2 = a1 * a1
a3 = a1 * e2 / 2
a4 = 2.5 * a2
a5 = a1 + a3
a6 = 1 - e2

def ecef_to_lla(ecef):
    """Convert ECEF (meters) to LLA (radians and meters).
    """
    # Olson, D. K., Converting Earth-Centered, Earth-Fixed Coordinates to
    # Geodetic Coordinates, IEEE Transactions on Aerospace and Electronic
    # Systems, 32 (1996) 473-476.
    w = math.sqrt(ecef[0] * ecef[0] + ecef[1] * ecef[1])
    z = ecef[2]
    zp = abs(z)
    w2 = w * w
    r2 = z * z + w2
    r = math.sqrt(r2)
    s2 = z * z / r2
    c2 = w2 / r2
    u = a2 / r
    v = a3 - a4 / r
    if c2 > 0.3:
        s = (zp / r) * (1 + c2 * (a1 + u + s2 * v) / r)
        lat = math.asin(s)
        ss = s * s
        c = math.sqrt(1 - ss)
    else:
        c = (w / r) * (1 - s2 * (a5 - u - c2 * v) / r)
        lat = math.acos(c)
        ss = 1 - c * c
        s = math.sqrt(ss)
    g = 1 - e2 * ss
    rg = a / math.sqrt(g)
    rf = a6 * rg
    u = w - rg * c
    v = zp - rf * s
    f = c * u + s * v
    m = c * v - s * u
    p = m / (rf / g + f)
    lat = lat + p
    if z < 0:
        lat = -lat
    return (lat, math.atan2(ecef[1], ecef[0]), f + m * p / 2)
```


References:

1. K. M. Borkowski, “Accurate Algorithms to Transform Geocentric to Geodetic Coordinates,” Bulletin Geodesique, 63 (1989) 50-56 [HTML]
2. B. R. Bowring, “Transformation from Spatial to Geographical Coordinates,” Survey Review, 23:181 (1976) 323-327
3. R. Burtch, “A Comparison of Methods Used in Rectangular to Geodetic Coordinate Transformations,” Proceedings of the American Congress on Surveying and Mapping (ACSM) Annual Conference and Technology Exhibition, Orlando, FL, 2006
4. W. E. Featherstone and S. J. Claessens, “Closed-Form Transformation Between Geodetic and Ellipsoidal Coordinates,” Studia Geophysica et Geodaetica, 52 (2008) 1-18
5. M. Heikkinen, “Geschlossene Formeln zur Berechnung raumlicher geodatischer Koordinaten aus rechtwinkligen Koordinaten,” Zeitschrift Vermess. (in German), 107 (1982) 207-211
6. D. K. Olson, “Converting Earth-Centered, Earth-Fixed Coordinates to Geodetic Coordinates,” IEEE Transactions on Aerospace and Electronic Systems, 32 (1996) 473-476
7. I. Sofair, “Improved Method for Calculating Exact Geodetic Latitude and Altitude Revisited,” Journal of Guidance, Control, and Dynamics, 23:2 (2000) 369
8. J. Zhu, “Conversion of Earth-Centered Earth-Fixed Coordinates to Geodetic Coordinates,” IEEE Transactions on Aerospace and Electronic Systems, 30:3 (1994) 957-962
9. J. Zhu, “Exact Conversion of Earth-Centered, Earth-Fixed Coordinates to Geodetic Coordinates,” Journal of Guidance, Control, and Dynamics, 16:2 (1993) 389-391

## The odds of picking a “perfect” NCAA tournament bracket

This past week Warren Buffett announced that he is backing an offer of \$1 billion to whoever picks a “perfect” bracket for this year’s NCAA basketball tournament.  That is, you must correctly predict the winners of all 63 games in the 64-team, single-elimination tournament.  (Or maybe it’s all 67 games among 68 teams, depending on the specific rules which have not yet been published.)

This news has typically been accompanied in the press by mention of the absurdly small probability of picking such a perfect bracket, usually quoted as “1 in 9,223,372,036,854,775,808,” or 1 in 9.2 quintillion, or $1/2^{63}$.  Intuitively, the idea is that there are 63 games over the course of the tournament, each with 2 possible outcomes.  (I found it interesting that several other web sites seem to have latched onto a different figure of 1 in 4,294,967,296, or 1 in $2^{32}$.  This would correspond to picking just the first round winners.)

To get a feel for how small this probability is, imagine being blindfolded, and asked to blindly solve each of five differently scrambled Rubik’s cubes.  You have a slightly better chance of accidentally solving even one of the five cubes correctly than you have of picking a perfect bracket.

Or do you?  This back-of-the-envelope calculation essentially assumes that either (a) you know nothing about college basketball, and simply pick winners of each game at random; or (b) all teams are equally matched, so that the probability of any game is equal to 1/2.  The latter is certainly not true; for example, over the past 29 years of the current tournament format, among 116 first-round match-ups between 1st and 16th seeds, there has not been a single upset.  More generally, the larger the difference between teams’ seeds, the more lopsided we might expect the outcome to be.

So the question that motivated this post is: can we improve on this calculation of the probability of picking a perfect NCAA tournament bracket?  It seemed like it would be useful to start with some historical data about past tournament games.  This turned out to be harder to find than I thought… at least, in some conveniently machine-readable format.  But after poking around on the web and writing a few quick-and-dirty parsing scripts, following is some data that will help.

First, the following 16×16 matrix with elements $G_{i,j}$ indicates the total number of games played between teams seeded i and j, from 1985 through 2013.  For example, the 116s along the anti-diagonal correspond to the 29 years’ worth of first round games in each of the four regional brackets (1 vs 16, 2 vs 15, 3 vs 14, etc.):

```
  16  56  27  52  39  10   4  58  61   4   5  19   4   0   0 116
  56   2  47   7   4  29  67   5   1  42  12   1   0   0 116   0
  27  47   1   7   3  63   9   2   1  13  37   0   0 116   1   0
  52   7   7   1  62   4   2   6   2   2   0  30 116   0   0   0
  39   4   3  62   1   1   0   3   2   1   0 116  14   0   0   0
  10  29  63   4   1   0   6   1   0   6 116   0   0  14   0   0
   4  67   9   2   0   6   0   1   0 116   3   0   0   1   3   0
  58   5   2   6   3   1   1   0 116   0   1   1   1   0   0   0
  61   1   1   2   2   0   0 116   0   0   0   0   1   0   0   0
   4  42  13   2   1   6 116   0   0   0   1   0   0   1   4   0
   5  12  37   0   0 116   3   1   0   1   0   0   0   3   0   0
  19   1   0  30 116   0   0   1   0   0   0   0  11   0   0   0
   4   0   0 116  14   0   0   1   1   0   0  11   0   0   0   0
   0   0 116   0   0  14   1   0   0   1   3   0   0   0   0   0
   0 116   1   0   0   0   3   0   0   4   0   0   0   0   0   0
 116   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
```


Next, the following 16×16 matrix with elements $W_{i,j}$ indicates the number of those $G_{i,j}$ games won by the team seeded i.  (Note: the diagonal requires some careful handling, and is left as an exercise for the reader.)

```
  16  31  15  36  32   7   4  47  56   4   2  19   4   0   0 116
  25   2  29   3   0  22  50   2   0  25  11   1   0   0 109   0
  12  18   1   4   2  36   6   2   1   9  25   0   0  99   1   0
  16   4   3   1  34   2   2   2   2   2   0  18  91   0   0   0
   7   4   1  28   1   1   0   1   1   1   0  75  11   0   0   0
   3   7  27   2   0   0   3   0   0   4  77   0   0  12   0   0
   0  17   3   0   0   3   0   0   0  70   0   0   0   1   2   0
  11   3   0   4   2   1   1   0  56   0   1   0   1   0   0   0
   5   1   0   0   1   0   0  60   0   0   0   0   1   0   0   0
   0  17   4   0   0   2  46   0   0   0   0   0   0   1   4   0
   3   1  12   0   0  39   3   0   0   1   0   0   0   3   0   0
   0   0   0  12  41   0   0   1   0   0   0   0   8   0   0   0
   0   0   0  25   3   0   0   0   0   0   0   3   0   0   0   0
   0   0  17   0   0   2   0   0   0   0   0   0   0   0   0   0
   0   7   0   0   0   0   1   0   0   0   0   0   0   0   0   0
   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
```


Using this data, it would be nice to be able to estimate the probability $P_{i,j}$ that a team seeded i will beat a team seeded j, as simply $W_{i,j}/G_{i,j}$… at least for those match-ups for which we have data.  If we knew all of these probabilities, we could compute the desired overall probability that any given bracket correctly predicts all 63 games.

Unfortunately, the result appears to be a bit of a mess, if we plot these estimated probabilities as a function of the difference in seeds:

Probability of higher (i.e., favored) seed i beating seed j, vs. the difference in seeds j-i. Historical data shown in blue, simple linear model shown in red.

At first glance, the most “extreme” upsets shown here seem to be those involving zero wins by the favored team.  For example, a #2 seed lost to a #9 seed in the only such match-up (indicated by the point (7,0) in the figure), and a #2 seed has never beaten a #5 seed in any of four separate games (the point (3,0) in the figure).

These are pretty small samples, though… and so the 95% confidence intervals surrounding these point estimates are extremely wide.  Wide enough, in fact, that they contain the simple linear model (shown by the red line in the figure, which I will describe shortly) for all but three of the 68 past match-ups:

• #1 beat #5 in 32 of 39 games.  The linear model predicts P(1 beats 5)=0.629, which falls below the 95% confidence interval (0.665, 0.925).
• #1 beat #9 in 56 of 61 games.  The linear model predicts P(1 beats 9)=0.758, which falls below the 95% confidence interval (0.819, 0.973).
• #2 beat #10 in only 25 of 42 games.  The linear model predicts P(2 beats 10)=0.758, which falls above the 95% confidence interval (0.433, 0.744).

If we really think that seed difference can be a useful estimator, then we can improve the picture somewhat by actually grouping all match-ups with the same seed difference (e.g., collect games between #1 and #5 with those between #2 and #6, #3 and #7, etc.), and plot the resulting point estimates and narrower confidence intervals:

Probability of favored team winning vs. the difference in seeds. Historical data aggregated over all match-ups with the given seed difference, with 95% confidence intervals.

At any rate, there is arguably still room for refinement, but this simple linear model seems like a reasonable starting point.  If we estimate the probability that seed i beats seed j as

$P_{i,j} = \frac{1}{2} + \frac{j-i}{31}$

as shown by the red line in the above figure, then using this model, we can compute the overall probability of, say, a bracket with no upsets (i.e., always picking the higher-seeded team to win).  The resulting chance of winning with this bracket is approximately 1 in 241 billion, much better odds than “1 in 9 quintillion”… but still a long shot to say the least.  And more importantly, any other bracket, with any upsets, has an even smaller chance of winning!
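As a sanity check on that figure, the no-upsets probability under this model is a short exact computation.  This is my own sketch, not code from the post: it assumes the standard 16-team regional bracket (which, when there are no upsets, always ends up pairing the best surviving seed against the worst), four identical regions, and three Final Four games between #1 seeds, each of which the model makes a coin flip since $P_{i,i} = 1/2$.

```python
from fractions import Fraction

def p_beat(i, j):
    """Model probability that seed i beats seed j: 1/2 + (j - i)/31."""
    return Fraction(1, 2) + Fraction(j - i, 31)

# Walk one 16-team region with no upsets.  After each round the survivors
# are the best seeds, and the bracket pairs best against worst:
# 1v16, 2v15, ..., then 1v8, 2v7, ..., then 1v4, 2v3, then 1v2.
region = Fraction(1)
seeds = list(range(1, 17))
while len(seeds) > 1:
    for k in range(len(seeds) // 2):
        region *= p_beat(seeds[k], seeds[-1 - k])
    seeds = seeds[:len(seeds) // 2]

# Four identical regions, then three Final Four games between #1 seeds,
# each a model coin flip.
p = region ** 4 * Fraction(1, 2) ** 3
print(1 / float(p))  # about 2.41e11, i.e., roughly 1 in 241 billion
```

The exact rational arithmetic via `Fraction` is overkill here, but it makes clear that no floating-point round-off is hiding in the headline number.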

To put it in a perspective similar to that given in the video shown here (which seems to be a quick correction of the earlier “1 in 9.2 quintillion” article linked above), suppose that we could get all 314 million people in the United States to:

1. Submit a bracket,
2. Coordinate their entries so that each of the 314 million brackets is distinct, and
3. Select those 314 million distinct brackets that are the most likely to occur; i.e., with the fewest carefully selected “upsets”, involving the most reasonable match-ups (e.g., #8 losing to #9, as opposed to #1 losing to #16).

Even with this massively and optimally coordinated effort, the probability that anyone in the country has the perfect bracket (let alone that that person is you) is only about 0.001, or about one-tenth of one percent.

My wager: Buffett’s extra billion is safe.


## Computing positions and eclipses of the Sun and Moon

Looking ahead to this new year, some of us here in the United States will have several opportunities to see the Earth and Moon get in each other’s way, so to speak.  There will be a lunar eclipse on 14-15 April 2014, and another in October.  There will also be a partial solar eclipse on 23 October 2014… and a total solar eclipse three years from now, on 21 August 2017.

For example, the following simple animation shows what the 2017 eclipse will look like from my dad’s house, which is slightly south of the path of totality.

Partial solar eclipse on 21 August 2017 as seen from my dad’s house. Times are UTC.

Predicting eclipses requires knowing where the Sun and Moon will be in relation to the Earth at any given time.  (Actually, I guess it really only requires knowing which web sites have prediction/animation tools.  NASA has a good site for predicting eclipses at arbitrary latitude/longitude locations, but without corresponding visualizations.  And timeanddate.com has some nice animated visualizations of eclipses, but only for canned locations.)

Computing the location of the Sun and/or Moon presents some interesting challenges.  In particular, the complexity of the solution depends a lot on how much accuracy is required.  The references at the end of this post provide equations with accuracies that are suitable for the kinds of applications I am interested in… the trick is simply stitching together those equations, coordinate frames, etc.

The result is a couple of functions that can be handy in many different exercises, from recreational to educational to work-related: predicting eclipses, predicting the visible illumination of satellite passes (e.g., the International Space Station), evaluating the accuracy of trying to use an analog watch as a compass, and extending the accuracy of ballistically propagated satellite orbits.

First, following is Python code for computing the position of the Sun, in ECEF coordinates, with an accuracy of “approximately 0.01 degree.”  Time is specified in seconds since 1970-01-01T00:00:00Z.

import math

def sun_position(t):
    """Return ECEF position (m) of the sun at time t (sec. since 1970).
    """
    # Reference: Meeus, Jean, Astronomical Algorithms (2nd Ed.). Richmond:
    # Willmann-Bell, Inc., 2009, Ch. 25.
    d = t / 86400.0 - 10957.5
    T = d / 36525
    L = math.radians(280.46646 + T * (36000.76983 + 0.0003032 * T))
    M = math.radians(357.52911 + T * (35999.05029 - 0.0001537 * T))
    e = 0.016708634 - T * (0.000042037 + 0.0000001267 * T)
    C = math.radians((1.914602 - T * (0.004817 + 0.000014 * T)) * math.sin(M) +
                     (0.019993 - 0.000101 * T) * math.sin(2 * M) +
                     0.000289 * math.sin(3 * M))
    s = L + C
    v = M + C
    R = 149597870000 * (1.000001018 * (1 - e * e) / (1 + e * math.cos(v)))
    omega = math.radians(125.04 - 1934.136 * T)
    lon = math.radians(math.degrees(s) - 0.00569 - 0.00478 * math.sin(omega))
    epsilon = math.radians(23.0 + (26.0 + (21.448 - T * (46.8150 + T *
                           (0.00059 - 0.001813 * T))) / 60) / 60 +
                           0.00256 * math.cos(omega))
    x = R * math.cos(lon)
    y = R * math.sin(lon) * math.cos(epsilon)
    z = R * math.sin(lon) * math.sin(epsilon)
    theta = math.radians(280.46061837 + 360.98564736629 * d + T * T *
                         (0.000387933 - T / 38710000))
    s = math.sin(theta)
    c = math.cos(theta)
    return (x * c + y * s, -x * s + y * c, z)


Next is the corresponding function for the Moon, with an accuracy of “several arcminutes and about 500 kilometers in the lunar distance.”

import math

def moon_position(t):
    """Return ECEF position (m) of the moon at time t (sec. since 1970).
    """
    # Reference: Montenbruck and Gill, Satellite Orbits. Berlin: Springer,
    # 2005, Ch. 3.3.2.
    d2000 = t / 86400.0 - 10957.5
    T = d2000 / 36525
    L0 = math.radians(218.31617 + 481267.88088 * T - 1.3972 * T)
    l = math.radians(134.96292 + 477198.86753 * T)
    lp = math.radians(357.52543 + 35999.04944 * T)
    F = math.radians(93.27283 + 483202.01873 * T)
    D = math.radians(297.85027 + 445267.11135 * T)

    # Longitude with respect to the equinox and ecliptic of the year 2000.
    lon = L0 + math.radians(
        (22640 * math.sin(l) + 769 * math.sin(2 * l)
         - 4586 * math.sin(l - 2 * D)
         + 2370 * math.sin(2 * D) - 668 * math.sin(lp) - 412 * math.sin(2 * F)
         - 212 * math.sin(2 * l - 2 * D) - 206 * math.sin(l + lp - 2 * D)
         + 192 * math.sin(l + 2 * D) - 165 * math.sin(lp - 2 * D)
         + 148 * math.sin(l - lp) - 125 * math.sin(D)
         - 110 * math.sin(l + lp) - 55 * math.sin(2 * F - 2 * D)) / 3600)

    # Lunar latitude.
    beta = math.radians(
        (18520 * math.sin(F + lon - L0 + math.radians(
            (412 * math.sin(2 * F) + 541 * math.sin(lp)) / 3600))
         - 526 * math.sin(F - 2 * D) + 44 * math.sin(l + F - 2 * D)
         - 31 * math.sin(-l + F - 2 * D) - 25 * math.sin(-2 * l + F)
         - 23 * math.sin(lp + F - 2 * D) + 21 * math.sin(-l + F)
         + 11 * math.sin(-lp + F - 2 * D)) / 3600)

    # Distance from the center of the Earth.
    r = (385000 - 20905 * math.cos(l) - 3699 * math.cos(2 * D - l)
         - 2956 * math.cos(2 * D) - 570 * math.cos(2 * l)
         + 246 * math.cos(2 * l - 2 * D)
         - 205 * math.cos(lp - 2 * D) - 171 * math.cos(l + 2 * D)
         - 152 * math.cos(l + lp - 2 * D)) * 1000

    # Convert ecliptic to equatorial Cartesian coordinates, using the mean
    # obliquity of the ecliptic at J2000.
    epsilon = math.radians(23.43929111)
    lon = lon + math.radians(1.3972 * T)
    x = r * math.cos(lon) * math.cos(beta)
    y = r * math.sin(lon) * math.cos(beta)
    z = r * math.sin(beta)
    s = math.sin(-epsilon)
    c = math.cos(-epsilon)
    y, z = (y * c + z * s,
            -y * s + z * c)

    # Convert to ECEF.
    theta = math.radians(280.46061837 + 360.98564736629 * d2000)
    s = math.sin(theta)
    c = math.cos(theta)
    return (x * c + y * s, -x * s + y * c, z)
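To turn these geocentric ECEF vectors into what an observer actually sees (as in the eclipse animation above), one more small step is needed: resolving the direction to the body into local azimuth and elevation.  The following is a minimal sketch of my own; the function name is an assumption, and it uses a spherical-Earth simplification where a careful implementation would use the WGS84 ellipsoid and the observer's altitude.

```python
import math

EARTH_RADIUS = 6371000.0  # mean Earth radius (m); spherical approximation

def azimuth_elevation(lat_deg, lon_deg, ecef):
    """Return (azimuth, elevation) in degrees of an ECEF position (m) as
    seen by an observer at the given latitude/longitude on a spherical
    Earth."""
    lat, lon = math.radians(lat_deg), math.radians(lon_deg)
    # Local up/east/north unit vectors at the observer, in ECEF coordinates.
    up = (math.cos(lat) * math.cos(lon),
          math.cos(lat) * math.sin(lon),
          math.sin(lat))
    east = (-math.sin(lon), math.cos(lon), 0.0)
    north = (-math.sin(lat) * math.cos(lon),
             -math.sin(lat) * math.sin(lon),
             math.cos(lat))
    # Vector from the observer to the target, resolved into local axes.
    d = tuple(p - EARTH_RADIUS * u for p, u in zip(ecef, up))
    e = sum(a * b for a, b in zip(d, east))
    n = sum(a * b for a, b in zip(d, north))
    u = sum(a * b for a, b in zip(d, up))
    az = math.degrees(math.atan2(e, n)) % 360.0
    el = math.degrees(math.atan2(u, math.hypot(e, n)))
    return az, el
```

Passing the output of sun_position(t) or moon_position(t) as the ecef argument gives the body's apparent direction in the sky; during a solar eclipse at the observer's location, the two directions nearly coincide.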


References:

1. Meeus, Jean, Astronomical Algorithms (2nd Ed.). Richmond: Willmann-Bell, Inc., 2009, Ch. 25.
2. Montenbruck and Gill, Satellite Orbits. Berlin: Springer, 2005, Ch. 3.3.2.

## Another card puzzle

I seem to have accumulated a backlog of subject matter that I have found interesting recently.  I’ll get back to it soon… but right now, in the spirit of not getting back to work just yet, following is another puzzle involving a card game.  I found this problem in Peter Winkler’s Mathematical Puzzles: A Connoisseur’s Collection.  This is an excellent book, with more than just the “same old” problems, but also including some interesting twists that were new to me.

For example, recall the card puzzle motivating the discussion of other players’ strategy (not) affecting your expected return in blackjack.  Winkler discusses this problem, but only as a prelude to the following more challenging variant:

Problem: Consider a standard, shuffled poker deck of 52 playing cards, of which 26 are red and 26 are black.  You start with one dollar.  For each of 52 turns, I will deal one card at a time face up on the table between us.  At each turn, you may bet any fraction of your current bankroll (from “passing” to betting everything you have) on the color of the next card to be dealt, paying even odds if your guess is correct.  For example, a conservative strategy would be to wait until the last card, whose color you will know with certainty, and bet your whole dollar, guaranteeing a net return of one dollar.

Can you do better than this?  Like the earlier problem, it is interesting to consider maximizing both the expected (i.e., average) return, as well as the worst case return.  And also like the earlier problem, this is a great combination of “write a short program to compute optimal strategy” and “use pencil and paper to show why the program’s output makes sense.”
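Fair warning, a partial spoiler follows.  As a nudge toward the “short program” half of the exercise, here is a sketch of my own (the function name is mine) computing just the guaranteed worst-case final bankroll by dynamic programming, using the standard observation that the optimal worst-case bet is the fraction that makes the two possible outcomes equal in value.

```python
from fractions import Fraction
from functools import lru_cache

@lru_cache(maxsize=None)
def guaranteed(r, b):
    """Largest final bankroll (per $1 starting stake) that can be
    guaranteed with r red and b black cards remaining."""
    if r == 0 or b == 0:
        # Only one color remains: bet everything, doubling on each card.
        return Fraction(2) ** (r + b)
    win = guaranteed(r - 1, b)   # continuation value if a red card appears
    lose = guaranteed(r, b - 1)  # continuation value if a black card appears
    # Bet the fraction f = (lose - win) / (lose + win) of the bankroll on
    # red (negative f means bet on black); this equalizes the two outcomes,
    # so the next card's color no longer matters.  The resulting value is
    # the harmonic mean of the two continuation values.
    return 2 * win * lose / (win + lose)

v = guaranteed(26, 26)
print(float(v))  # about 9.08: roughly $8.08 of profit can be locked in
```

The equalizing bet makes the worst case and the best case identical, so this strategy returns the printed amount with certainty, not merely on average.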
