## Puzzle: A simple dice game

I was recently interested in simulating and analyzing the dice game Pig.  While working out some of the relevant probabilities and expected values for the original game, I came up with the following twist, which I think is even simpler to describe, but still presents some interesting complexity in game play and analysis:

As in Pig, you and another player each take a turn rolling a single die repeatedly, as many times as you wish, until either:

• You decide to stop, at which point your score is the sum of all of your rolls in the turn; or
• You roll a 1, at which point your turn ends with a score of zero (i.e., you lose all of your rolls in the turn).

Rather than alternating turns with the winner being the first to reach 100 points, in this variant you each take just a single turn, simultaneously and separately, i.e., without knowledge of the other player’s outcome.  The highest score wins.

What is your optimal strategy for playing this game?  And for any given strategy, what is the probability distribution of outcomes (turn totals)?
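For exploring the second question, a quick Monte Carlo sketch may help; the "hold at n" family of strategies below is just one illustrative choice, not a claim about the optimal strategy:

```python
import random
from collections import Counter

def turn_score(hold_at):
    """Simulate one turn: keep rolling until the total reaches hold_at,
    or a roll of 1 ends the turn with a score of zero."""
    total = 0
    while total < hold_at:
        roll = random.randint(1, 6)
        if roll == 1:
            return 0
        total += roll
    return total

# Estimated distribution of turn totals for the "hold at 20" strategy.
dist = Counter(turn_score(20) for _ in range(100000))
```

Pitting two such strategies against each other (highest sampled score wins) is then a one-line comparison of simulated turn totals.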

Posted in Uncategorized | 9 Comments

## When approximate is better than exact

The subject of this post is a computational problem that is solved thousands of times each day (not just as part of my day job, but in almost every GPS receiver in the world).  It is a problem with an interesting history, and dozens of different solutions… but the arguably “best” solution seems to be relatively unknown.

The problem is that of converting Earth-centered Cartesian (x,y,z) coordinates to geodetic latitude, longitude, and height.  The transformation from geodetic to Cartesian coordinates is straightforward, but the reverse transformation is significantly more involved, and as a result has been addressed repeatedly in the literature.  Featherstone (see Reference (4) at the end of this post) provides a nearly comprehensive list of over 70 papers describing solutions or comparisons of solutions, spanning five decades of study.
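For reference, the straightforward forward direction looks like this (a minimal sketch using the WGS-84 parameters; `lla_to_ecef` is my name for it, not from the references):

```python
import math

# WGS-84 ellipsoid parameters.
a = 6378137.0
f = 1 / 298.257223563
e2 = f * (2 - f)

def lla_to_ecef(lat, lon, h):
    """Convert geodetic latitude, longitude (radians) and height (m)
    to ECEF coordinates (m)."""
    # Radius of curvature in the prime vertical.
    n = a / math.sqrt(1 - e2 * math.sin(lat) ** 2)
    return ((n + h) * math.cos(lat) * math.cos(lon),
            (n + h) * math.cos(lat) * math.sin(lon),
            (n * (1 - e2) + h) * math.sin(lat))
```

Any candidate inverse transformation can then be sanity-checked by round-tripping points through this function.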

I think this problem is interesting for three reasons.  First, it still seems to be not universally known that closed-form solutions even exist, usually obtained by finding the roots of an appropriately transformed quartic equation.  Even the first non-Wikipedia entry in a quick Google search claims that “there is no closed-form solution… if the altitude is not zero.”

Second, although all of these various closed-form solutions are “equal,” some are more equal than others.  That is, the accuracies of many of these “exact” solutions suffer when executed on a computer with limited precision.  Or worse, many papers only provide equations, with actual implementations requiring additional details not made explicit, such as handling singularities at the poles, or taking square roots of expressions that “should” be non-negative, but in double precision may yield negative values.  Borkowski’s formula in Reference (1), for example, breaks down for a spherical earth (i.e., zero eccentricity), when the transformation is in fact simplest.  And Bowring’s formula in Reference (2), which I see most often in others’ code, has a usually-neglected but easily-remedied danger of a singularity at the poles.

Which brings us to the third interesting feature of this problem: there is a solution provided by Olson in Reference (6), that, mathematically, is actually an approximation (essentially involving a truncated series expansion), but when implemented in code, and executed with limited double-precision, is not only more accurate than all of the “exact” closed-form solutions that I have evaluated in the references, but is also faster.

(It is worth emphasizing the distinction between this approximation, which although inexact is still essentially “closed-form,” and a whole class of other solutions that involve iterative approximation (e.g., Newton-Raphson), where execution time is more difficult to predict.)

As a bonus, Olson also provides working C code; the following is a port of his implementation to Python:

import math

# WGS-84 ellipsoid parameters
a = 6378137
f = 1 / 298.257223563

# Derived parameters
e2 = f * (2 - f)
a1 = a * e2
a2 = a1 * a1
a3 = a1 * e2 / 2
a4 = 2.5 * a2
a5 = a1 + a3
a6 = 1 - e2

def ecef_to_lla(ecef):
    """Convert ECEF (meters) to LLA (radians and meters).
    """
    # Olson, D. K., Converting Earth-Centered, Earth-Fixed Coordinates to
    # Geodetic Coordinates, IEEE Transactions on Aerospace and Electronic
    # Systems, 32 (1996) 473-476.
    w = math.sqrt(ecef[0] * ecef[0] + ecef[1] * ecef[1])
    z = ecef[2]
    zp = abs(z)
    w2 = w * w
    r2 = z * z + w2
    r = math.sqrt(r2)
    s2 = z * z / r2
    c2 = w2 / r2
    u = a2 / r
    v = a3 - a4 / r
    if c2 > 0.3:
        s = (zp / r) * (1 + c2 * (a1 + u + s2 * v) / r)
        lat = math.asin(s)
        ss = s * s
        c = math.sqrt(1 - ss)
    else:
        c = (w / r) * (1 - s2 * (a5 - u - c2 * v) / r)
        lat = math.acos(c)
        ss = 1 - c * c
        s = math.sqrt(ss)
    g = 1 - e2 * ss
    rg = a / math.sqrt(g)
    rf = a6 * rg
    u = w - rg * c
    v = zp - rf * s
    f = c * u + s * v
    m = c * v - s * u
    p = m / (rf / g + f)
    lat = lat + p
    if z < 0:
        lat = -lat
    return (lat, math.atan2(ecef[1], ecef[0]), f + m * p / 2)


References:

1. K. M. Borkowski, “Accurate Algorithms to Transform Geocentric to Geodetic Coordinates,” Bulletin Geodesique, 63 (1989) 50-56 [HTML]
2. B. R. Bowring, “Transformation from Spatial to Geographical Coordinates,” Survey Review, 23:181 (1976) 323-327
3. R. Burtch, “A Comparison of Methods Used in Rectangular to Geodetic Coordinate Transformations,” Proceedings of the American Congress on Surveying and Mapping (ACSM) Annual Conference and Technology Exhibition, Orlando, FL, 2006
4. W. E. Featherstone and S. J. Claessens, “Closed-Form Transformation Between Geodetic and Ellipsoidal Coordinates,” Studia Geophysica et Geodaetica, 52 (2008) 1-18
5. M. Heikkinen, “Geschlossene Formeln zur Berechnung räumlicher geodätischer Koordinaten aus rechtwinkligen Koordinaten,” Zeitschrift Vermess. (in German), 107 (1982) 207-211
6. D. K. Olson, “Converting Earth-Centered, Earth-Fixed Coordinates to Geodetic Coordinates,” IEEE Transactions on Aerospace and Electronic Systems, 32 (1996) 473-476
7. I. Sofair, “Improved Method for Calculating Exact Geodetic Latitude and Altitude Revisited,” Journal of Guidance, Control, and Dynamics, 23:2 (2000) 369
8. J. Zhu, “Conversion of Earth-Centered Earth-Fixed Coordinates to Geodetic Coordinates,” IEEE Transactions on Aerospace and Electronic Systems, 30:3 (1994) 957-962
9. J. Zhu, “Exact Conversion of Earth-Centered, Earth-Fixed Coordinates to Geodetic Coordinates,” Journal of Guidance, Control, and Dynamics, 16:2 (1993) 389-391

## The odds of picking a “perfect” NCAA tournament bracket

This past week Warren Buffett announced that he is backing an offer of \$1 billion to whoever picks a “perfect” bracket for this year’s NCAA basketball tournament.  That is, you must correctly predict the winners of all 63 games in the 64-team, single-elimination tournament.  (Or maybe it’s all 67 games among 68 teams, depending on the specific rules which have not yet been published.)

This news has typically been accompanied in the press by mention of the absurdly small probability of picking such a perfect bracket, usually quoted as “1 in 9,223,372,036,854,775,808,” or 1 in 9.2 quintillion, or $1/2^{63}$.  Intuitively, the idea is that there are 63 games over the course of the tournament, each with 2 possible outcomes.  (I found it interesting that several other web sites seem to have latched onto a different figure of 1 in 4,294,967,296, or 1 in $2^{32}$.  This would correspond to picking just the first round winners.)

To get a feel for how small this probability is, imagine being blindfolded, and asked to blindly solve each of five differently scrambled Rubik’s cubes.  You have a slightly better chance of accidentally solving even one of the five cubes correctly than you have of picking a perfect bracket.

Or do you?  This back-of-the-envelope calculation essentially assumes that either (a) you know nothing about college basketball, and simply pick winners of each game at random; or (b) all teams are equally matched, so that the probability of either outcome of any game is 1/2.  The latter is certainly not true; for example, over the past 29 years of the current tournament format, among 116 first-round match-ups between 1st and 16th seeds, there has not been a single upset.  More generally, the larger the difference between teams’ seeds, the more lopsided we might expect the outcome to be.

So the question that motivated this post is: can we improve on this calculation of the probability of picking a perfect NCAA tournament bracket?  It seemed like it would be useful to start with some historical data about past tournament games.  This turned out to be harder to find than I thought… at least, in some conveniently machine-readable format.  But after poking around on the web and writing a few quick-and-dirty parsing scripts, following is some data that will help.

First, the following 16×16 matrix with elements $G_{i,j}$ indicates the total number of games played between teams seeded i and j, from 1985 through 2013.  For example, the 116s along the anti-diagonal correspond to the 29 years’ worth of first round games in each of the four regional brackets (1 vs 16, 2 vs 15, 3 vs 14, etc.):

  16  56  27  52  39  10   4  58  61   4   5  19   4   0   0 116
  56   2  47   7   4  29  67   5   1  42  12   1   0   0 116   0
  27  47   1   7   3  63   9   2   1  13  37   0   0 116   1   0
  52   7   7   1  62   4   2   6   2   2   0  30 116   0   0   0
  39   4   3  62   1   1   0   3   2   1   0 116  14   0   0   0
  10  29  63   4   1   0   6   1   0   6 116   0   0  14   0   0
   4  67   9   2   0   6   0   1   0 116   3   0   0   1   3   0
  58   5   2   6   3   1   1   0 116   0   1   1   1   0   0   0
  61   1   1   2   2   0   0 116   0   0   0   0   1   0   0   0
   4  42  13   2   1   6 116   0   0   0   1   0   0   1   4   0
   5  12  37   0   0 116   3   1   0   1   0   0   0   3   0   0
  19   1   0  30 116   0   0   1   0   0   0   0  11   0   0   0
   4   0   0 116  14   0   0   1   1   0   0  11   0   0   0   0
   0   0 116   0   0  14   1   0   0   1   3   0   0   0   0   0
   0 116   1   0   0   0   3   0   0   4   0   0   0   0   0   0
 116   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0


Next, the following 16×16 matrix with elements $W_{i,j}$ indicates the number of those $G_{i,j}$ games won by the team seeded i.  (Note: the diagonal requires some careful handling, and is left as an exercise for the reader.)

  16  31  15  36  32   7   4  47  56   4   2  19   4   0   0 116
  25   2  29   3   0  22  50   2   0  25  11   1   0   0 109   0
  12  18   1   4   2  36   6   2   1   9  25   0   0  99   1   0
  16   4   3   1  34   2   2   2   2   2   0  18  91   0   0   0
   7   4   1  28   1   1   0   1   1   1   0  75  11   0   0   0
   3   7  27   2   0   0   3   0   0   4  77   0   0  12   0   0
   0  17   3   0   0   3   0   0   0  70   0   0   0   1   2   0
  11   3   0   4   2   1   1   0  56   0   1   0   1   0   0   0
   5   1   0   0   1   0   0  60   0   0   0   0   1   0   0   0
   0  17   4   0   0   2  46   0   0   0   0   0   0   1   4   0
   3   1  12   0   0  39   3   0   0   1   0   0   0   3   0   0
   0   0   0  12  41   0   0   1   0   0   0   0   8   0   0   0
   0   0   0  25   3   0   0   0   0   0   0   3   0   0   0   0
   0   0  17   0   0   2   0   0   0   0   0   0   0   0   0   0
   0   7   0   0   0   0   1   0   0   0   0   0   0   0   0   0
   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0


Using this data, it would be nice to be able to estimate the probability $P_{i,j}$ that a team seeded i will beat a team seeded j, as simply $W_{i,j}/G_{i,j}$… at least for those match-ups for which we have data.  If we knew all of these probabilities, we could compute the desired overall probability that any given bracket correctly predicts all 63 games.

Unfortunately, the result appears to be a bit of a mess, if we plot these estimated probabilities as a function of the difference in seeds:

Probability of higher (i.e., favored) seed i beating seed j, vs. the difference in seeds j-i. Historical data shown in blue, simple linear model shown in red.

At first glance, the most “extreme” upsets shown here seem to be those involving zero wins by the favored team.  For example, a #2 seed lost to a #9 seed in the only such match-up (indicated by the point (7,0) in the figure), and a #2 seed has never beaten a #5 seed in any of four separate games (the point (3,0) in the figure).

These are pretty small samples, though… and so the 95% confidence intervals surrounding these point estimates are extremely wide.  Wide enough, in fact, that they contain the simple linear model (shown by the red line in the figure, which I will describe shortly) for all but three of the 68 past match-ups:

• #1 beat #5 in 32 of 39 games.  The linear model predicts P(1 beats 5)=0.629, which falls below the 95% confidence interval (0.665, 0.925).
• #1 beat #9 in 56 of 61 games.  The linear model predicts P(1 beats 9)=0.758, below the 95% confidence interval (0.819, 0.973).
• #2 beat #10 in only 25 of 42 games.  The linear model predicts P(2 beats 10)=0.758, above the 95% confidence interval (0.433, 0.744).

If we really think that seed difference can be a useful estimator, then we can improve the picture somewhat by actually grouping all match-ups with the same seed difference (e.g., collect games between #1 and #5 with those between #2 and #6, #3 and #7, etc.), and plot the resulting point estimates and narrower confidence intervals:

Probability of favored team winning vs. the difference in seeds. Historical data aggregated over all match-ups with the given seed difference, with 95% confidence intervals.

At any rate, there is arguably still room for refinement, but this simple linear model seems like a reasonable starting point.  If we estimate the probability that seed i beats seed j as

$P_{i,j} = \frac{1}{2} + \frac{j-i}{31}$

as shown by the red line in the above figure, then using this model, we can compute the overall probability of, say, a bracket with no upsets (i.e., always picking the higher-seeded team to win).  The resulting chance of winning with this bracket is approximately 1 in 241 billion, much better odds than “1 in 9 quintillion”… but still a long shot to say the least.  And more importantly, any other bracket, with any upsets, has an even smaller chance of winning!
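The chalk-bracket figure is easy to reproduce from the model, assuming the usual no-upset seed pairings in each round and treating the three Final Four games between #1 seeds as coin flips (that last assumption is mine, since the model gives probability 1/2 for equal seeds anyway):

```python
def p_win(i, j):
    """Linear model: probability that seed i beats seed j."""
    return 0.5 + (j - i) / 31

# Seed match-ups in one regional bracket when the favorite always wins.
rounds = [[(1, 16), (2, 15), (3, 14), (4, 13), (5, 12), (6, 11), (7, 10), (8, 9)],
          [(1, 8), (2, 7), (3, 6), (4, 5)],
          [(1, 4), (2, 3)],
          [(1, 2)]]

region = 1.0
for rnd in rounds:
    for i, j in rnd:
        region *= p_win(i, j)

# Four identical regions, then three #1-vs-#1 games at probability 1/2 each.
p_chalk = region ** 4 * 0.5 ** 3
print(round(1 / p_chalk / 1e9))  # prints 241 (billion)
```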

To put it in a perspective similar to that given in the video shown here (which seems to be a quick correction of the earlier “1 in 9.2 quintillion” article linked above), suppose that we could get all 314 million people in the United States to:

1. Submit a bracket,
2. Coordinate their entries so that all 314 million brackets are distinct, and
3. Select those 314 million distinct brackets that are the most likely to occur; i.e., with the fewest carefully selected “upsets”, involving the most reasonable match-ups (e.g., #8 losing to #9, as opposed to #1 losing to #16).

Even with this massively and optimally coordinated effort, the probability that anyone in the country has the perfect bracket (let alone that that person is you) is only about 0.001, or about one-tenth of one percent.

My wager: Buffett’s extra billion is safe.


## Computing positions and eclipses of the Sun and Moon

Looking ahead to this new year, some of us here in the United States will have several opportunities to see the Earth and Moon get in each other’s way, so to speak.  There will be a lunar eclipse on 14-15 April 2014, and another in October.  There will also be a partial solar eclipse on 23 October 2014… and a total solar eclipse three years from now, on 21 August 2017.

For example, the following simple animation shows what the 2017 eclipse will look like from my dad’s house, which is slightly south of the path of totality.

Partial solar eclipse on 21 August 2017 as seen from my dad’s house. Times are UTC.

Predicting eclipses requires knowing where the Sun and Moon will be in relation to the Earth at any given time.  (Actually, I guess it really only requires knowing which web sites have prediction/animation tools.  NASA has a good site for predicting eclipses at arbitrary latitude/longitude locations, but without corresponding visualizations.  And timeanddate.com has some nice animated visualizations of eclipses, but only for canned locations.)

Computing the location of the Sun and/or Moon presents some interesting challenges.  In particular, the complexity of the solution depends a lot on how much accuracy is required.  The references at the end of this post provide equations with accuracies that are suitable for the kinds of applications I am interested in… the trick is simply stitching together those equations, coordinate frames, etc.

The result is a couple of functions that can be handy in many different exercises, from recreational to educational to work-related: predicting eclipses, predicting the visible illumination of satellite passes (e.g., the International Space Station), evaluating the accuracy of trying to use an analog watch as a compass, and extending the accuracy of ballistically propagated satellite orbits.

First, following is Python code for computing the position of the Sun, in ECEF coordinates, with an accuracy of “approximately 0.01 degree.”  Time is specified in seconds since 1970-01-01T00:00:00Z.

import math

def sun_position(t):
    """Return ECEF position (m) of the sun at time t (sec. since 1970).
    """
    # Reference: Meeus, Jean, Astronomical Algorithms (2nd Ed.). Richmond:
    # Willmann-Bell, Inc., 2009, Ch. 25.
    d = t / 86400.0 - 10957.5
    T = d / 36525
    L = math.radians(280.46646 + T * (36000.76983 + 0.0003032 * T))
    M = math.radians(357.52911 + T * (35999.05029 - 0.0001537 * T))
    e = 0.016708634 - T * (0.000042037 + 0.0000001267 * T)
    C = math.radians((1.914602 - T * (0.004817 + 0.000014 * T)) * math.sin(M) +
                     (0.019993 - 0.000101 * T) * math.sin(2 * M) +
                     0.000289 * math.sin(3 * M))
    s = L + C
    v = M + C
    R = 149597870000 * (1.000001018 * (1 - e * e) / (1 + e * math.cos(v)))
    omega = math.radians(125.04 - 1934.136 * T)
    lon = math.radians(math.degrees(s) - 0.00569 - 0.00478 * math.sin(omega))
    epsilon = math.radians(23.0 + (26.0 + (21.448 - T * (46.8150 + T *
                           (0.00059 - 0.001813 * T))) / 60) / 60 +
                           0.00256 * math.cos(omega))
    x = R * math.cos(lon)
    y = R * math.sin(lon) * math.cos(epsilon)
    z = R * math.sin(lon) * math.sin(epsilon)
    theta = math.radians(280.46061837 + 360.98564736629 * d + T * T *
                         (0.000387933 - T / 38710000))
    s = math.sin(theta)
    c = math.cos(theta)
    return (x * c + y * s, -x * s + y * c, z)
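The time argument can be computed from a UTC calendar date with the standard library.  Note that 10957.5 days after the 1970 epoch is the J2000.0 epoch (2000-01-01T12:00:00Z), which is why the code subtracts that constant:

```python
from datetime import datetime, timezone

def unix_time(year, month, day, hour=0, minute=0, second=0):
    """Seconds since 1970-01-01T00:00:00Z for a UTC calendar date."""
    return datetime(year, month, day, hour, minute, second,
                    tzinfo=timezone.utc).timestamp()

t = unix_time(2000, 1, 1, 12)   # the J2000.0 epoch
print(t / 86400.0 - 10957.5)    # prints 0.0
```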


Next is the corresponding function for the Moon, with an accuracy of “several arcminutes and about 500 kilometers in the lunar distance.”

import math

def moon_position(t):
    """Return ECEF position (m) of the moon at time t (sec. since 1970).
    """
    # Reference: Montenbruck and Gill, Satellite Orbits. Berlin: Springer,
    # 2005, Ch. 3.3.2.
    d2000 = t / 86400.0 - 10957.5
    T = d2000 / 36525
    L0 = math.radians(218.31617 + 481267.88088 * T - 1.3972 * T)
    l = math.radians(134.96292 + 477198.86753 * T)
    lp = math.radians(357.52543 + 35999.04944 * T)
    F = math.radians(93.27283 + 483202.01873 * T)
    D = math.radians(297.85027 + 445267.11135 * T)

    # Longitude with respect to the equinox and ecliptic of the year 2000.
    lon = L0 + math.radians(
        (22640 * math.sin(l) + 769 * math.sin(2 * l)
         - 4586 * math.sin(l - 2 * D)
         + 2370 * math.sin(2 * D) - 668 * math.sin(lp) - 412 * math.sin(2 * F)
         - 212 * math.sin(2 * l - 2 * D) - 206 * math.sin(l + lp - 2 * D)
         + 192 * math.sin(l + 2 * D) - 165 * math.sin(lp - 2 * D)
         + 148 * math.sin(l - lp) - 125 * math.sin(D)
         - 110 * math.sin(l + lp) - 55 * math.sin(2 * F - 2 * D)) / 3600)

    # Lunar latitude.
    beta = math.radians(
        (18520 * math.sin(F + lon - L0 + math.radians(
            (412 * math.sin(2 * F) + 541 * math.sin(lp)) / 3600))
         - 526 * math.sin(F - 2 * D) + 44 * math.sin(l + F - 2 * D)
         - 31 * math.sin(-l + F - 2 * D) - 25 * math.sin(-2 * l + F)
         - 23 * math.sin(lp + F - 2 * D) + 21 * math.sin(-l + F)
         + 11 * math.sin(-lp + F - 2 * D)) / 3600)

    # Distance from the center of the Earth.
    r = (385000 - 20905 * math.cos(l) - 3699 * math.cos(2 * D - l)
         - 2956 * math.cos(2 * D) - 570 * math.cos(2 * l)
         + 246 * math.cos(2 * l - 2 * D)
         - 205 * math.cos(lp - 2 * D) - 171 * math.cos(l + 2 * D)
         - 152 * math.cos(l + lp - 2 * D)) * 1000

    # Convert ecliptic to equatorial Cartesian coordinates.
    lon = lon + math.radians(1.3972 * T)
    epsilon = math.radians(23.43929111)  # mean obliquity of the ecliptic
    x = r * math.cos(lon) * math.cos(beta)
    y = r * math.sin(lon) * math.cos(beta)
    z = r * math.sin(beta)
    s = math.sin(-epsilon)
    c = math.cos(-epsilon)
    y, z = (y * c + z * s,
            -y * s + z * c)

    # Convert to ECEF.
    theta = math.radians(280.46061837 + 360.98564736629 * d2000)
    s = math.sin(theta)
    c = math.cos(theta)
    return (x * c + y * s, -x * s + y * c, z)
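With both positions in hand, a first-cut eclipse check is the angular separation between the observer-to-Sun and observer-to-Moon direction vectors, compared against the sum of the two bodies’ apparent angular radii.  A minimal sketch of the geometry (this helper is my own, not from the references):

```python
import math

def angular_separation(u, v):
    """Angle (radians) between the directions of 3-vectors u and v."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(a * a for a in v))
    # Clamp to [-1, 1] to guard against round-off outside acos's domain.
    return math.acos(max(-1.0, min(1.0, dot / norm)))
```

For an observer at ECEF position p, apply this to the difference vectors sun_position(t) - p and moon_position(t) - p; a solar eclipse is in progress at that location roughly when the separation drops below the sum of the apparent radii.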


References:

1. Meeus, Jean, Astronomical Algorithms (2nd Ed.). Richmond: Willmann-Bell, Inc., 2009, Ch. 25.
2. Montenbruck and Gill, Satellite Orbits. Berlin: Springer, 2005, Ch. 3.3.2.

## Another card puzzle

I seem to have accumulated a backlog of subject matter that I have found interesting recently.  I’ll get back to it soon… but right now, in the spirit of not getting back to work just yet, following is another puzzle involving a card game.  I found this problem in Peter Winkler’s Mathematical Puzzles: A Connoisseur’s Collection.  This is an excellent book, with more than just the “same old” problems, but also including some interesting twists that were new to me.

For example, recall the card puzzle motivating the discussion of other players’ strategy (not) affecting your expected return in blackjack.  Winkler discusses this problem, but only as a prelude to the following more challenging variant:

Problem: Consider a standard, shuffled poker deck of 52 playing cards, of which 26 are red and 26 are black.  You start with one dollar.  For each of 52 turns, I will deal one card at a time face up on the table between us.  At each turn, you may bet any fraction of your current bankroll (from “passing” to betting everything you have) on the color of the next card to be dealt, paying even odds if your guess is correct.  For example, a conservative strategy would be to wait until the last card, whose color you will know with certainty, and bet your whole dollar, guaranteeing a net return of one dollar.

Can you do better than this?  Like the earlier problem, it is interesting to consider maximizing both the expected (i.e., average) return, as well as the worst case return.  And also like the earlier problem, this is a great combination of “write a short program to compute optimal strategy” and “use pencil and paper to show why the program’s output makes sense.”
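On the “write a short program” side, a quick Monte Carlo harness for evaluating candidate betting strategies may be useful; the interface and the conservative example below are my own framing of the problem, not a solution:

```python
import random

def average_return(bet, trials=10000):
    """Average final bankroll, starting from $1, for a strategy bet(red, black)
    that returns the signed fraction of bankroll to wager on the next card:
    positive bets on red, negative on black, zero passes.  The arguments are
    the counts of red and black cards remaining in the deck."""
    total = 0.0
    for _ in range(trials):
        red, black, bankroll = 26, 26, 1.0
        while red + black > 0:
            frac = max(-1.0, min(1.0, bet(red, black)))
            if random.random() < red / (red + black):  # next card is red
                bankroll *= 1 + frac
                red -= 1
            else:                                      # next card is black
                bankroll *= 1 - frac
                black -= 1
        total += bankroll
    return total / trials

# The conservative strategy: pass until the last card, then bet everything.
conservative = lambda red, black: 0 if red + black > 1 else (1 if red else -1)
print(average_return(conservative))  # prints 2.0
```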


## An optimal stopping card game

Let’s play the following simple game: I will shuffle a standard 52-card deck, and deal one card at a time face up onto the table between us.  You can say “Stop” at any time, at which point I will pay you an amount equal to the fraction of cards dealt so far that are red.  For example, if you stop after seeing a single card, you win 1 if it is red, 0 if it is black, with expected return 1/2.  Of course, you can wait until the entire deck is dealt, also with a guaranteed return of 1/2.

The problem is: can you do better than this?  What is the optimal strategy for playing this game, and what is the corresponding expected return?

(This game is a variant of the following interesting game that I read about this week, which at first glance might seem even simpler, but for which optimal strategy remains an open problem: instead of dealing cards from a deck, I flip a coin repeatedly, and upon stopping I pay you an amount equal to the fraction of tosses that came up heads.)
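A similar Monte Carlo harness makes it easy to try candidate stopping rules; the majority-threshold rule below is just an illustration, with no claim of optimality:

```python
import random

def average_payoff(stop_rule, trials=10000):
    """Average payoff of stop_rule(red, dealt), which returns True to stop
    after seeing `red` red cards among `dealt` cards dealt so far."""
    total = 0.0
    for _ in range(trials):
        deck = [1] * 26 + [0] * 26  # 1 = red, 0 = black
        random.shuffle(deck)
        red = 0
        for dealt, card in enumerate(deck, start=1):
            red += card
            if stop_rule(red, dealt) or dealt == 52:
                total += red / dealt
                break
    return total / trials

# Stop as soon as strictly more than half of the dealt cards are red.
majority = lambda red, dealt: 2 * red > dealt
```

Note that this rule never pays less than 1/2 on any trial: it stops only when the red fraction exceeds 1/2, and otherwise collects 26/52 on the last card.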


## Floating-point equality: It’s worse than you think

While recently helping to fix some buggy simulation software, I encountered a nasty problem involving comparison of floating-point values.  I was vaguely aware of the potential for this sort of problem to occur, but this was the first time that I had actually seen it in the wild.

The program was an event-driven simulation, with a periodic “update” event that was supposed to occur, well, periodically (5 times per simulated second).  But it would consistently hang after a couple of minutes, getting stuck in an infinite loop because the periodic update event stopped triggering.  Here is what the code looked like, with the actual time values that caused the problem:

double curr_time = 128.1253990141936;
double prev_time = 127.9253990141936;
if (curr_time >= prev_time + 0.200) {
    update();
} else {
    // We shouldn't be here.
}

The if-condition was expected to evaluate to true, indicating that enough time had passed since the previous update.  But the condition failed, thus preventing the update and associated time advance.  See if you can identify the cause of the problem.

At this point, the usual from-the-hip diagnosis is, “Floating-point arithmetic is only an approximation; you should never test values for equality, only for differences within some tolerance.  You should probably read What Every Computer Scientist Should Know About Floating-Point Arithmetic, by David Goldberg.”

This is mostly good advice… but when reading online discussions about this issue, I often get the impression that part of Goldberg’s story is missed.  That is, it is certainly important to recognize that floating-point values have limited precision, so that arithmetic and comparisons that we might expect to be valid for real numbers on a cocktail napkin are not valid when evaluated using limited floating-point precision.  For example, evaluating the expression $1.1+2.2==3.3$ typically yields false, since none of the three values can be represented exactly as a rational number with a denominator that is a power of 2.  So instead of the equality you want, the computer is instead thinking:

$\frac{2476979795053773}{2251799813685248} + \frac{2476979795053773}{1125899906842624} \neq \frac{3715469692580659}{1125899906842624}$
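Python’s fractions module makes these stored values explicit; each double is exactly a dyadic rational, just not the one we wrote down:

```python
from fractions import Fraction

# The exact rational values actually stored for each double-precision literal:
assert Fraction(1.1) == Fraction(2476979795053773, 2251799813685248)
assert Fraction(2.2) == Fraction(2476979795053773, 1125899906842624)
assert Fraction(3.3) == Fraction(3715469692580659, 1125899906842624)

# The rounded sum differs from the stored 3.3 by one unit in the last place.
assert 1.1 + 2.2 != 3.3
```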

But in the case of the simulation bug described above, this wasn’t the problem.  To see this, let’s add some debugging output to the program:

double curr_time = 128.1253990141936;
double prev_time = 127.9253990141936;
if (curr_time >= prev_time + 0.200) {
    update();
} else {
    double test_time = prev_time + 0.200;
    std::printf("curr_time = %a\n", curr_time);
    std::printf("test_time = %a\n", test_time);
}

Output:
curr_time = 0x1.004034p+7
test_time = 0x1.004034p+7


The two values being compared are exactly equal!  That is, both the current time (128.1253990141936) and the test time (the sum 127.9253990141936+0.200) are represented by exactly the same 64 bits in the underlying IEEE-754 double-precision representation.

So what’s going on?  If two floating-point values are indeed equal, how can they fail a weak inequality comparison?

It turns out that the problem is not that precision is limited, but that precision is varying, and varying in ways that the programmer cannot predict or control.  In the if-condition, the compiler has decided to evaluate the sum on the right-hand side in extended precision (something more than 64 bits, probably 80 or 96), and then perform the comparison, still in that extended precision register, without first rounding back down to 64-bit double precision.  As a result, those additional extended-precision bits make the sum slightly greater than the “truncated” double-precision value of the current time on the left-hand side.

This can be a particularly nasty problem to troubleshoot, because the behavior is very dependent on platform, compiler, and even “code context.”  (In this case, the simulation was running on Linux using gcc 3.4.4.  I have been unable to reproduce the problem anywhere else.)  By “code context” I mean that the compiler has a lot of leeway in determining if/when to jump back and forth between extended and double precision, and those decisions can change depending on surrounding code!  For example, we “masked” the behavior somewhat by our unsuccessful attempt at debugging output above, since the compiler is forced to eventually round the sum and store it in a double-precision variable so that we can print it.

Goldberg does describe this issue, although the reader must make it 80% of the way through a 70-page paper to find it.  But the C++ FAQ also contains a section (Cline calls it the “newbie section,” but it’s really more of an attic for anecdotes without a better home) that I think does a great job of describing the problem as well.

Edit: The Random ASCII blog has a great series of articles on floating-point issues.  The one most directly relevant to the problem discussed here is titled “Floating-Point Determinism,” in particular the section on composing larger expressions.
