Floating-point equality: It’s worse than you think

While recently helping to fix some buggy simulation software, I encountered a nasty problem involving comparison of floating-point values.  I was vaguely aware of the potential for this sort of problem to occur, but this was the first time that I had actually seen it in the wild.

The program was an event-driven simulation, with a periodic “update” event that was supposed to occur, well, periodically (5 times per simulated second).  But it would consistently hang after a couple of minutes, getting stuck in an infinite loop because the periodic update event stopped triggering.  Here is what the code looked like, with the actual time values that caused the problem:

double curr_time = 128.1253990141936;
double prev_time = 127.9253990141936;
if (curr_time >= prev_time + 0.200) {
} else {
    // We shouldn't be here.

The if-condition was expected to evaluate to true, indicating that enough time had passed since the previous update.  But the condition failed, thus preventing the update and associated time advance.  See if you can identify the cause of the problem.

At this point, the usual from-the-hip diagnosis is, “Floating-point arithmetic is only an approximation; you should never test values for equality, only for differences within some tolerance.  You should probably read What Every Computer Scientist Should Know About Floating-Point Arithmetic, by David Goldberg.”

This is mostly good advice… but when reading online discussions about this issue, I often get the impression that part of Goldberg’s story is missed.  That is, it is certainly important to recognize that floating-point values have limited precision, so that arithmetic and comparisons that we might expect to be valid for real numbers on a cocktail napkin are not valid when evaluated using limited floating-point precision.  For example, evaluating the expression 1.1+2.2==3.3 typically yields false, since none of the three values can be represented exactly as a rational number with a denominator that is a power of 2.  So instead of the equality you want, the computer is instead thinking:

\frac{2476979795053773}{2251799813685248} + \frac{2476979795053773}{1125899906842624} \neq \frac{3715469692580659}{1125899906842624}

But in the case of the simulation bug described above, this wasn’t the problem.  To see this, let’s add some debugging output to the program:

double curr_time = 128.1253990141936;
double prev_time = 127.9253990141936;
if (curr_time >= prev_time + 0.200) {
} else {
    double test_time = prev_time + 0.200;
    std::printf("curr_time = %a\n", curr_time);
    std::printf("test_time = %a\n", test_time);

curr_time = 0x1.004034p+7
test_time = 0x1.004034p+7

The two values being compared are exactly equal!  That is, both the current time (128.1253990141936) and the test time (the sum 127.9253990141936+0.200) are represented by exactly the same 64 bits in the underlying IEEE-754 double-precision representation.

So what’s going on?  If two floating-point values are indeed equal, how can they fail a weak inequality comparison?

It turns out that the problem is not that precision is limited, but that precision is varying, and varying in ways that the programmer cannot predict or control.  In the if-condition, the compiler has decided to evaluate the sum on the right-hand side in extended precision (something more than 64 bits, probably 80 or 96), and then perform the comparison, still in that extended precision register, without first rounding back down to 64-bit double precision.  As a result, those additional extended-precision bits make the sum slightly greater than the “truncated” double-precision value of the current time on the left-hand side.

This can be a particularly nasty problem to troubleshoot, because the behavior is very dependent on platform, compiler, and even “code context.”  (In this case, the simulation was running on Linux using gcc 3.4.4.  I have been unable to reproduce the problem anywhere else.)  By “code context” I mean that the compiler has a lot of leeway in determining if/when to jump back and forth between extended and double precision, and those decisions can change depending on surrounding code!  For example, we “masked” the behavior somewhat by our unsuccessful attempt at debugging output above, since the compiler is forced to eventually round the sum and store it in a double-precision variable so that we can print it.

Goldberg does describe this issue, although the reader must make it 80% of the way through a 70-page paper to find it.  But the C++ FAQ also contains a section (Cline calls it the “newbie section,” but it’s really more of an attic for anecdotes without a better home) that I think does a great job of describing the problem as well.

Edit: The Random ASCII blog has a great series of articles on floating-point issues.  The one most directly relevant to the problem discussed here is titled “Floating-Point Determinism,” in particular the section on composing larger expressions.


This entry was posted in Uncategorized. Bookmark the permalink.

10 Responses to Floating-point equality: It’s worse than you think

  1. qznc says:

    Actually, you can predict and control precision. That is the core problem that IEEE-754 solves. Possible fix:

    if (curr_time >= (double)(prev_time + 0.200)) {

    Minor nitpick: A value does not have a precision, it is just a number. The floating point computation has a precision. E.g.: If you computed 𝛑 as 3.1, then the computation has a precision of (at most) two digits, but 3.1 is just a number and precisely 3.100000…

    • This is an interesting point, about which discussion on Reddit’s /r/compsci is currently ongoing. A similar possible fix would be to compute the sum in a temporary double variable (like the test_time above) ahead of time. But although in practice I know of no compiler where this *wouldn’t* work, the standard seems pretty permissive, so I’m not sure that it *has* to work. That is, a sufficiently smart compiler could presumably “optimize away” the temporary, and still perform the calculation in extended precision.

      The Reddit discussion mentions the various compiler flags that control this behavior in gcc, in my opinion forgetting that completely abandoning extended-precision computation isn’t necessarily a good thing just because it prevents situations like this. I had hoped that the take-away from this would not be the details of specific platforms, compilers, or flags where particular code works or doesn’t, but instead a reminder to simply *think* about this as a potential problem, and, where possible, to *change the code* to be more robust no matter what the platform, compiler, or flags in use.

      (For example, the solution in this case was to recognize that the simulation’s “real” time resolution was 1 ms, and all those lesser significant bits were merely the cumulative effect of adding delta-t’s. We can make this problem go away by effectively working, temporarily where necessary, in *integral* units of ms, and converting back to double-precision seconds in a controlled way.)

  2. brucedawson says:

    I actually think the most relevant article from my series might be this one:


    Compilers are allowed enormous flexibility in deciding what precision to use in evaluation expressions, and that can lead to surprising results. These surprising results are particularly difficult to avoid with the x87 FPU.

    I realize it is a simplified fraction of your code, but I’m confused as to why you only advance the time if 0.2 s has passed. Shouldn’t you always advance time? Then this issue disappears, or at least becomes much less significant — you just miss one update and then continue along.

    But yeah, variable precision is a nuisance. It’s a lurking bug. Individual instances of it can be squashed, but you can never tell when it will pop up again. I recommend compiling for SSE2 which will reduce these problems significantly.

    Or use an int64 count of ms.

    • You’re right; I read the intermediate precision article first. But I continued reading to make sure I hadn’t missed more details; I chose the determinism article because (a) it links to the intermediate precision one, and (b) it seemed like a superset of useful relevant information that readers might otherwise miss.

      And you are also right in suspecting that the code logic has been simplified here to focus on this particular problem. Time *does* always advance, but to include the details of how that time advancement interacts with this floating-point comparison problem seemed unnecessary.

  3. gtownescapee says:

    As soon as I read “event driven simulation” I thought, oh no, this could be a bug I left behind. Fortunately, I never wrote any C code that you’d be looking at. Nice catch.

  4. Pingback: Floating-point round trips | Possibly Wrong

  5. Pingback: Floating-point agreement between MATLAB and C++ | Possibly Wrong

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

This site uses Akismet to reduce spam. Learn how your comment data is processed.