After a couple of interesting conversations on the subject, I think I should try to clear up some potentially confusing comments I made in my last post about converting binary floating-point values to decimal strings… and back to binary floating point again.

**Injection != round-trip identity**

“… You need the representation to be one-to-one, so that distinct finite values always produce distinct output. More precisely, we would like to be able to (1) convert a floating-point value to a string, then (2) convert the string back to floating-point, recovering *exactly* the original value.”

In the above quote, “more precisely” is a misleading choice of words, since it suggests that these two properties (one-to-one-ness and round-tripping) are equivalent, when they are not. For the simplest possible example, consider the correctly rounded floating-point conversion from *single*-bit binary to *single*-digit decimal. This conversion is one-to-one: with only one significant bit, every representable value is a power of two, and distinct powers of two always map to distinct single-digit multiples of powers of 10. However, the *round-trip* conversion, from binary to decimal *and back to correctly rounded binary*, is not the *identity*. It’s an exercise for the reader to verify that the power of two used by Matula in Reference (2) below is a minimal counterexample.
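
To make the single-bit-to-single-digit example concrete, here is a small brute-force sketch (my own illustration, not taken from the references). It assumes round-half-to-even for ties, uses exact rational arithmetic via `fractions.Fraction` so that no binary floating-point error creeps in, and only examines a finite, arbitrarily chosen range of exponents (−40 to 40). It can therefore exhibit counterexamples but cannot prove their absence. The helper names (`round_sig`, `check`, and so on) are hypothetical.

```python
from fractions import Fraction


def _exponent(x: Fraction, base: int, digits: int) -> int:
    """Return e such that base**(digits-1) <= x / base**e < base**digits."""
    lo, hi = base ** (digits - 1), base ** digits
    e = 0
    while x / Fraction(base) ** e >= hi:
        e += 1
    while x / Fraction(base) ** e < lo:
        e -= 1
    return e


def round_sig(x: Fraction, base: int, digits: int) -> Fraction:
    """Correctly round x > 0 to `digits` significant base-`base` digits.

    round() on a Fraction returns an int and breaks ties to even.
    """
    unit = Fraction(base) ** _exponent(x, base, digits)
    return round(x / unit) * unit


def binary_values(n: int, exponents):
    """Yield every positive n-bit binary value s * 2**e for e in exponents."""
    for e in exponents:
        for s in range(2 ** (n - 1), 2 ** n):
            yield s * Fraction(2) ** e


def check(n: int, m: int, exponents=range(-40, 41)):
    """Test n-bit binary -> m-digit decimal -> n-bit binary.

    Returns (collisions, failures): pairs of distinct inputs sharing a
    decimal image, and inputs that do not survive the round trip.
    """
    seen = {}
    collisions, failures = [], []
    for x in binary_values(n, exponents):
        d = round_sig(x, 10, m)
        if d in seen:
            collisions.append((seen[d], x))
        seen[d] = x
        if round_sig(d, 2, n) != x:
            failures.append(x)
    return collisions, failures
```

With these assumptions, `check(1, 1)` returns no collisions (the conversion is one-to-one) but a non-empty list of round-trip failures, which includes $2^{27}$: it rounds to $10^8$, which is closer to $2^{26}$ than to $2^{27}$. Allowing two decimal digits, `check(1, 2)` finds neither collisions nor failures in this range.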

To see why this distinction matters, consider the more general problem of the correctly rounded conversion from $n$-digit base $\beta$ to $m$-digit base $\nu$… where $\beta$ and $\nu$ are not “*variants of a common base*” (more on this later). Then, as shown in Reference (1), the resulting function is one-to-one if and only if

$$\nu^{m-1} \geq \beta^n - 1.$$

However, we want more than the ability to simply *invert* this one-to-one conversion function: instead, we would like to be able to *compose* this conversion function with another *correctly rounded* conversion back in the other direction, and yield the identity. As shown in Reference (2), this composition yields the identity if and only if the following stronger condition holds:

$$\nu^{m-1} > \beta^n.$$

Fortunately, in the case where $\beta = 2$ and $\nu = 10$, the single-bit-to-single-digit wrinkle mentioned above (i.e., $n = m = 1$) is the *only* situation where these two conditions are not equivalent, so in practice, when using floats and doubles, this distinction is less important.
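
The claim that, for $\beta = 2$ and $\nu = 10$, the two conditions disagree only at $n = m = 1$ can be spot-checked with exact integer arithmetic. This sketch assumes the one-to-one condition has the form $\nu^{m-1} \geq \beta^n - 1$ and the round-trip condition has the form $\nu^{m-1} > \beta^n$. The function names and the search ranges are hypothetical choices for illustration.

```python
def one_to_one(beta: int, n: int, nu: int, m: int) -> bool:
    # Assumed form of the one-to-one condition for correctly rounded
    # n-digit base-beta -> m-digit base-nu conversion (Reference 1).
    return nu ** (m - 1) >= beta ** n - 1


def round_trip(beta: int, n: int, nu: int, m: int) -> bool:
    # Round-trip identity condition (Reference 2); it implies one_to_one.
    return nu ** (m - 1) > beta ** n


# Exact integer comparisons, so no floating-point logarithms are involved.
disagreements = [
    (n, m)
    for n in range(1, 129)
    for m in range(1, 50)
    if one_to_one(2, n, 10, m) != round_trip(2, n, 10, m)
]
print(disagreements)  # [(1, 1)]
```

The two inequalities can only disagree when $\beta^n - 1 \leq \nu^{m-1} \leq \beta^n$. For $m \geq 2$, $10^{m-1}$ is even and a multiple of 5, so it can equal neither $2^n - 1$ (odd) nor $2^n$; for $m = 1$ the only solution is $n = 1$. The search finding only $(1, 1)$ is what this algebra predicts.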

**Variants of a common base**

It’s another exercise for the reader to verify that the second inequality above may be solved for the minimal number $m$ of decimal digits required for a successful round-trip conversion from $n$-bit binary, yielding the formula from the last post:

$$m = \lceil n \log_{10} 2 \rceil + 1.$$
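
As a quick sanity check, the sketch below computes the minimal $m$ two ways for $n$-bit binary: by searching directly for the smallest $m$ with $10^{m-1} > 2^n$ using exact integers, and by evaluating the ceiling formula with floating-point `math.log10`. The function names are hypothetical. The floating-point version is safe here because, for these $n$, $n \log_{10} 2$ never comes close enough to an integer for rounding error to matter; the exact search sidesteps that question entirely.

```python
import math


def min_digits_exact(n: int) -> int:
    """Smallest m with 10**(m-1) > 2**n, using exact integer arithmetic."""
    m = 1
    while 10 ** (m - 1) <= 2 ** n:
        m += 1
    return m


def min_digits_formula(n: int) -> int:
    """ceil(n * log10(2)) + 1, evaluated in floating point."""
    return math.ceil(n * math.log10(2)) + 1


# Significand widths (including the implicit bit) of IEEE binary16,
# binary32, binary64, x87 extended, and binary128.
for n in (11, 24, 53, 64, 113):
    print(n, min_digits_exact(n), min_digits_formula(n))
```

Both columns give 5, 9, 17, 21, and 36 digits respectively, matching the familiar 9 digits for floats and 17 for doubles.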

However, there is an important detail involved in turning that *strict* inequality into this ceiling expression, one that is left out of many discussions I see online (at least where an attempt is made to generalize beyond binary-decimal conversions): this only works because $n \log_{10} 2$ is never an integer (the smallest integer strictly greater than a real number $x$ equals $\lceil x \rceil$ only when $x$ is not itself an integer). That is, base 2 and base 10 are not “variants [i.e., powers] of a common base,” as discussed in the references below. For conversions between bases that *are* powers of some common base, these formulas do not hold in general. (Consider various conversions between binary and hexadecimal, for example, again as discussed in the references below.)
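
A small brute-force sketch (my own, not from the references) illustrates the binary-hexadecimal case. For 5-bit binary significands, it finds the smallest number of digits $m$ for which binary → $m$-digit base $\nu$ → binary is the identity, and compares that with the smallest $m$ satisfying the strict inequality $\nu^{m-1} > 2^n$. It uses exact `Fraction` arithmetic with ties-to-even and tests only a finite, arbitrarily chosen range of exponents, so it is a spot check rather than a proof. The function names are hypothetical.

```python
from fractions import Fraction


def round_sig(x: Fraction, base: int, digits: int) -> Fraction:
    """Correctly round x > 0 to `digits` significant base-`base` digits (ties to even)."""
    lo, hi = base ** (digits - 1), base ** digits
    e = 0
    while x / Fraction(base) ** e >= hi:
        e += 1
    while x / Fraction(base) ** e < lo:
        e -= 1
    unit = Fraction(base) ** e
    return round(x / unit) * unit


def round_trips(n: int, nu: int, m: int, exponents=range(-16, 17)) -> bool:
    """Brute force: does every n-bit s * 2**e survive 2 -> nu (m digits) -> 2?"""
    for e in exponents:
        for s in range(2 ** (n - 1), 2 ** n):
            x = s * Fraction(2) ** e
            if round_sig(round_sig(x, nu, m), 2, n) != x:
                return False
    return True


def min_digits_brute(n: int, nu: int) -> int:
    """Smallest m for which the brute-force round trip succeeds."""
    m = 1
    while not round_trips(n, nu, m):
        m += 1
    return m


def min_digits_bound(n: int, nu: int) -> int:
    """Smallest m satisfying the strict inequality nu**(m-1) > 2**n."""
    m = 1
    while nu ** (m - 1) <= 2 ** n:
        m += 1
    return m


for nu in (10, 16):
    print(nu, min_digits_brute(5, nu), min_digits_bound(5, nu))
```

For decimal, the two agree (3 digits). For hexadecimal, the inequality asks for 3 digits, but 2 already suffice: any 5-bit significand fits exactly in two hex digits regardless of how its bits align with hex digit boundaries, so the conversion is exact and trivially round-trips.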

**References:**

1. Matula, David W., The Base Conversion Theorem, *Proceedings of the American Mathematical Society*, **19**(3), June 1968, pp. 716-723.
2. Matula, David W., In-and-Out Conversions, *Communications of the ACM*, **11**(1), January 1968, pp. 47-50.
