C / C++ FAQs & Programming Resources - ProkutFAQ : FloatCompare


Why don't floating point comparisons work?


Reproduced with permission from C++ FAQ

How are floating points represented?

To see why, we must understand how floating point numbers are stored internally. The IEEE Standard for Binary Floating-Point Arithmetic (IEEE 754) is the most widely used standard for floating-point computation. The IEEE format for 32-bit floats uses 1 bit for the sign, 8 bits for the exponent, and 23 bits for the mantissa. Since a normalized binary mantissa always has the form 1.xxxxx... the leading 1 is not stored, so you effectively get 24 bits of mantissa. The number 1000.43 (and many, many others, including some really common ones like 0.1) is not exactly representable in float or double format. 1000.43 is actually represented as the following bit pattern (the "s" shows the position of the sign bit, the "e"s show the positions of the exponent bits, and the "m"s show the positions of the mantissa bits):

s e e e e e e e e m m m m m m m m m m m m m m m m m m m m m m m
0 1 0 0 0 1 0 0 0 1 1 1 1 0 1 0 0 0 0 1 1 0 1 1 1 0 0 0 0 1 0 1

The shifted mantissa is 1111101000.01101110000101 or 1000 + 7045/16384, so the stored fractional part is 0.429992675781 rather than 0.43. With 24 bits of mantissa you only get about 1 part in 16M of precision for float. The double type provides more precision (53 bits of mantissa).
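
One way to check the bit pattern above on your own machine is to copy the bytes of a float into a 32-bit integer and print them in binary. The following is only a minimal sketch: the helper name floatBits is made up for this illustration, and it assumes that float is a 32-bit IEEE 754 single-precision type (the static_assert checks the size, not the format).

```cpp
#include <bitset>
#include <cstdint>
#include <cstring>
#include <string>

// Hypothetical helper (not part of the original FAQ): returns the 32 bits
// of a float as a string of '0' and '1' characters, sign bit first.
// Assumes float is a 32-bit IEEE 754 single-precision type.
std::string floatBits(float value)
{
    static_assert(sizeof(float) == sizeof(std::uint32_t),
                  "this sketch assumes a 32-bit float");
    std::uint32_t bits = 0;
    std::memcpy(&bits, &value, sizeof bits);  // copy the raw bytes into an integer
    return std::bitset<32>(bits).to_string();
}
```

Under those assumptions, floatBits(1000.43f) should give 01000100011110100001101110000101, which is the sign, exponent, and mantissa shown above written without the spaces.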


Why doesn't floating point comparison work as expected?

Because floating point arithmetic is different from real number arithmetic.

Bottom line: Never use == to compare two floating point numbers.

Here's a simple example:

double x = 1.0 / 10.0;
double y = x * 10.0;
if (y != 1.0)
  std::cout << "surprise: " << y << " != 1\n";


The above "surprise" message will appear on some (but not all) compilers/machines. But even if your particular compiler/machine doesn't cause the above "surprise" message (and if you write me telling me whether it does, you'll show you've missed the whole point of this FAQ), floating point will surprise you at some point. So read this FAQ and you'll know what to do.

The reason floating point will surprise you is that float and double values are normally represented using a finite precision binary format. In other words, floating point numbers are not real numbers. For example, in your machine's floating point format it might be impossible to exactly represent the number 0.1. By way of analogy, it's impossible to exactly represent the number one third in decimal format (unless you use an infinite number of digits).
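
To see the value that actually gets stored, you can print a double with more significant digits than the default. This is a small sketch rather than a definitive tool; showDigits is a hypothetical helper name, and the exact digits you see depend on your machine's floating point format and your standard library.

```cpp
#include <cassert>
#include <iomanip>
#include <sstream>
#include <string>

// Hypothetical helper: formats a double using the requested number of
// significant decimal digits, so the stored approximation becomes visible.
std::string showDigits(double value, int significantDigits)
{
    assert(significantDigits > 0);
    std::ostringstream out;
    out << std::setprecision(significantDigits) << value;
    return out.str();
}
```

With the default-like precision of 6, showDigits(0.1, 6) gives 0.1, which hides the problem. On a typical IEEE 754 machine, showDigits(0.1, 20) gives something like 0.10000000000000000555, while showDigits(0.5, 20) still gives 0.5 because one half is exactly representable in binary.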

To dig a little deeper, let's examine what the decimal number 0.625 means. This number has a 6 in the "tenths" place, a 2 in the "hundredths" place, and a 5 in the "thousandths" place. In other words, we have a digit for each power of 10. But in binary, we might, depending on the details of your machine's floating point format, have a bit for each power of 2. So the fractional part might have a "halves" place, a "quarters" place, an "eighths" place, a "sixteenths" place, etc., and each of these places has a bit.

Let's pretend your machine represents the fractional part of floating point numbers using the above scheme (it's normally more complicated than that, but if you already know exactly how floating point numbers are stored, chances are you don't need this FAQ to begin with, so look at this as a good starting point). On that pretend machine, the bits of the fractional part of 0.625 would be 101: 1 in the ½-place, 0 in the ¼-place, and 1 in the ⅛-place. In other words, 0.625 is ½ + ⅛.

But on this pretend machine, 0.1 cannot be represented exactly since it cannot be formed as a sum of a finite number of powers of 2. You can get close but you can't represent it exactly. In particular you'd have a 0 in the ½-place, a 0 in the ¼-place, a 0 in the ⅛-place, and finally a 1 in the "sixteenths" place, leaving a remainder of 1/10 - 1/16 = 3/80. Figuring out the other bits is left as an exercise (hint: look for a repeating bit-pattern, analogous to trying to represent 1/3 or 1/7 in decimal format).
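
If you'd rather let the computer do the place-by-place bookkeeping for the pretend machine, the sketch below generates the bits after the binary point of a fraction using only integer arithmetic, so no floating point rounding is involved. The function name fractionBits and its parameters are invented for this illustration.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: returns the first `count` bits after the binary point
// of the fraction numerator/denominator, using exact integer arithmetic.
// Requires 0 <= numerator < denominator, with a denominator small enough
// that doubling the numerator cannot overflow.
std::string fractionBits(unsigned numerator, unsigned denominator, int count)
{
    assert(denominator > 0 && numerator < denominator);
    std::string bits;
    for (int i = 0; i < count; ++i) {
        numerator *= 2;                   // move on to the next power-of-2 place
        if (numerator >= denominator) {   // this place holds a 1
            bits += '1';
            numerator -= denominator;
        } else {                          // this place holds a 0
            bits += '0';
        }
    }
    return bits;
}
```

For example, fractionBits(5, 8, 3) gives 101, matching the bits of 0.625 above, and fractionBits(1, 10, 12) gives 000110011001. The remainder never reaches zero for 1/10, so no finite number of bits is enough.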

The message is that some numbers cannot be represented exactly in floating point, so comparisons don't always do what you'd like them to do. In other words, if the computer actually multiplies 10.0 by 1.0/10.0, it might not get exactly 1.0 back.
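
A related experiment, sketched here under the assumption that double is an IEEE 754 64-bit type evaluated without extra intermediate precision, is to add the same value repeatedly and compare the total with the mathematically expected result. The helper name addRepeatedly is hypothetical.

```cpp
#include <cassert>

// Hypothetical helper: adds `term` to a running total `count` times.
// Each addition may round, so the rounding errors can accumulate.
double addRepeatedly(double term, int count)
{
    assert(count >= 0);
    double total = 0.0;
    for (int i = 0; i < count; ++i)
        total += term;
    return total;
}
```

On such a machine, addRepeatedly(0.125, 8) == 1.0 is true because one eighth is a power of 2 and is stored exactly, whereas addRepeatedly(0.1, 10) == 1.0 is false because each 0.1 is only an approximation. Other compilers, machines, or optimization settings may behave differently, which is exactly why == is the wrong tool here.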

That's the problem. Now here's the solution: be very careful when comparing floating point numbers for equality (or when doing other things with floating point numbers; e.g., finding the average of two floating point numbers seems simple but to do it right requires an if/else with at least three cases).

Here's the wrong way to do it:

void dubious(double x, double y)
{
  ...
  if (x == y)  // Dubious!
    foo();
  ...
}


If what you really want is to make sure they're "very close" to each other (e.g., if variable a contains the value 1.0 / 10.0 and you want to see if (10*a == 1)), you'll probably want to do something fancier than the above:

void smarter(double x, double y)
{
  ...
  if (isEqual(x, y))  // Smarter!
    foo();
  ...
}


There are many ways to define the isEqual() function, including:

#include <cmath>  // std::abs for double

inline bool isEqual(double x, double y)
{
  const double epsilon = 1e-5;  // some small number; pick a value that suits your data
  return std::abs(x - y) <= epsilon * std::abs(x);
  // see Knuth section 4.2.2 pages 217-218
}


Note: the above solution is not completely symmetric, meaning it is possible for isEqual(x,y) != isEqual(y,x). From a practical standpoint, this does not usually occur when the magnitudes of x and y are significantly larger than epsilon, but your mileage may vary.
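
If the asymmetry matters to you, one possible variation (a sketch, not the FAQ's own solution) is to scale epsilon by the larger of the two magnitudes. Both function names below are made up for this illustration, and passing epsilon as a parameter is likewise an assumption made to keep the example easy to experiment with.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// The one-sided test from above, with epsilon passed in as a parameter
// (the parameter is an assumption made for this illustration).
inline bool isEqualOneSided(double x, double y, double epsilon)
{
    assert(epsilon >= 0.0);
    return std::abs(x - y) <= epsilon * std::abs(x);
}

// Hypothetical symmetric variant: scales epsilon by the larger magnitude,
// so swapping the arguments cannot change the answer.
inline bool isEqualSymmetric(double x, double y, double epsilon)
{
    assert(epsilon >= 0.0);
    return std::abs(x - y) <= epsilon * std::max(std::abs(x), std::abs(y));
}
```

With a deliberately large epsilon of 0.1, isEqualOneSided(10.0, 11.05, 0.1) is false but isEqualOneSided(11.05, 10.0, 0.1) is true, while isEqualSymmetric gives the same answer in both orders. Keep in mind that a purely relative test like either of these treats almost nothing as equal to 0.0, so comparisons against zero may need a separate absolute tolerance.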