The "bug" here is that a single operation x + 0.1 and x + 0.1 can give two different results depending on the mood of the compiler.
It's not actually a bug, because the IEEE 754 standard permits this kind of sloppiness by default. You can get stricter semantics, but you have to ask for it explicitly by setting the right compiler flags. This is why I say that the IEEE 754 standard makes floating-point arithmetic deterministic "in principle" - in practice, you have to work for it. (Changes to the rounding mode are another big issue here.)
An extra error of 1 ulp somewhere might not seem like much for your average numerical code, but it's a big deal in some situations:
* Assumptions about the precise behavior of floating-point cancellations are often used in numerical libraries, and deviations can cause disastrous loss of accuracy. For example, Kahan summation is a widely used algorithm for computing sums with increased accuracy. If compilers optimize the code too aggressively, the algorithm doesn't work and you get worse accuracy. And ironically, x87 long doubles can make general numerical calculations less accurate (not more accurate as intended) due to the double rounding issue.
* If you use floats in game logic, you might want it to behave exactly the same way on different platforms, e.g. for synchronization or replays.
* In scientific computing, you might want results to be precisely reproducible by others.
* Debugging!
So just a warning to everyone out there: unless you are very, very careful your floating point code will not be deterministic. If you thought you can just use floats and expect it to "just work" the same, then you're out of luck.
Here's a couple of articles which go into more detail about problems with floating point determinism:
http://gafferongames.com/networking-for-game-programmers/flo...
https://randomascii.wordpress.com/2013/07/16/floating-point-...
abs(a - b) <= eps*0.5*(a + b)
or even (abs(a - b) <= eps*abs(a)) && (abs(a - b) <= eps*abs(b))
which is implemented in Boost [0].[0] http://www.boost.org/doc/libs/1_34_0/libs/test/doc/component...
(a / b) != (a / b)
assert x/y == x/y
Just because someone doesn't know the basics of floating point numbers it doesn't mean that the floating point format is somehow evil.
Testing floats for equality is one of those no-nos which is taught right in the first class of introduction to FP numbers.
Another very basic issue covered right in the intro clsses are problems caused by using decimal values to define numbers encoded in binary format, which is often explained through the famous "0.1 doesn't exist in binary" trivia bit.
Of course, it's not fine to ask "Does the result of this computation equal this value?", but everyone knows that.
How about "Does the result of this computation equal the result of this exact same computation?"? That's the issue here.
If the author of the IEEE 754 standard says that the way you do floating-point arithmetic is obviously a bug, you're probably doing it wrong.
That's interesting, because it seems to contradict his attitude toward FLT_EVAL_METHOD.
Kahan is a strong supporter of FLT_EVAL_METHOD==2—in fact, he's quite possibly the single person who's most responsible for its existence—and FLT_EVAL_METHOD==2 has been resoundly rejected by compilers on newer architectures due to being totally unintuitive. (For those who don't know, FLT_EVAL_METHOD==2 means that "x∗y+z" can have a different result from "double tmp = x∗y; tmp + z", even if x, y, and z are doubles. It's a very similar situation to the bug, just with expression temporaries vs. explicit storage locations instead of register-memory spills.)
In particular, Kahan wrote a widely-shared, hyperbolic essay called "Why Java Floating Point Hurts Everyone Everywhere" (which is still sometimes shared around today as a way to attack Java, despite being obsolete) castigating Java for having FLT_EVAL_METHOD==0. Java added the little-used "strictfp" in response, but history proved Java's original behavior right—with the advent of SSE on x86-64, GCC switched to FLT_EVAL_METHOD==0 by default, matching Java.
Besides, Kahan's x87 FPU design is arguably responsible for this bug to begin with. The 80-bit FP register stack is designed makes sense when you consider ancient C compilers that didn't keep variables in registers unless explicitly told to do so. In that case, the register stack allows for a simple implementation of FLT_EVAL_METHOD==2. But if assignment to a variable doesn't result in a spill (as in any non-toy compiler since 1990?) then this becomes awkward, because register spills no longer coincide with those evaluation rules. This is just a design mistake in the x87 FPU, and it's a large part of the reason why we have scalar FP math in SSE.
P.S. answer to pcwalton's post below: no it's certainly not "pretty much" "hey, you lose precision when you evaluate in fewer bits." It's: let me do the full IEEE standard in Java. It's: if the programmer doesn't specify what the sizes of the calculations in the expressions are, do these calculations with the biggest sizes (but predictably!), and if he specifies, respect what he specified(1). In the sample program we discuss, doubles as the results are explicit. These doubles can be then compared even in 80 bits but they will be still the same. It's not a "spill" it's an explicit cast that should happen.
1) "For example, tan(x) should delivers a double result no matter whether x is float or double, but tan((float)x) should deliver a float result provided a suitable tan method is available. Thus, a programmer who gives no thought to the question gets the safer default." This is obviously not "hey, you lose precision when you evaluate in fewer bits" he wants to be able to choose.
The reality is that numerical anslysts want the computer do do exactly what they tell them to, in precisely the order described. The rest of the human race wants approximately the right answer reasonably quickly. (Floating point is a bad choice if you want exactly the right answer). And the people who make purchase decisions read benchmark reports they don't understand.
But numerical analysts really require being able to specify the semantics of calculations precisely in order to write libraries that mere mortals can use to get the approximate right answer.
A quick perusal of any numerical analysis textbook should convince you that order of evaluation, order of truncation, overflow modes, and the like are very relevant for being able to write a library that has the right order of error.
Something like Kahan summation will have vastly different results depend on whether intermediate calculations are done at a higher precision or whether the compiler is allowed to reorder expressions. In fact, an overly aggressive optimizing compiler will make Kahan summation simply fail [1].
If all you're doing is adding a few numbers together, by all means, you don't need or want to think about precise semantics. But if you've ever used BLAS or Numpy or any other math package, you very much depend on languages and compilers being pedantic as possible in spelling out how floating point gets evaluated.
Not at all. As far as we're concerned, it's just fine if the computer works by consulting an oracle.
But we want the answers to be exactly the same as if the computer did what we told it to do. :-)
Java has a flag which allows you to specify how 32-bit floating point arithmetic happens (STRICT_FP or something.) C# does not, and was doing extended precision math on floats (80-bit register) and then truncating the result. This led to small differences in feature extraction output.
#include <stdio.h>
void test(double x, double y)
{
const double y2 = x + 1.0;
if (y != y2) printf("error\n");
}
void main()
{
const double x = .012;
const double y = x + 1.0;
test(x, y);
}
Then the comparison has to be done on doubles, and "but it's extended precision internally" mentioned in some answers is the wrong answer. All the variables in the program are stated to be doubles. If I'd write the x86 and x87 asm instructions I could easily write them so that they work as expected, even without SSE2.However if you compile with some "special" optimizations that are documented to relax some specific code generation rules, then it can be OK that these specific optimizations behave differently, but then: you asked for it by choosing that documented optimization.
In the answers there, Richard Henderson wrote 2001-01-16 06:30:04 UTC:
"You are seeing the results of excess precision in the FPU. Either change the rounding precision in the FPCR, or work around the problem with -ffloat-store."
He's also right, if the -ffloat-store is documented to be turned off by -O and if it's documented that that flag also affects the precomputed calculations that are done in the compile time. If the second detail is not documented, the results could be unexpected, but again depends of what you expect from the optimizer. The properly written compiler and optimizer should perform the calculations during the code generation correct to the definition and not cut the corners. If the generated code was generated to actually perform the != during the run time, then the compile-time calculations are irrelevant, but the sane default should be something like -ffloat-store when not using SSE2 even with -O, if -O has the semantic "the optimizations that should not produce really unexpected effects." It sounds complicated, but it has its reasons.
P.S. answer to nsajko's post below: I mention -ffloat-store not ffast-math. Read the description of ffloat-store please:
https://gcc.gnu.org/onlinedocs/gcc-3.2/gcc/Optimize-Options....
and also "Note on x86 and m68080 floating-point math" here https://gcc.gnu.org/wiki/FloatingPointMath
> The x87 FPU was originally designed in (or before) 1980. I think that's why it is quite simple: it has only one unit for all FP data types. Of course, the precision must be of the widest type, which is the 80-bit long double.
> Consider you have a program, where all the FP variables are of the type double. You perform some FP operations and one of them is e.g. 1e-300/1e300, which results in 1e-600. Despite this value cannot be held by a "double", it is stored in an 80-bit FPU register as the result. Consider you use the variable "x" to hold that result. If the program has been compiled with optimization, the value need not be stored in RAM. So, say, it is still in the register.
> Consider you need x to be nonzero, so you perform the test x != 0. Since 1e-600 is not zero, the test yields true. While you perform some other computations, the value is moved to RAM and converted to 0 because x is of type "double". Now you want to use your certainly nonzero x... Hard luck :-(
> Note that if the result doesn't have its corresponding variable and you perform the test directly on an expression, the problem can come to light even without optimization.
> It could seem that performing all FP operations in extended precision can bring benefits only. But it introduces a serious pitfall: moving a value may change the value!!!
TIL long double is a thing.
On x86-64, the compiler should default to SSE2.
SSE2 is ~16 years old so compatibility shouldn't be an issue.
The fact that they consider this reasonable is insane.
In particular though, it's fairly common to use std::sets (or std::maps which use the same underlying data structure) to maintain something in a sorted order.
I ran into this myself when implementing A*. I was using an std::map with the float as key for the priority queue. Under GCC with default compile arguments it would randomly hang or crash due to GCC breaking the set invariant.
if ( f_a < f_b-f_eps)
if ( f_a > f_b+f_eps)
if (fabs(f_a-f_b)<f_eps)
If you don't do this, then you are on your own. Sometimes it is fine, but sometimes it may give you grief, just like the dangers of other undefined behaviors.PS: I also have a problem that people think "undefined behavior" is undefined because certain standards say so. Undefined behavior is due to lack of agreement of common sense or defies common sense. For example, divide by zero, there is no common sense doing so, and define its behavior does not change anything. Signed integer overflow, there is no common sense agreement for that behavior either. The common sense for undefined behaviors is to avoid it. (On the other hand, there seems to be some common sense agreement on unsigned integer overflow, and people are writing code based on that behavior. But that agreement is weak, so I would avoid it if I can regardless what the standards say.)
PS2: The reason that floating point comparison is undefined behavior because, unlike fixed precision or integers, floating point number has floating precision. Floating precision does not have common sense correspondence. In any real application (where common sense is based), a finite, and above all, consistent precision, is always assumed.
And the solution, by manually specifying a f_eps, we put that comparison into a common-sense understanding that you know the result of comparisons is only meaningful above f_eps. With that common sense, having an internal calculation in higher precision is always OK -- isn't that common sense?
PS3: The question is why not turn undefined behaviors into errors? Undefined behavior does not mean there are no behaviors, and the behaviors can be used, or more often, ignored to achieve certain efficiency. When programmers write code with undefined behaviors, they are extracting their code out of common sense domain, but into a special sense domain. Special sense domain is practically fine. Everyone has their own efficient quirks, just do not expect your quirks to be understood or supported by out large where common sense rules (unless you carefully define it and document it). A language that allows for undefined behaviors allows for individual efficiency, which sometimes it is rather good.
For example, it is ok to write programs with direct floating point number comparisons as long as you understand that your program does not care when the comparison falls into the ambiguous range. Only you who understand your special application can assess that. If you are the only one who is responsible for the result of that code, then it is perfectly fine!
Floating point numbers represent rational numbers, and their arithmetic is straightfoward and follows common sense: you compute the exact operation using rational numbers, and the result is the closest floating point number available.
Floating point operations are never "undefined" either, they are perfectly deterministic and completely specified by the IEEE 754 standards.
There are three floating point values that do not correspond to rationals: INF, -INF and NAN. There is also "-0" that corresponds to the same rational as "0". The arithmetic of these values is extremely common sense to anyone that knows some calculus: 1/0 = INF, INF*0=NAN, INF+INF=INF, INF-INF=NAN, etc.
There is no ambiguity whatsoever in floating point numbers.
They also do not round to the nearest representable rational number. If you double MAXREAL, it will round to INF, even though MAXREAL is infinitely closer to 2*MAXREAL than INF is.
Despite efforts in the 2008 update to the 754 Standard, there still is no guarantee that the Standard will produce reproducible, bitwise identical results. The Standards committee admits this. It is important for everyone to understand the shortcomings of the Standard.
That is OK, but I hope you still can get my main point whether it is true or false. If all you see is everything, I am not sure you understand what I was saying. You have to understand what I was saying in order to argue back -- unless all you care is about right or wrong rather than making progress on understanding.
My main point is to point out why people keep filing these bug reports on floating point results that compilers simply will not fix. That is because, in real applications, we expect numbers to have persistent precisions. Persistent precisions make sense in real world. For example, if you take a measurement using one ruler in mm, that precision does not change whether you are measuring rice or elephant. When you are dealing with money, one always expect the precision do not go beyond cent regardless of what your internal calculation says -- (5.001 dollars is equal to 5 dollars). And in real life calculating with higher precision than its requirement will never result in error. My career is in scientific computing and we routinely deal with very small (or very large numbers). However, in a certain application, a persistent precision is always expected, and numbers within that precision become ambiguous in comparison. For example, if my precision is around 0.001, a holds 5, b hold 5.0008, is a<b? It depends. And if your critical application -- e.g. a feedback control loop -- depends on what is defined by floating point operation, you application can be very unpredictable.
I hate to repeat but your reply didn't address my main point at all -- fixed precision expectation in reality vs. arbitrary precision in computer and vs. infinite precisions in math. Common sense is rooted to reality.
Yes, I understand floating point operation by computer under a fixed width is not really undefined. That is why I made extra points commenting on what do we mean by undefined in a computer language design. Any computer architecture with any given compiler implementation will always give definite results for any input. Undefined behavior has a root meaning from language designer point of view. For example, division, we understand what we intend for how division work, whether it is in integer case or floating point case. But do we understand what we want when the divisor is 0 or near 0? We understand we don't want the computer always crash so we don't specify that is an error, but we don't know what it should do. So we specify that this is an undefined behavior. I understand that there is INF that certain mathematicians perfer. But are you sure that is the common sense behavior? For some of my application, I would rather it is a valid big number, and for others, I would rather it is an error. In stead of wish, I learn to put checks around critical area. And, please hear this, in no application that would like an INF result, INF or -INF. There is no common sense interpretation for INF, other than treat it same as error.
So do understand what ambiguity means. In math, you never have ambiguity. In computer, you never have ambiguity. In real life common sense, ambiguity arises when we can't decide a favorable general solution.
During the next lecture, someone sticks their hand up and says, "this FPU thing is so nondeterministic, who in their right mind would use floating point for 99% of the kinds of work that we use computers for?"
The professor says, "you should think real hard about that the next time you decide to use a floating point type in your program."
What? What residual state? Floating-point math is tricky, but on a particular machine/compiler/compile-options set you should expect that identical operations on identical inputs should give identical results.
What is this "residual state" you mention?
If you have an FPU that lets you change precision or rounding modes then you can get different results from the same operations - is that what you're talking about?
I get frustrated when people think that floating-point just randomly changes bits for no reason. It doesn't. It randomly changes bits for specific reasons, which can be understood.
https://software.intel.com/articles/consistency-of-floating-...
I still remember when I was told to never compare my floating point result to zero because even when it should be it probably won't be, always compare it to be less than a small sigma.
Unfortunately, this approach can result in problems with the usual expectation that equality is transitive. Consider 0 < x < y < sigma. 0 == x, 0 == y, but x != y. This problem applies generally any time you're doing equality comparisons with sigma.
`0.f == x` becomes `SIGMA > x`
`0.f == y` becomes `SIGMA > y`
`x == y` becomes `SIGMA > abs(x - y)`
All the expression work if written correctly.
On the other side, IEEE 754 allows extended precision to be used in place of double and of course requires that any chosen precision be kept or else.
But C Standard mandates IEEE 754 , too!
It seems to me that modern C deliberately chose to ignore such kind of mismatch in the name of (substantial) performance gains. K&R, good or bad, was way simpler!
Are you sure? I can't find it in the standard, specifically not in [1] page 28., section 5.2.4.2.2.
[1] (the final draft of C1X, dated 12 April 2011) http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1570.pdf
floating point calculations can be executed at wider range (§5.1.2.3.12);
assignments and casts are under obligation to wipe out that extra precision (§ same paragraph);
I am no language lawyer, but .. given the issue of program observable effects (§5.1.2.3.12) and the "as is" implied rule that governs optimizations, how possibly could equality be stable over time?
Because almost no one actually wants wants binary floating point.
So, if you are dealing with raw sensor data, neural nets, character models in a game, numerical simulations, and many other types of data, the extra precision of binary floating-point is far more important than any confusion that arises during those (rare) conversions to decimal.
I realise that floating-point numbers are, to use the technical term, "whack", and I also realise that triaging the issue seems to point to a hardware error, not a software error, but I can't get passed this "yeah, there is a problem, so what?" attitude. Am I missing something critical here that makes this unreasonable to work around or fix?
Reference:
https://gcc.gnu.org/ml/gcc/2003-08/msg01257.html
Specifically:
There are two things missing. The abililty to turn on the workaround in c (i.e. emit FP rounding instruction after every operation), and a bug fix for the register spills.
Even with those things, I think we are still in trouble. In the first case, having explicit rounding instructions eliminates the excess precision problem, but it introduces a double-rounding problem. So we have lost again there. This is probably not fixable without changing the hardware. In the second case, fixing the problem with reload spills eliminates one source of unpredictable rounding, however, we still have the problem that local variables get rounded if they are allocated to the stack and not rounded if they are allocated to an FP reg-stack register. Thus we still have the problem that we get different results at different optimization levels. So we still lose there again also. This might be fixable by promoting all stack locals to long double to avoid unexpected rounding, but that will probably cause other problems in turn. It will break all programs that expect assignments doubles will round to double for instance. If we don't promote stack locals, then we need explicit rounding for them, and then we have double-rounding again.
I really see no way to fix this problem other than by fixing the hardware. The hardware has to have explicit float and double operations.Jan Lachnitt 2008-05-20 16:59:31 UTC
I also encountered such problems and was going to report it as a bug in GCC... But in the GCC bug (not) reporting guide, there is fortunately a link to this page and here (comment #96) is a link to David Monniaux's paper about floating-point computations. This explains it closely but it is maybe too long. I have almost read it and hope I have understood it properly. So I'll give a brief explanation (for those who don't know it yet) of the reasons of such a strange behaviour. Then I'll assess where the bug actually is (in GCC or CPU). Then I'll write the solution (!) and finally a few recommendations to the GCC team.
EXPLANATION
The x87 FPU was originally designed in (or before) 1980. I think that's why it is quite simple: it has only one unit for all FP data types. Of course, the precision must be of the widest type, which is the 80-bit long double. Consider you have a program, where all the FP variables are of the type double. You perform some FP operations and one of them is e.g. 1e-300/1e300, which results in 1e-600. Despite this value cannot be held by a "double", it is stored in an 80-bit FPU register as the result. Consider you use the variable "x" to hold that result. If the program has been compiled with optimization, the value need not be stored in RAM. So, say, it is still in the register. Consider you need x to be nonzero, so you perform the test x != 0. Since 1e-600 is not zero, the test yields true. While you perform some other computations, the value is moved to RAM and converted to 0 because x is of type "double". Now you want to use your certainly nonzero x... Hard luck :-( Note that if the result doesn't have its corresponding variable and you perform the test directly on an expression, the problem can come to light even without optimization. It could seem that performing all FP operations in extended precision can bring benefits only. But it introduces a serious pitfall: moving a value may change the value!!!
WHERE'S THE BUG
This is really not a GCC bug. The bug is actually in the x87 FPU because it doesn't obey the IEEE standard.
SOLUTION
The x87 FPU is still present in contemporary processors (including AMD) due to compatibility. I think most of PC software still uses it. But new processors have also another FPU, called SSE, and this do obey the IEEE. GCC in 32-bit mode compiles for x87 by default but it is able to compile for the SSE, too. So the solution is to add these options to the compilation command: -march=* -msse -mfpmath=sse
Yes, this definitely resolves the problem - but not for all processors. The * can be one of the following: pentium3, pentium3m, pentium-m, pentium4, pentium4m, prescott, nocona, athlon-4, athlon-xp, athlon-mp, k8, opteron, athlon64, athlon-fx and c3-2 (I'm unsure about athlon and athlon-tbird). Beside -msse, you can also add some of -mmmx, -msse2, -msse3 and -m3dnow, if the CPU supports them (see GCC doc or CPU doc).
If you wish to compile for processors which don't have SSE, you have a few possibilities:
(1) A very simple solution: Use long double everywhere. (But be careful when transfering binary data in long double format between computers because this format is not standardized and so the concrete bit representations vary between different CPU architectures.)
(2) A partial but simple solution: Do comparisons on volatile variables only.
(3) A similar solution: Try to implement a "discard_extended_precision" function suggested by Egon in comment #88.
(4) A complex solution: Before doing any mathematical operation or comparison, put the operands into variables and put also the result to a variable (i.e. don't use complex expressions). For example, instead of { c = 2(a+b); } , write { double s = a+b; c = 2s; } . I'm unsure about arrays but I think they should be OK. When you have modified your code in this manner, then compile it either without optimization or, when using optimization, use -ffloat-store. In order to avoid double rounding (i.e. rounding twice), it is also good to decrease the FPU precision by changing its control word in the beginning of your program (see comment #60). Then you should also apply -frounding-math. (5) A radical solution: Find a job/hobby where computers are not used at all.
In a more extreme case, let's say I had two strings: v="1.0" and w="+1.0". atof(v) and atof(w) should ABSOLUTELY compare equal. The logic followed by GCC effectively says that it's OK to compare them by their intermediate string representation instead, though, wherein they would compare as non-equal.
Unums and posits, incidentally, are designed to eliminate all sources of irreproducibility and inexplicable behavior like this gcc bug. Hardware support for posits is on the way, maybe before the year is out. There is a way out of this mess, but it will take a few years to become mainstream.
*DIAGNOSTIC* THE TEST FOR EQUALITY BETWEEN NON-INTEGERS MAY NOT BE MEANINGFUL
UNIVAC 1107 FORTRAN IV error message for this, 1967.See https://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html#ind...
David Goldberg: What Every Computer Scientist Should Know About Floating-Point Arithmetic (1991)
https://docs.oracle.com/cd/E19957-01/806-3568/ncg_goldberg.h...
EDIT: apologies to CalChris[2]. What I should have said is discussed Gustafson's Unum work.
[0] https://www.youtube.com/watch?v=aP0Y1uAA-2Y (relevant question at 1h22m37s)
[1] http://web.stanford.edu/class/ee380/Abstracts/170201.html
Anyway, the bug report is linking to a very nice research paper[0]: "The pitfalls of verifying floating-point computations", which even so very technical is a good read on the subject.
Take into account rounding modes and denormals too, and make sure to disable float optimisations that break determinism in your compiler.