clang test.c -O0 -fsanitize=undefined
./a.out
[...]
test.c:17:12: runtime error: 9.22337e+18 is outside the range of representable values of type 'long'
Interestingly gcc doesn't throw that warning.
There also exists low-cost random-sampling-based ASAN implementation that can be enabled in production: Google uses GWP-ASAN for all server-side applications as well as Chrome on Windows/Mac. See https://www.youtube.com/watch?v=RQGWMLkwrKc for details.
Here 14%, https://www.jetbrains.com/lp/devecosystem-2019/cpp/
Here 40 - 55%, https://www.bfilipek.com/2019/12/cpp-status-2019.html
At CppCon 2015 or something, at Herb's question during his keynote, about 1% of the audience as per his comment on the video.
The reason they aren't enabled by default is that that's not what they're designed for. They have a significant performance impact, you can't enable them all at once, they conflict with other security features and they may introduce security issues.
These are developer features. They aren't there to run your production code, they are there to test during development and bug finding.
On the other hand, the clang static analyzer may have a shocking amount of messages when run first on a large existing code base, and some of those warnings can be considered more "opinions" than warnings. It's still makes sense and is very rewarding to make a code base "static analyzer clean".
The runtime sanitizers in comparison are very precise and always pointed to actual "sleeper bugs", it's almost definitely a good idea to use them and take their warnings serious.
But anyway, clang ASAN, UBSAN, TSAN and the static analyzer are all really excellent and important tools for everybody writing C or C++ code.
PS: the reason why those checks are optional is that they increase compilation time (sometimes dramatically, like 10x slower compilation or more), and they add runtime instrumentation code which both increases the executables size and decreases performance dramatically (also 2..10x times or more, although the clang sanitizers are really quite fast compared to other solutions).
Note that ruining sanitizers in prod might be insecure. They're for development.
20 years ago it was advertised as the safe C alternative.
This should be fairly obvious with knowledge about how floating point numbers are represented internally IMO.
Edit: Be more precise about what can be represented.
It's also worth noting that every finite double with magnitude larger than 2^52 has a precise integer value; it's just that once you get beyond 2^53, not every integer is representable.
[edit]
Since the extra value is precisely a power of 2 (-2^52), then it will round correctly, however the value is arguably not precisely -2^52 since it has an epsilon of greater than 1.
A good way to get around this is reading https://floating-point-gui.de/ to weed out any preconceptions, but yeah it's difficult to steer novices there without them stepping on one of the pitfalls first.
When you work with floating point, you need to remember you work with a tolerance to epsilon for comparisons because you are rounding to 1/n^2 precision and different floating point units perform the conversion in different ways.
You must abandon the idea of '==' for floats.
This is why his code is unpredictable, because you cannot guarantee the conversion of any integer to and from float is the same number. Period. The LSBs of the mantissa can and do change, which is why we mask to a precision or use signal-to-noise comparisons when evaluation bit drift between FP computations.
He has the first part correct, < and > are your friend with FP. But to get past the '==' hurdle, he needs to define his tolerance, the code should be something like:
if (fabs(f1 - f2) > TOLERANCE) ... fits = true.
I was irked by his arrogance when he asks, "Intel CPUs have a history of bugs. Did I hit one of those?" First, learn about floating point, then, work on an FPUnit team for 10 years, and even then, don't assume you're smarter than a team of floating point architects, you're not.
Totally false. Any integer between -(2^53) and 2^53 can be converted to IEEE 64-bit double without any loss of information.
Those are a specific integer range, not "any integer".
These days I write my floating point code assuming float is a 32-bit IEEE 754 floating point number, and double is 64-bit. You can get those guarantees on any desktop hardware with the right flags, and the same semantics are commonly available on other platforms too.
Picking a well-defined implementation makes it much easier to reason about conversions between integer and floating point types. In fact, it allows you to reason about a lot of operations, e.g. 1.0 + 2.0 == 3.0 is true; (float)183 == 183.0 is true; 0.1 == (double)0.1f is false; etc.
Be aware that +0.0 and -0.0 are different floating point values but represent the same real number, so +0.0 == -0.0 follows.
People who say == means nothing for floating point and you always need epsilon checks are wrong, plain and simple. == is very well defined. Don't confuse the definition of floating point operations with common practices for using them effectively.
You can iterate through all non-NaN values and check that successive ones are indeed not equal:
#include <math.h>
#include <stdint.h>
#include <assert.h>
#include <stdio.h>
#include <inttypes.h>
int main()
{
float x = (float)-INFINITY;
uint64_t count = 1;
while (x != (float)INFINITY) {
float y = nextafterf(x, (float)INFINITY);
assert(y != x);
x = y;
++count;
}
printf("Found %" PRIu64 " floats.\n", count);
return 0;
}
$ gcc -std=c99 -O3 a.c -lm -o a
$ ./a
Found 4278190081 floats.
(a little bit harder for doubles)Interestingly, this only finds one zero (-0.0), hence the assert doesn't actually fail around zero.
Casting the value to double ends up converting the long value 0x7fffffffffffffff to the nearest double value: 0x8000000000000000. As the -O0 version CORRECTLY reports, this does not round-trip back to the same value in the "long" type. Many other values, though not all, down to about 1/1024 of that value (1 / 2^(63-53)) will also fail to round-trip for similar reasons.
Unless my coffee-deficient brain is missing something at the moment, it should be the case that any integer with 53 bits or fewer between the first and last 1 bit (inclusive) will roundtrip cleanly. Any other integer will not.
Edit: fixed a typo above, and coded up the idea I expected to work and ran it through quickcheck for a few min, and this version seems to be correct ('int' return rather than bool is just because haskell's ffi doesn't natively support C99 bool):
#include <limits.h>
int fits(long x) {
if (x == LONG_MIN) return 0;
unsigned long ux = x < 0 ? -x : x;
while (ux > 0x1fffffffffffffUL && !(ux & 1)) {
ux /= 2;
}
return ux <= 0x1fffffffffffffUL;
} if( value < (double) LONG_MIN || value > (double) LONG_MAX)
return( 0);
The cast of LONG_MAX to double rounds upwards (which is allowed by the standard, which says that rounding direction is "implementation defined"), so that "value > (double) LONG_MAX" is false, right? Even though the mathematical value of "value" is larger than LONG_MAX?Which then leads to this line:
l_val = (long) value;
Where value is cast to a long despite being outside of the range of longs, thus causing undefined behavior. So to be clear, BOTH of the -O0 and -O3 versions are correct, since both invoke undefined behaviour.When -O0 and -O3 give different results, either at least one of them is incorrect and you've stumbled on a compiler bug, or both of them are correct and you're invoking UB (the far more common situation).
EDIT: no, I think I misunderstood it: it's not that value is larger than (double) LONG_MAX, it IS (double)LONG_MAX, so of course "value > (double)LONG_MAX" is false.
The problem is that the "(long)((double)LONG_MAX))" is undefined behaviour on implementations that round (double)LONG_MAX upwards instead of downwards. Which is allowed by the standard. Ok, cool :)
The answer by ambrop7 below solves a different problem but appears to be correct. The reason escaped me at first and is super subtle. The trick is that LONG_MAX is 2^63 - 1, not 2^63. And the subtlety is that 2^63 is guaranteed to be exactly representable in IEEE double because it is an even power of 2, which 2^63-1 is not.
I don't care much for runtime ldexp() anyhow. So I'd be tempted to just pre-compute the exact limits -2^63 and nextafter(+2^63, 0) and encode them as doubles manually (omitting some #if method of portably determining the width of LONG):
#define FLONG_MIN 9223372036854775808.0 // exact -2^63
#define FLONG_MAX 9223372036854774784.0 // exact nextbefore(2^63)
if (value < FLONG_MIN || value > FLONG_MAX)
return(0);
Then the UB is avoided. I think the rest of the module works as is. Now, I question whether the author really wants integers between 2^53 and 2^63 to sparsely return true. So it might be better to just change the whole design to use +-2^53 as a hard limit, for trivially guaranteed round trip of the entire range and dispense with these nasty edge cases.From what I understand from the spec it should be the nearest in value, no? Not the nearest in memory representation.
The language spec (or at least a summary of it) is linked in the article, and it is pretty loose: nearest higher or nearest lower, chosen by the implementation (regardless of which is nearer).
IEEE double-precision float cannot store LONG_MAX (which is 2^63-1 with 8-byte longs) precisely, so it gets converted to 2^63; which you cannot cast back to a long, because it doesn't fit (resulting in undefined behaviour).
Add enough zeroes, and you’ll run out of exponent range.
Edit: check for me whether just calling lrint(x) works. The manpage doesn't specify that lrint() will set FE_INEXACT, but it seems weird to me that it wouldn't.
Great, thanks, now I have to go back and restart some of those code reviews I've been doing of certain third party matrix math libraries...
Annex F:
The lrint and llrint functions provide floating-to-integer conversion as prescribed by IEC 60559. They round according to the current rounding direction. If the rounded value is outside the range of the return type, the numeric result is unspecified and the ''invalid'' floating-point exception is raised. When they raise no other floating-point exception and the result differs from the argument, they raise the ''inexact'' floating-point exception.
> The floating-point environment has thread storage duration. The initial state for a thread's floating-point environment is the current state of the floating-point environment of the thread that creates it at the time of creation.
if( ! fits)
Why this (constently) terrible formatting though? Never seen anyone using this style.I don't know if this is supposed to be a joke or part of the setup for an explanatory post about undefined behaviour, but that list is in exactly the wrong order.
Comparing floats is more subtle than most programmers realize, and there really isn't a one-size-fits-all solution.
Things to consider due to the nature of fp representation - comparing results close to zero is different (i.e. "is a small" needs a differen test than "are a & b close"
- the distance between fp numbers depends on their magnitude, so comparing two large numbers to each other shouldn't have the same bounds as comparing two numbers near 1, say[1].
- if you aren't quite careful you can easily create tests where a == b but b != a , which can cause sorting issues, etc.
Hand-wavily speaking if you want to do this "right", you should probably look at doing the analysis in ULP (units in last place) rather than directly on the floats. Don't do it for values near zero though. And have a fast path for differently signed values.
The above doesn't even get into denormalized values.
[1] note that what people usually mean for epsilon is the version of machine epsilon that is the difference between 1 and the next representable float above 1 [2]. So by definition this is smaller than the representable difference between any two numbers in larger decades
[2] MS .NET somewhat confusingly defines Epsilon as the smallest representable normalized number.
I'd say the cleanest would be to decode exponent and mantissa, check if the exponent is within the 64-bit limit of long, then check if there's any bits set below the decimal point. (+plus some extra care for two's complement negative numbers)
The problem with this is of course that this would be platform dependent.
What you're seeing is not excess precision due to wide registers but excess precision due to optimization and constant propagation, which means GCC calculates a fast path for (argc == 1) that doesn't round correctly and ends up with "it fits".
Interestingly it does optimize to the correct "doesn't fit" with -mfpmath=387 -fexcess-precision=standard, so I guess this is a bug in how GCC treats SSE math. The sanitizer (-fsanitize=float-cast-overflow) also notices the problem.
#include <math.h>
#include <fenv.h>
int fits_long( double d)
{
long l_val;
double d_val;
// may be needed ?
// #pragma STDC FENV_ACCESS ON
feclearexcept( FE_INVALID);
l_val = lrint( d);
d_val = (double) l_val;
if( fetestexcept( FE_INVALID))
return( 0);
return( d_val == d);
}
The article explains it in more detail. Thanks for the help.Is there a way to get the largest double smaller or equal than some positive integer?
I really dislike the arrogant programmer trope. Can we all stop?