It's more like the opposite. These tell the compiler to assume, for optimization purposes, that floats are associative and so on (i.e. algebraic), even when in reality they aren't. So the results may vary depending on what transformations the compiler performs – in particular, they may vary between optimized and non-optimized builds, which normally isn't allowed.
> These tell the compiler to assume, for optimization purposes, that floats are associative and so on (i.e. algebraic), even when in reality they aren't.
I wonder if it is possible to add an additional constraint that guarantees the transformation has equal or fewer numerical rounding errors. E.g. for double-precision floats, (0.2 + 0.1) - 0.1 results in 0.20000000000000004, so I would expect that transforming some (A + B) - B to just A would always reduce numerical error. OTOH, it's floating point maths, there's probably some kind of weird gotcha here as well.
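A quick check in Python (whose floats are IEEE 754 doubles) reproduces the example:

```python
# (0.2 + 0.1) - 0.1: the intermediate 0.2 + 0.1 already rounds to
# 0.30000000000000004, and subtracting 0.1 does not land back on 0.2.
roundtrip = (0.2 + 0.1) - 0.1
print(roundtrip)         # 0.20000000000000004
print(roundtrip == 0.2)  # False

# The algebraically simplified form is just the literal itself, which is
# the closest double to 0.2 and prints as 0.2.
print(0.2)
```

So for this particular input the simplified form really is more accurate; the question is whether that holds for every input.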
Pretty sure that’s not possible: more accurate for some inputs will be less accurate for others.

There’s a very tricky tension in float optimization. The most predictable operation structure is a fully skewed op tree, as in naive left-to-right summation, but this is the slowest and least accurate order of operations. Using a more balanced tree is faster and more accurate (great), but unfortunately which tree shape is fastest depends very much on hardware-specific factors like SIMD width (less great). And no tree shape is universally guaranteed to be fully accurate. A full binary tree tends to have the best accuracy but bad base-case performance, so the shape that tends to get used in high-performance kernels is SIMD-width parallel in a loop up to some fixed size like 256 elements, then pairwise recursive reduction above that. The recursive reduction can also be threaded. Anyway, there’s no silver bullet here.
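The two extreme tree shapes can be sketched in Python (the input here is illustrative; the error comparison is typical behaviour on random data, not a guarantee for every input):

```python
import math
import random

def naive_sum(xs):
    # Fully skewed op tree: ((x0 + x1) + x2) + x3 + ...
    # Predictable order, but rounding errors accumulate linearly.
    total = 0.0
    for x in xs:
        total += x
    return total

def pairwise_sum(xs):
    # Balanced tree: recursively split the input and add the halves.
    # Real kernels use a SIMD-width loop as the base case instead of
    # this tiny one, for performance.
    if len(xs) <= 2:
        return sum(xs)
    mid = len(xs) // 2
    return pairwise_sum(xs[:mid]) + pairwise_sum(xs[mid:])

random.seed(0)
xs = [random.uniform(-1.0, 1.0) for _ in range(100_000)]

exact = math.fsum(xs)  # correctly rounded reference sum
print("naive error:   ", abs(naive_sum(xs) - exact))
print("pairwise error:", abs(pairwise_sum(xs) - exact))  # typically smaller
```

The worst-case error of the skewed tree grows like O(n), while the balanced tree's grows like O(log n), which is why pairwise reduction is the usual choice above the SIMD base case.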
I think a restricted version might be possible to implement: only allow transformations where the transformed version has no more rounding error on any input, and strictly less on some. This will usually only mean canceling terms and collecting expressions like "x+x+x" into 3x.
In general, rules that allow fewer transformations are probably easier to understand and use. Trying to optimize everything is where you run into trouble.
Kahan summation is an example (also described in the top-level article) of one such “gotcha”. It involves adding a term that, if floats were algebraic in this sense, would always be zero, so -ffast-math often deletes it, but this actually completely removes the accuracy improvement of the algorithm.
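A minimal sketch of Kahan (compensated) summation in Python; the input values are illustrative:

```python
def kahan_sum(xs):
    total = 0.0
    c = 0.0  # running compensation for lost low-order bits
    for x in xs:
        y = x - c
        t = total + y
        # Algebraically, (t - total) - y is always zero, so a compiler
        # assuming floats are associative may optimize c away entirely.
        # In real float arithmetic it recovers exactly the low-order
        # bits that the addition total + y just discarded.
        c = (t - total) - y
        total = t
    return total

# Many tiny terms mixed with large ones: naive summation drops the
# tiny terms completely, Kahan summation accumulates them.
xs = [1.0, 1e-16, 1e-16, 1e-16, 1e-16] * 1000
print(sum(xs))        # naive: exactly 1000.0, the 1e-16 terms vanish
print(kahan_sum(xs))  # compensated: slightly above 1000.0
```

With the compensation deleted, the function degenerates to the naive loop and loses exactly the tiny contributions it was written to preserve.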