Herbie: Automatically improve imprecise floating point formulas (opens in new tab)

(herbie.uwplse.org)

198 pointssummarity1mo ago45 comments

45 comments

I posted this and it picked up steam over night, so I thought I'd add how I'm using it:

I work on 3D/4D math in F#. As part of the testing strategy for algorithms, I've set up a custom agent with an F# script that instruments Roslyn to find FP and FP-in-loop hotspots across the codebase.

The agent then reasons through the implementation and writes core expressions into an FPCore file next to the existing tests, running several passes, refining the pres based on realistic caller input. This logs Herbie's proposed improvements as output FPCore transformations. The agent then reasons through solutions (which is required, Herbie doesn't know algorithm design intent, see e.g. this for a good case study: https://pavpanchekha.com/blog/herbie-rust.html), and once convinced of a gap, creates additional unit tests and property tests (FsCheck/QuickCheck) to prove impact. Then every once in a while I review a batch to see what's next.

Generally there are multiple types of issues that can be flagged:

a) Expression-level imprecision over realistic input ranges: this is Herbie's core strength. Usually this catches "just copied the textbook formula" instance of naive math. Cancellation, Inf/NaN propagation, etc. The fixes are consistently using fma for accumulation, biggest-factor scaling to prevent Inf, hypot use, etc.

b) Ill-conditioned algorithms. Sometimes the text books lie to you, and the algorithms themselves are unfit for purpose, especially in boundary regions. If there are multiple expressions that have a <60% precision and only a 1 to 2% improvement across seeds, it's a good sign the algo is bad - there's no form that adequately performs on target inputs.

c) Round-off, accumulation errors. This is more a consequence of agent reasoning, but often happens after an apparent "100% -> 100%" pass. The agent is able to, via failing tests, identify parts of an algorithm that can benefit from upgrading the context to e.g. double-word arithmetic for additional precision.

Archit3ch1mo ago

> I work on 3D/4D math in F#. As part of the testing strategy for algorithms, I've set up a custom agent with an F# script that instruments Roslyn to find FP and FP-in-loop hotspots across the codebase.

I don't know if there is an equivalent in Roslyn, but in Julia you can have the agent inspect the LLVM output to surface problems in hot loops.

yread1mo ago

Really cool, numerical stability can be tricky as errors can accumulate with each operation and suddenly 53 bits of precision is not enough.

Also nice to see an article thats not about AI or politics

urschrei1mo ago

I resurrected the Rust Herbie lint (now using dylint) a while ago: https://github.com/urschrei/herbie-lint

notpushkin1mo ago

Some trivial cases produce... interesting results.

For x in [−1.79e308, 1.79e308]:

Initial Program: 100.0% accurate, 1.0× speedup

  def code(x):
      return math.sqrt((x + 1.0))

Alternative 1: 67.5% accurate, 5.6× speedup

  def code(x):
      return 1.0

lifthrasiir1mo ago

That does make sense, because a half of all available fp numbers are less than 1 in their magnitude. In particular there should be a plenty of numbers x such that |x| << 1 so x + 1 ~= 1; in fact, the proportion should be just shy of 50%.

andrepd1mo ago

That is indeed one of the problems with IEEE floats. There are only 10^80 atoms in the universe, and a Planck length is 1^-60th of the radius of the universe. But 64-bit floats have an absurd range of over 10^±300! Worse than that, notice that there are as many bit patterns in the never-used range between 10^300 and 10^301 as there are in the super-important range between 1 and 10! Super wasteful. Not to mention the quadrillions of values reserved to represent "NaN"...

This is one of the problems that alternative formats such as the Posit aim to solve. It's quite interesting: I've got an implementation in rust here if you want to play with it https://github.com/andrepd/posit-rust

AlotOfReading1mo ago

Note that the logarithmic distribution of float density is also key to certain kinds of efficient hardware float implementations, because it means you can use the fixed mantissa bits alone as table indices. Unums have proven difficult to build efficient HW implementations for.

IEEE floats have a few warts like any other 1980s standard, but they're a fantastic design.

1 more reply

cocoto1mo ago

But I guess using the density distribution of floating points is rarely useful in a problem. Your actual distribution will almost surely be way different. Imo, the tool presented here should provide a way to manually provide a custom density function (with some common presets like uniform and normal distributions).

pavpanchekha1mo ago

Author here! Yes, the float distribution isn't what you want in practice, but distribution selector isn't really the right thing either, because a low probability bad result can still be pretty bad! Hence the range selector; the float distribution is good at picking extreme values that trigger FP error.

We usually recommend looking for 90%+ accuracy or carefully examining the accuracy plot

FabHK1mo ago

Not really. 1+x/2, however, would be a good approximation to sqrt(1+x) for small (in absolute value) x.

irchans1mo ago

How about:

(* Mathematica Notation, Assume x>0 ) If[ x < 10^(-10), 1+ x/2, ( order x^2 error ) If[ x> 10^10, Sqrt[x], ( order 1/Sqrt[x] error ) (else) Sqrt[x +1] ] ] ( I guess the If statements take too much time. *)

1 more reply

fluorinerocket1mo ago

How useful is this when you are using numbers in a reasonable range, like 10^-12 to 10^12? Generally I try to scale my numbers to be in this range, whether by picking the right units or scaling constraints and objectives when doing nonlinear programming/ optimization.

Like looking at this example,

https://herbie.uwplse.org/demo/b070b371a661191752fe37ce0321c...

It is claimed that for the function f(x) =sqrt(x+1) -1

Accuracy is increased by from 8.5% accuracy to 98% for alternative 5 Which has f(x) = 0.5x

Ok so x=99, the right answer is sqrt(100) -1 = 9

But 0.5 * 99 = 49.5 which doesn't seem too accurate to me.

yossi_peti1mo ago

The precondition on the link you shared has -1 <= x && x <= 1, so 99 is way outside of that range. But even so, testing for x=1, which is supposed to be inside that range, 0.5 doesn't seem tolerably close to 0.4142.

LiamPowell1mo ago

I have a suspicion that the accuracy number is the mean of accuracies over all valid floats in the range (or something approximating that), which is going to be weighted towards zero where the accuracy is higher, and perhaps where sqrt near 1 has some artefacts.

pavpanchekha1mo ago

It is, there's a page in the documentation about how errors are defined. Let me also add: Herbie generally gives the most accurate option it found first, and then the other stuff might be useful for speed (0.5x is way faster than two square roots and a divide!) but it's not as accurate

Rexxar1mo ago

The input should be the range and the distribution of probability on this range. Intuitively we have a tendency to assume an uniform probability for range [-1, 1] which is not the case if we check every doubles.

fluorinerocket1mo ago

You're right but still, big error. I get its averaging over the range, and the floats are not uniformly distributed.

Maybe the thing to optimize the expression for is the minimize the maximum error, and not the average error. I think that's what I would care about

hmpc1mo ago

Check the specification at the top. The range for x is [-1, 1]. For the range you provided the accuracy of the 0.5x alternative is reported as only 33%: https://herbie.uwplse.org/demo/570b973df0f1f4a78fe791858038a...

fluorinerocket1mo ago

You're right I misread the graph. That said though I have played around with Herbie before, trying it out on a few of the more gnarly expressions I had in my code (analytical partial derivatives if equations of motion if launch vehicle in rotating spherical frame) and didn't see much appreciable improvement over the expected range of values, but then again I didn't check every single one.

What would be cool is if you could some how have this kind of analysis done automatically for your whole program where it finds the needle in the haystack expression that can be improved, assuming you gave expected ranges for your variables

pavpanchekha1mo ago

Author here. I've got a few papers about this problem (including one in submission), but it is very very hard to do, especially with acceptable overhead. The state of the art is maybe 100x overhead.

bee_rider1mo ago

I wonder, is there a way to only request reformulations that don’t involve branches? The tool already seems quite nice, but that might be a good feature.

Also, I’m not sure I understand the speedup. Is it latency or throughput?

pavpanchekha1mo ago

Author here. The speed up is modeled throughput, though the model is relatively naive. It's possible to disable branches by turning off the regimes flag, see https://herbie.uwplse.org/doc/1.0/options.html

bee_rider1mo ago

Nice!

What’s uwplse mean? I mixed up the letters and misread it as ulp-wise which works for the project, haha.

pavpanchekha1mo ago

University of Washington Programming Languages and Software Engineering (research group).

I'm not at UW any more, I'm now at Utah, but some of the Herbie team is at UW and they provide the infrastructure

reikonomusha1mo ago

This is an awesome piece of software, one of my favorite little pieces of magic. Finding more precise or more stable floating point formulas is often arduous and requires a lot of familiarity with the behavior of floats. This finds good formulas completely automatically. Super useful for numerical computation.

shiandow1mo ago

I wonder who decided to use a step function for the speed accuracy plot. They must have thought the convex hull would be wrong because you can't really make linear combinations of algorithms (you could, but you'd have to use time not speed to make it linear). So I get why you would use step functions, but the step is the wrong way around. The current plot suggests accuracy doesn't drop if you need higher speeds

pavpanchekha1mo ago

It was me. Damn it you're right! Will fix!

fouronnes31mo ago

Working with tensor datatypes in numerical computing, I've been wondering if it would be possible to somehow add an extra dimension to tensors that would serve as the "floating point precision" dimension, instead of a data type. After all why couldn't the bit depth be one of the tensor dims? Maybe it would be possible to implement arbitrary floating point precision that way?

summarityOP1mo ago

This is somewhat in line with the approach taken by some softfloat libraries, e.g. https://bigfloat.org/architecture.html

fouronnes31mo ago

This is very interesting! I made an interval arithmetic calculator a few months ago [0]. I wonder if you could combine the two techs to make something even more powerful.

[0] https://victorpoughon.github.io/interval-calculator/

bhargavkk1mo ago

Herbie uses interval arithmetic a lot! The team has a in-house developed interval arithmetic library you can use at: https://crates.io/crates/rival3.

andrepd1mo ago

I don't quite understand how they define "accuracy".

perching_aix1mo ago

Could you substantiate that a bit more? I don't see what'd be hard to understand about it at a skim.

andrepd1mo ago

Is it an arithmetic average of relative error over the given range? Because if yes then it can be misleading, and potentially a bad meshes to rank alternatives (though the HTML report includes a graph over the input range, which is quite nice, so I'm talking only about the accuracy number).

In the limit, an alternative with 10x better accuracy when x>10^150 and 10x worse in 1<x<10^150 would rank higher :) but more generally, not all inputs are equally important.

Furthermore, floats have underflow to 0 and overflow to infinity, which screw all this up because it can lead to infinite relative error.

Because of this you have some of the funny cases reported elsewhere in this thread :p

I'm not sure what would be a better approach though. Weigh the scores with a normal distribution around 0? Around 1? Exponents around 0?

pavpanchekha1mo ago

Documented here but yes it's an average, of something similar to but not exactly the same as relative error: https://herbie.uwplse.org/doc/latest/error.html

It's true that averages can be misleading but we encourage users to think about it instead as a percentage of inputs. In practice the error distribution is very bimodal, the two modes being "basically fine" (a few ulps of error) and "garbage" (usually 0 instead of some actual value)

Archit3ch1mo ago

Thanks for reminding me to try this for audio approximations. :)

j / k navigate · click thread line to collapse

45 comments

summarityOP1mo ago

I posted this and it picked up steam over night, so I thought I'd add how I'm using it:

Generally there are multiple types of issues that can be flagged:

Archit3ch1mo ago

I don't know if there is an equivalent in Roslyn, but in Julia you can have the agent inspect the LLVM output to surface problems in hot loops.

yread1mo ago

Really cool, numerical stability can be tricky as errors can accumulate with each operation and suddenly 53 bits of precision is not enough.

Also nice to see an article thats not about AI or politics

urschrei1mo ago

I resurrected the Rust Herbie lint (now using dylint) a while ago: https://github.com/urschrei/herbie-lint

notpushkin1mo ago

Some trivial cases produce... interesting results.

For x in [−1.79e308, 1.79e308]:

Initial Program: 100.0% accurate, 1.0× speedup

  def code(x):
      return math.sqrt((x + 1.0))

Alternative 1: 67.5% accurate, 5.6× speedup

  def code(x):
      return 1.0

lifthrasiir1mo ago

andrepd1mo ago

AlotOfReading1mo ago

IEEE floats have a few warts like any other 1980s standard, but they're a fantastic design.

1 more reply

cocoto1mo ago

pavpanchekha1mo ago

We usually recommend looking for 90%+ accuracy or carefully examining the accuracy plot

FabHK1mo ago

Not really. 1+x/2, however, would be a good approximation to sqrt(1+x) for small (in absolute value) x.

irchans1mo ago

How about:

1 more reply

fluorinerocket1mo ago

Like looking at this example,

https://herbie.uwplse.org/demo/b070b371a661191752fe37ce0321c...

It is claimed that for the function f(x) =sqrt(x+1) -1

Accuracy is increased by from 8.5% accuracy to 98% for alternative 5 Which has f(x) = 0.5x

Ok so x=99, the right answer is sqrt(100) -1 = 9

But 0.5 * 99 = 49.5 which doesn't seem too accurate to me.

yossi_peti1mo ago

LiamPowell1mo ago

pavpanchekha1mo ago

Rexxar1mo ago

fluorinerocket1mo ago

You're right but still, big error. I get its averaging over the range, and the floats are not uniformly distributed.

Maybe the thing to optimize the expression for is the minimize the maximum error, and not the average error. I think that's what I would care about

hmpc1mo ago

fluorinerocket1mo ago

pavpanchekha1mo ago

Author here. I've got a few papers about this problem (including one in submission), but it is very very hard to do, especially with acceptable overhead. The state of the art is maybe 100x overhead.

bee_rider1mo ago

I wonder, is there a way to only request reformulations that don’t involve branches? The tool already seems quite nice, but that might be a good feature.

Also, I’m not sure I understand the speedup. Is it latency or throughput?

pavpanchekha1mo ago

bee_rider1mo ago

Nice!

What’s uwplse mean? I mixed up the letters and misread it as ulp-wise which works for the project, haha.

pavpanchekha1mo ago

University of Washington Programming Languages and Software Engineering (research group).

I'm not at UW any more, I'm now at Utah, but some of the Herbie team is at UW and they provide the infrastructure

reikonomusha1mo ago

shiandow1mo ago

pavpanchekha1mo ago

It was me. Damn it you're right! Will fix!

fouronnes31mo ago

summarityOP1mo ago

This is somewhat in line with the approach taken by some softfloat libraries, e.g. https://bigfloat.org/architecture.html

fouronnes31mo ago

This is very interesting! I made an interval arithmetic calculator a few months ago [0]. I wonder if you could combine the two techs to make something even more powerful.

[0] https://victorpoughon.github.io/interval-calculator/

bhargavkk1mo ago

Herbie uses interval arithmetic a lot! The team has a in-house developed interval arithmetic library you can use at: https://crates.io/crates/rival3.

andrepd1mo ago

I don't quite understand how they define "accuracy".

perching_aix1mo ago

Could you substantiate that a bit more? I don't see what'd be hard to understand about it at a skim.

andrepd1mo ago

In the limit, an alternative with 10x better accuracy when x>10^150 and 10x worse in 1<x<10^150 would rank higher :) but more generally, not all inputs are equally important.

Furthermore, floats have underflow to 0 and overflow to infinity, which screw all this up because it can lead to infinite relative error.

Because of this you have some of the funny cases reported elsewhere in this thread :p

I'm not sure what would be a better approach though. Weigh the scores with a normal distribution around 0? Around 1? Exponents around 0?

pavpanchekha1mo ago

Documented here but yes it's an average, of something similar to but not exactly the same as relative error: https://herbie.uwplse.org/doc/latest/error.html

Archit3ch1mo ago

Thanks for reminding me to try this for audio approximations. :)

j / k navigate · click thread line to collapse