x = 1;
for (i = 0; i < n; ++i)
    x = 0.01 + sqrt(x);
You can't simplify this, so it will simply loop until it hits the datatype boundary and then get rounded into oblivion, because the underlying floating-point representation will break in exactly the same way. The only way to get rid of this would be to use an arbitrary number of bits for precision, in which case... just use an arbitrary-precision library instead! This is EXACTLY WHY WE HAVE THEM.

Most equations that aren't trivial cannot be simplified. This datatype would only be useful for spoon-fed high-school math problems. Furthermore, it costs so much CPU to perform that you might as well just use an arbitrary-precision library, which will probably be faster and just as effective at preventing edge cases.
https://en.wikipedia.org/wiki/Arbitrary-precision_arithmetic
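To make the growth concrete, here is a minimal Python sketch (my own illustration, not anyone's library code): a symbolic representation of the loop's value grows without bound, because 0.01 + sqrt(...) never simplifies, while the float iterate just converges to a fixed point.

```python
import math

def symbolic_iterate(n):
    """Exact expression for x after n iterations, as a string."""
    expr = "1"
    for _ in range(n):
        expr = "0.01 + sqrt(" + expr + ")"
    return expr

def float_iterate(n):
    x = 1.0
    for _ in range(n):
        x = 0.01 + math.sqrt(x)
    return x

print(len(symbolic_iterate(10)))   # expression size keeps growing with n
print(len(symbolic_iterate(100)))  # no upper bound, unlike a float
print(float_iterate(100))          # the float settles near 1.0199
```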
The quadratic equation case he shows is pretty good. Say you want to solve ax^2 + bx + c = 0. In C you might write "(-b + sqrt(b * b - 4 * a * c)) / (2 * a)", which has all the precision problems he described. However, in his library that code would be converted to an AST, then symbolically simplified given the variable values, and _then_ evaluated numerically. The intermediate symbolic step would be able to automatically account for the edge cases, e.g. if c were 0, so that the numeric step wouldn't fail. In this case, the symbolic solver would recognize that (+, (sqrt, (-, (*, b, b), 0)), b) is exactly zero, since sqrt(b*b) simplifies to |b|, before any floating-point computation occurs.
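For anyone who wants to see the cancellation concretely, here is an illustrative Python sketch (mine, not the library's code) of the naive quadratic root next to the standard numerically stable rewrite via the conjugate:

```python
import math

def root_naive(a, b, c):
    # -b + sqrt(b*b - 4ac): catastrophic cancellation when b*b >> 4*a*c
    return (-b + math.sqrt(b * b - 4 * a * c)) / (2 * a)

def root_stable(a, b, c):
    # q involves no subtraction of nearly equal numbers;
    # c/q is the same (small-magnitude) root the naive form computes badly
    q = -(b + math.copysign(math.sqrt(b * b - 4 * a * c), b)) / 2
    return c / q

a, b, c = 1.0, 1e8, 1.0
print(root_naive(a, b, c))   # off by tens of percent
print(root_stable(a, b, c))  # close to the true root, -1e-8
```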
I often write scientific programs with the issues he describes, and I have also wished for something along these lines, but I kind of suspect it is impossible. I usually have to play around in Mathematica or on paper a bit before finding an algorithm that takes care of the numeric issues (yes, even when using multiprecision), and while it would be nice for that to happen automatically, my impression is that the algorithms I find on paper are nontrivial for a computer to come up with. In some cases, however, the computer could have done it for me. But I have a feeling that in cases where precision becomes an issue, even if this library got me 90% of the way, I would still want total manual control over how the calculations occur. Still, I want him to try, in case it works!
Your critique really boils down to:
Most equations that aren't trivial cannot be simplified.
but this is not true; just take a look at what Wolfram Alpha or Mathematica can do.
This may actually not be a problem in most instances, however. It really depends on the use case, but outside of scientific computing/engineering design we're not usually doing computations to find roots of degree >= 5 polynomials.
OTOH, as pointed out by GP, it is easy to construct 'unsimplifiable' operations which after some time may grow too large.
My conclusion is that this certainly cannot be implemented on practical systems without some care -- it's not as straightforward as it looks at first -- and it is certainly not a better default as it stands.
If you want to write a library to do it for you, great! But don't advertise it as a replacement for floating point number representations, because it isn't.
I have learned to be careful about making such claims.
Is there any bank which uses floating point for accounting?
Most banking systems predate standardized FP and use fixed decimal representations.
http://mitpress.mit.edu/sicp/full-text/book/book-Z-H-14.html...
Exercise 2.14 and onwards point out that two expressions that are algebraically equivalent for perfectly accurate values stop being equivalent once we introduce uncertainty. This is only for a toy example expression with 2 variables. Suppose we want to solve a linear system of equations in 100s of thousands of variables. Is it tractable to track the uncertainty for all of those variables at once? Will their uncertainties be dependent? ...
- go back and improve if necessary, or otherwise
- express the uncertainty of a comparison
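The SICP point is easy to reproduce. Here is a toy interval-arithmetic sketch (my own, stripped down to +, * and /): par1 and par2 are algebraically identical, yet give different interval widths, because par1 uses each uncertain variable twice.

```python
class Interval:
    def __init__(self, lo, hi):
        self.lo, self.hi = lo, hi
    def __add__(self, o):
        return Interval(self.lo + o.lo, self.hi + o.hi)
    def __mul__(self, o):
        p = [self.lo * o.lo, self.lo * o.hi, self.hi * o.lo, self.hi * o.hi]
        return Interval(min(p), max(p))
    def __truediv__(self, o):
        return self * Interval(1.0 / o.hi, 1.0 / o.lo)  # assumes 0 not in o
    def width(self):
        return self.hi - self.lo

def par1(r1, r2):               # r1*r2 / (r1 + r2)
    return (r1 * r2) / (r1 + r2)

def par2(r1, r2):               # 1 / (1/r1 + 1/r2)
    one = Interval(1.0, 1.0)
    return one / (one / r1 + one / r2)

r1 = Interval(99.0, 101.0)      # "100 ohms, +/- 1%"
r2 = Interval(198.0, 202.0)     # "200 ohms, +/- 1%"
print(par1(r1, r2).width())     # wider interval
print(par2(r1, r2).width())     # tighter interval, same algebra
```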
thanks for the link :)
Note, however, that solutions to many problems may in a sense be 'non-analytic': there may be no finite composition of elementary functions on a given rational number which yields the solution.
Also, iterative answers are usually the only viable way to reach solutions; they're usually much faster than the exact solution (or the floating-point-precision-limited solution), and you can always control how good your solution is.
Observation: so in a sense, what is practically used may indeed be very close to the Kolmogorov complexity of the solutions -- the representation as R = IterativeProblemSolve(Problem, ClosestSolution), where we publish the problem and the desired solution! (assuming we are efficiently describing the problem)
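A tiny sketch of the point about controllable iterative accuracy, using Newton's method (my own example, not from the article): you choose the tolerance, so you control how good the answer is.

```python
def newton_sqrt(a, tol):
    """Newton's method for sqrt(a); iterate until |x*x - a| <= tol."""
    x = a
    while abs(x * x - a) > tol:
        x = 0.5 * (x + a / x)
    return x

# Tighter tolerance -> more iterations -> better answer, on demand:
print(abs(newton_sqrt(2.0, 1e-6) ** 2 - 2.0) <= 1e-6)    # True
print(abs(newton_sqrt(2.0, 1e-12) ** 2 - 2.0) <= 1e-12)  # True
```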
The first rule of numerics is not to compare variables of very different magnitude. His first example has coefficients that differ by almost 200 orders of magnitude. This example is totally outrageous, but still: what any reasonable person would do is introduce some scaling into the equation.
Yes, you have to think about scales, but you already do that. You never write programs with meters and seconds, you choose scaled dimensionless quantities. And unless you choose the scales very badly, you won't get any problems from floats.
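A deliberately crude Python illustration of the scaling point (numbers are mine, not from the article): the same quantity computed two ways. Multiplying the big factors together first overflows the intermediate, even though the final value is perfectly representable.

```python
big, small = 1e180, 1e-180

naive = (big * big) * (small * small)   # inf * 0.0 -> nan
scaled = (big * small) * (big * small)  # rescale first: exactly 1.0

print(naive, scaled)  # nan 1.0
```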
Richard Schroeppel and Bill Gosper explored continued fractions in the '70s:
Add one of the pattern-matching and term-rewriting packages and you'll be very close to the foundation of the core of Mathematica. It will be a SMOP (simple matter of programming) of adding to the library of facts, enabling recognition of reducible terms.
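As a toy illustration of what such a term-rewriting layer does (my own sketch, nowhere near a real package): expressions as nested tuples, with two rules that suffice for the quadratic example elsewhere in the thread, sqrt(x*x) -> abs(x) and x - 0 -> x.

```python
def rewrite(expr):
    if not isinstance(expr, tuple):
        return expr
    expr = tuple(rewrite(e) for e in expr)  # simplify children first
    # rule: ('-', x, 0) -> x
    if len(expr) == 3 and expr[0] == '-' and expr[2] == 0:
        return expr[1]
    # rule: ('sqrt', ('*', x, x)) -> ('abs', x)
    if (expr[0] == 'sqrt' and isinstance(expr[1], tuple)
            and expr[1][0] == '*' and expr[1][1] == expr[1][2]):
        return ('abs', expr[1][1])
    return expr

# sqrt(b*b - 0) collapses to abs(b) with no numeric evaluation at all:
print(rewrite(('sqrt', ('-', ('*', 'b', 'b'), 0))))  # ('abs', 'b')
```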
You may also be interested in the open-source SageMath, which is a frontend/interface to many different open-source symbolic solvers such as SymPy, Maxima, GAP, PARI, Singular, and Octave.
They are not as good as Mathematica currently, but can already do quite a lot.
1) Mathics 2) SAGE 3) Sympy 4) Maxima
The numbers, in the billions, of course weren't there, but my guess is that the proportions are applicable.
Edit: after reading the article, I strongly disagree that this approach is "better" in any sense of the word. It's more along the lines of using whale oil for the automobile instead of gasoline. Gasoline isn't perfect, but it works. Until we have something better (i.e. electric), I'll stick to it.
The only place where the shoe doesn't fit is where you need a minimum accuracy. In that case what you should be doing is using integers to represent your minimum quantifiable unit.
For example, you could represent currency in millicents to give you respectably accurate rounding. Not accurate enough? Microcents, then. Now you don't have enough range? Good, you're thinking about your requirements now; floating point DEFINITELY wouldn't have worked.
Microcents with a signed long long (64-bit) means a maximum value of 2^63 / 10^6 / 100 ≈ 92233720368. In words, "92 billion", which might not be enough.
4 billion. Eek.
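For completeness, a short Python sketch of the microcents scheme discussed above (names are mine, illustrative only): money as an integer count of microcents (10^-6 cent), so small decimal amounts round-trip exactly.

```python
MICROCENTS_PER_DOLLAR = 100 * 10**6   # 100 cents, 10^6 microcents each

def dollars_to_microcents(d):
    return round(d * MICROCENTS_PER_DOLLAR)

# Range of a signed 64-bit integer in whole dollars:
max_dollars = (2**63 - 1) // MICROCENTS_PER_DOLLAR
print(max_dollars)  # 92233720368 -- the "92 billion" above

# 0.1 + 0.2 != 0.3 in binary floating point, but it is in microcents:
a = dollars_to_microcents(0.10)
b = dollars_to_microcents(0.20)
print(a + b == dollars_to_microcents(0.30))  # True
```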