Floating Point Visually Explained (2017) (opens in new tab)

(fabiensanglard.net)

423 pointsbmease5y ago96 comments

96 comments

For another visual explanation in words, floating point numbers (ignoring subnormal) are just a linear approximation of 2^x [1] where there is one piece for each integer (x = 4 to x = 5, etc). As an example, draw a straight line between 2^4 (16) and 2^5 (32). The floating point numbers in that range are evenly spaced on that line.

Another explanation using the window + offset terminology used in the post is that the offset is a percentage of the way through the window. So, for a window of 2^x, the difference between an offset of y and y + 1 is 2^(x-23) or 1/2^(-23) of 2^x. Put another way, floating point numbers do not have absolute error like integers (each number is within 1 of a representable value), but % error (each number is within 1/2^(-23) of a representable value). Essentially, floating point numbers use % error bars instead of absolute error bars.

Using this model you can even see how to create your own floating point numbers. Just pick a % precision you want, for single FP that is 1/2^(-23) and double FP 1/2^(-52), that defines the range of your mantissa (offset). Then pick a range of x values you want to represent, that is the range of your exponent (window).

As an aside, subnormal numbers do not respect this principle. They extend the expressible range for very small numbers by sacrificing % error for those numbers. In the very worst case of the smallest subnormal number you can get 25% error (it might actually be 50%). As might be imagined, this plays havoc on error propagation since if you ever multiply by a number that just so happens to be the smallest subnormal, all your multiplies might suddenly be off by a factor of 25% instead of the normal 100 * 2^(-23)% which is 2,000,000 times the % error which is quite a bit harder to compensate for. This is why many people consider subnormals to be a blemish.

[1] The approximation is actually offset in the x direction for the bias. If you want to be more accurate, you are actually graphing 2^(x - 127).

titzer5y ago

On the other hand, if you don't have subnormals, then you have the funny property that subtracting two inequal numbers would yield 0. This never happens with subnormals because they allow representing numbers closer to zero, below the smallest representable exponent. The numerical stability of some algorithms crucially relies on subnormals.

Veserv5y ago

I do not understand your statement. I can come up with two possible interpretations, but neither seem true to me. Can you provide an example of what you mean?

I am also interested in your statement that certain algorithms depend on subnormals as defined in the floating point spec. Can you provide an example of such an algorithm? I can intuit how it might be desirable to have a single "too small/epsilon" value, but I do not see offhand how you can leverage the full range of subnormals in any reasonably generic way that is not extremely dependent on the specific problem and scale (i.e. multiply all numbers by 2^30 and increase the maximum exponent by 30, do you get the same output multiplied by 2^30), so I would like to see how it is done.

In terms of my two possible interpretations, the first is that subtracting two unequal floating point numbers yield 0. I am pretty sure this is not the case. It may yield no change, but I am pretty sure it can not yield 0. The other is that subtracting two unequal arbitrary precision numbers represented as floating point numbers yield 0. This is true, but is a known limitation of emulating arbitrary precision arithmetic using limited precision and must always be accounted for. If this is what you meant, then we can just choose numbers too small to be expressed by subnormals to cause the problem to occur again, so all it does is allow handling of a few more cases at the cost of complexity and non-uniformity. If you did not mean either of these two interpretations, can you explain what you meant, preferably with a concrete example?

1 more reply

commandlinefan5y ago

Wow, that's a much easier way to convert from decimal to floating point than I had ever seen. He doesn't mention why biased notation is used (i.e. why the exponent is stored as 127+E): it's used so that if you sort positive numbers as if they were integers, they'll still end up in the right order.

bogomipz5y ago

>"He doesn't mention why biased notation is used (i.e. why the exponent is stored as 127+E): it's used so that if you sort positive numbers as if they were integers, they'll still end up in the right order."

Could you elaborate on this? Maybe an example? This sounds interesting but I'm failing to grasp it.

Someone5y ago

https://wiki.c2.com/?IeeeSevenFiftyFour:

“IEEE 754 […] Has the interesting and useful property that two's complement comparisons of the underlying bit pattern of any two IEEE 754 numbers will have the same result as comparing the numbers that are represented”

That means that, if you interpret the bits of a float/double as an int32/int64, increase that integer by one, and then interpret the bits of the result as a float/double, you get the smallest float/double that’s larger than what you started with (with exceptions for NaN, infinities, +/- zero, and, possibly, some categories I forget)

That can be useful if you want to iterate over all floats between 0.0 and 1.0, for example, but may not be that efficient on some modern hardware, where moving data from the “float side” of the CPU to the “integer side” is expensive)

2 more replies

saagarjha5y ago

The first thing to notice is that a (not subnormal) floating point number can be compared as such: you look at the exponent bits, and pick the one that has the greater exponent. If they are equal, then look at the mantissa, which is always between 1 and 2, and pick whichever one is larger. Since the exponent comes before the mantissa bits in the bit representation of a floating point number, it has "higher significance" when interpreted in fixed point. If we used two's complement to represent the exponent negative exponents would "look larger" than positive ones, despite them being smaller. But if we used a biased representation, where we shift the range so that there can only be a positive number in that field, and the smaller negative exponents can actually be small and the positive exponents are be larger than those so we can compare them directly.

1 more reply

BeetleB5y ago

The exponent is 8 bits. That translates to 0 through 255. This means your "window" begins from [1,2], [2,4], etc.

That sucks. It means I cannot represent any number between 0 and 1.

Ideally, we want the exponent to be somewhat symmetric, which in this case means going from -127 to 127. To translate 0 to 255 to that interval, you subtract 127. You are simply shifting your exponent so that the allowed values for your exponent is symmetric about 0.

Now your window starts with [2^-127, 2^-126] and so on.

(I may have an off by one error here).

commandlinefan5y ago

Sure, I had to wrestle with this a bit myself: to make it easier, imagine a 16-bit floating point format (P&H call this the "Nvidia format", but I can't find that documented anywhere but there): 1-bit sign, 5-bit (biased) exponent, 10-bit mantissa. One thing that TFA leaves out about floating point mantissas is that there's an implicit leading 1, so a 10-bit mantissa of 1111100000 would be interpreted as (binary) 1.1111100000 or 1 + 2^-1 + 2^-2 + 2^-3 + 2^-4 + 2^-5 = 1.96875. So, take the mantissa, convert it to a fraction, add 1, and then raise it to the power of the (biased) exponent.

So now, take the 16-bit pattern (0000 0011 1110 0000); you get a 0 sign bit, an exponent of 0, and a mantissa of 1.96875. So, with a bias of -15, that's 1.96875 * 2 ^ 0-15 = 0.00006008148193359375.

As you creep up to the "next" exponent, you see that the boundaries are respected. The last 2^-15 number is 0000 0011 1111 1111 (0x3ff) and the first 2^-14 number is 0000 0100 0000 0000 (0x400); now the exponent has changed from 0 to 1, but the floating point converts to 1.999023 * 2 ^ -15 = 0.0000610053539276123046875 and 1.000000 * 2 ^ -14 = 0.00006103515625: a bit-by-bit comparison has 0x3ff < 0x400. (This would be true regardless of bias).

Now imagine that IEEE 754 stored exponents in two's-complement format instead; the exponent 01111 would be interpreted as +31, but the "next" exponent, bit-wise, would be 10000 = -32. This means that you'd end up with 0011111111111111 = 1.999023 * 2 ^ 31 = 4292869204.475904, but the next binary number, 0100000000000000 would be 1.0 * 2 -32 = 0.00000000023283.

1 more reply

bkanber5y ago

Well, really the reason for the bias is so that you can get a negative exponent. If E = 10, then the exponent will be 2^-117. You might still need a negative exponent with a positive sign. The offset is used rather than two's complement maybe because of that sorting thing but also because of the math itself, when doing operations on two FP numbers.

edflsafoiewq5y ago

IOW the "windows" are just labeled in order from left to right, starting with 0 for the leftmost window.

SupriseAnxiety5y ago

This right here is the only way I pretty much figured out just now a more precise understanding of the weird patterns going on in my head. I crave precision and this explained more than a whole year covering Basic C. Thank you so much! My head doesn’t have to hurt over that anymore. I never understood floats either but this representation truly helps clear some fog!!

GolDDranks5y ago

Another visualization that blew my mind; I learnt this from a presentation that was about doing floats with FPGAs:

Think 32-bit floats as a 256-bit buffer interpreted as a fixed precision number, with the decimal point straight in the middle, but with the limitation that you can set only a continuous 24-bit window on that buffer to non-zeros.

Then, the 8 bits of the exponent determine where in the bit buffer that window points to, and the 23 (+1) bits are the contents of the window.

saagarjha5y ago

Related, https://float.exposed/ is a great resource: both when trying to see how floating point is laid out, as well as when having to convert between the bit representation and the number for "actual work" ;)

bkanber5y ago

That is great. Also helps remind you that for very large numbers, the precision is in the hundreds-of-thousands.

0-_-05y ago

Fabien Sargland is the same guy who wrote 2 books on a deep dive into 2 game engines: Wofenstein 3D and Doom which are great reading and I really recommend them to the HN crowd (can be downloaded for free):

https://fabiensanglard.net/gebbwolf3d/

https://fabiensanglard.net/gebbdoom/

wernsey5y ago

He also recently had a series of articles about the various ports of Out of this World/Another World to various systems which are well worth reading if you're into this sort of thing

totorovirus5y ago

this looks really interesting! thanks for sharing :)

zoomablemind5y ago

It'd be nice to also mention about ulp [0], the unit of least precision. The floating point is an approximation as concept, and ulp is one of the properties of an implementation of floating point representation in binary form.

0: https://stackoverflow.com/questions/43965347/ulp-unit-of-lea...

splittingTimes5y ago

That's why when I did numerical simulation of electron Dynamics in semiconductors during my phD we never used straight SI units (m, s, kg, etc), but instead expressed all physical natural constants in nm, fs, eV, etc. That way all relevant constants had numerical values between 1 and 10 which stabilized the simulations a lot.

petschge5y ago

Expressing all quantities as small multiples of suitable reference quantities is quite common. The resulting equations after simplifications are often referred to as dimensionless (because the reference quantity has absorbed the units and only a small dimensionless number remains).

There is two upsides to this: 1.) All quantities are order one, safely away from underflow and infinity. 2.) The equations are generally simpler, which was important when the many limitation in simulation codes was flops, not memory bandwidth.

There are also important downsides: 1.) You have to manipulate the equations before you start coding. Porting features from a code that used another normalization is quite error-prone. 2.) All quantities are order one, which makes it hard to detect if you accidentally use and electric field instead of a magnetic field. All you see is a number of roughly the correct magnitude. 3.) Comparison between codes is more difficult, in particular if the use different normalization, because you have to convert from "code units" to actual SI units.

mschuetz5y ago

File formats for georeferenced data also regularly store coordinates as 32 bit ints, with a scaling argument that can be used to convert the integer into a double precision number with a specific unit. E.g. your scene unit is meters and you want to store with a precision of millimeters? Multiply double precision coordinates by 1000, convert the result to an int32, save it to file. When you load the coordinate afterwards, divide it by 1000 and you you get back your double precision coordinate.

swimfar5y ago

I had never heard of this idea before. Do you have any references to this technique? Also, depending on what equations you're using, aren't you constrained to using a consistent system of units?

splittingTimes5y ago

Never anything official. It was just how it was done in the group and wider scientific community. Some googling pulled that up if it helps

https://books.google.de/books?id=uzJbyD6_3DAC&pg=PA4

And yes you have to be consistent and stay in that unit system. The other remaining two we used plain Kelvin for temperature and for the charge Coulomb was measures in electron charge. That's it.

1 more reply

bchociej5y ago

Is all of that really easier to understand than exponential notation? It's a great tool to visualize floating point precision, but it's lot more circuitious to get to an understanding of what a floating point number actually means IMO

bkanber5y ago

The equation fleshes out the detail that the visualization misses. I do best when I have both.

seanalltogether5y ago

I always found the wikipedia examples for 16bit floating point helpful since the numbers are smaller. You can really see how the exponent and fraction affect each other in a very simple way.

https://en.wikipedia.org/wiki/Half-precision_floating-point_...

gopalv5y ago

BFloat16[1] is an interesting tweak to the original, but built with more modern requirements for ML in mind.

Particularly because it is easier to think of 32 bit IEEE-754 to BFloat16, but with fewer bits of think about (& possibly you can enumerate the entire range in a laptop to test something like "will this function work for all values?").

[1] - https://en.wikipedia.org/wiki/Bfloat16_floating-point_format...

enriquto5y ago

The 8-bit floating point is even easier to understand, since you can list them all in a single page, and even visualize the entire addition and multiplication tables.

piadodjanho5y ago

Also a great format to count how many representation are wasted with redundant representation (ZeroS, NaN, +inf, -inf).

1 more reply

dylan6045y ago

"(−1)S∗1.M∗2(E−127) How everybody hates floating point to be explained to them."

Now I know I'm weird, as that formula makes sense to me.

dntbnmpls5y ago

The formula makes sense, but it doesn't give you the full picture of floating points. The formula doesn't explain NaN, positive infinity, negative infinity or even the bias. I think seeing the layout of a floating point number and their constituent parts in memory is more instructive than a formula.

Floating points, like two's complement, are hardware derived/limited creations. You have to understand the underlying hardware and binary limitations to understand why they exist and why they were created in that manner. The same thing applies to ascii. Zero is 48, A is 65 and a is 97. The mapping seems arbitrary unless you see the bit pattern and you realize how clever the design is and the reason why some of the symbols got their mappings in ascii.

rwallace5y ago

Excellent article! Just one thing surprised me:

> While I was writing a book about Wolfenstein 3D[1], I wanted to vividly demonstrate how much of a handicap it was to work without floating point

I would've expected fixed point to work fine for games, because that's a domain where you know the data, and in particular the dynamic range, in advance, so the 'automatically adjust to whatever dynamic range happens to be in the data' feature of floating point isn't needed. What am I missing? (If the answer is 'it would take too long to explain here, but he does actually explain it in the book', I'm prepared to accept that.)

1 more reply

jonplackett5y ago

Whenever I read something about how computers _really_ work (as in not just a nice easy to comprehend programming language) I realise just how much smarter some people in the world are than me.

bigdict5y ago

With every thing like that that you read, you get closer to them!

adrianmonk5y ago

Here's how I like to think of it.

Floating point numbers are just fractions but with one extra condition: the denominator is a power of two.

The normal rules of fractions apply. If you want to add them, you have to make sure the denominators match, which would involve scaling the numerators too.

Just like fractions, there are multiple ways of writing the same value. 3/2 is the same as 6/4.

You can't write 1/3 exactly because, hey look at your denominator, it's 3. Which isn't a power of 2, is it? So that can't be a floating point value.

cambalache5y ago

> Just like fractions, there are multiple ways of writing the same value. 3/2 is the same as 6/4.

This is not correct. In binary you only have 2 choices {0,1} for the numerator, so 3/2 is represented 1.1 and 6/4 can only be represented as 1.1 (feel free to add the completing 0s)

adrianmonk5y ago

The numerator is a binary integer, not a bit (and not the part of the number before the "decimal" point).

3 in binary is 11. 6 in binary is 110.

3/2 (decimal) is 11/10 (binary) and 6/4 (decimal) is 110/100 (binary).

Of course the denominators always have trailing zeros, so can be (and are) represented more compactly.

1 more reply

zozbot2345y ago

I believe that this "window and offset" intuition, while indeed true and useful in the radix-2 ("binary") case, does not cleanly extend to the general case where no hidden bit is used even for non-"subnormal" numbers, and some numbers may thus have multiple representations. This shows up perhaps most clearly in the case of decimal floating point, but ISTR that a non-2 radix was also used in some mainframes.

saagarjha5y ago

Subnormal numbers are a blemish on the otherwise beautiful IEEE 754 specification :(

biddlesby5y ago

I saw a very simple visualisation of floating point by simply plotting points along the number line and zooming out. That gets the idea across very quickly!

wenc5y ago

Yes, I came across something like that in my numerical computation course. This was for me the most compelling visualization of what a floating point number was.

The floating number line is actually an (unevenly spaced in reals, uniformly spaced in binary) discrete number line that was used to approximate continuous infinite real numbers, in finite precision e.g.

Reals

0 __________________________________________________ 1

Floats

0 .. .... .. ... .. ... ... ... .. ... ... ... .. .... ... ... ... . ... .... ... .. . ... 1

saagarjha5y ago

A more accurate representation might be

  0.... . . . .  .  .  .  .    .    .    .    1

(Hacker News eats consecutive spaces, it seems like :/)

1 more reply

cmrdporcupine5y ago

This is a bit odd:

"Trivia: People who really wanted a hardware floating point unit in 1991 could buy one. The only people who could possibly want one back then would have been scientists (as per Intel understanding of the market). They were marketed as "Math CoProcessor". "

The 486DX (1989) was already common in 91/92 and came with a floating point unit. I had a 50mhz 486DX, and I was not by any means wealthy. FP unit was certainly used by lots of software, especially things like Excel, but C compilers certainly produced code for it if you had one.

Likewise, the 68040 (1990) had onboard FP. Macintosh Quadra, Amiga 4000, and various NeXT models had it.

Yes if you bought a 386 you often had to get a floating point co-processor as upgrade, but it wasn't _that_ uncommon. Same on the 68k series; I knew people with Atari MegaSTes (68000) that bought an FP co-processor. They weren't astronomers :-)

This feels like recent history, am I really that old?

jscheel5y ago

I remember wanting a math co-processor so I could use autocad at home as a kid.

fabiensanglard5y ago

And now you could probably buy an expensive Tesla V workstation dedicated to that but you don't have time anymore. Wait. No. That is me :P !

baby5y ago

Interesting, because I think about it the other way: the offset tells you between which power of two you're in:

* [2^-127, 2^-126] * ... * [1, 2] * [2, 2^2] * [2^2, 2^3] * ... * [2^128, 2^129]

and the window tells you where you are in this "window". You have 8 bits to figure out where you are in there, so it's obvious that you have much more precision if you are in the window [1, 2] than in the window [2^128, 2^129]

jiveturkey5y ago

https://opencores.org/projects/fpuvhdl

dang5y ago

A thread from 2019: https://news.ycombinator.com/item?id=19084773

Discussed at the time: https://news.ycombinator.com/item?id=15359574

(Reposts are ok after a year or so.)

boomlinde5y ago

I wonder why the significand is represented as a > 1 number. It complicates representing zero (making it a special case not covered by the article of defining exponent=0, mantissa=0 to mean zero). Is it for the purpose of simplifying arithmetic operations or is it to minimize the number of redundant representations of zero?

rational_indian5y ago

The leading 1. prefix can be omitted from the binary representation thus saving a bit in the process.

boomlinde5y ago

That doesn't answer my question, though. I'm asking why the significand is in the range 1-2 instead of 0-1.

jimmaswell5y ago

I remember wondering at first how you know this won't have multiple representations and cover every number - it's because the mantissa can't reach 2, and if it was 2 it would be the same as adding to the exponent, so you get the full range between any two exponents.

piadodjanho5y ago

Ackchyually...

The IEEE-754 has a lot of redundant representation. Not where you would expect though.

Caveat: Those features are invaluable for some niche applications, but not for the average joe.

To start. Every IEEE-754 float has two zero representation: one for positive zero and another negative negative zero (sic).

The special numbers are another source of redundancy. The the double format, have about 9,007,199,254,740,992 different combination to encode three different states that a production ready software shouldn't reach: NaN, +inf and -inf.

Other than the redundancy, the double have many rarely used combination. For instance, the subnormals representation. Unless you are compiling your program with -O0 or with some exotic compiler, they are disabled by default. One subnormal operation can take over a hundread of cycles to complete. Therefore, more 9,007,199,254,740,992 wasted combination.

If that wasn't bad enough, since the magnitude of the numbers follows a normal distribution (someone whose name I forgot's law), the most significant bits of the exponent field are very rarely used. The IEEE-754 encoding is suboptimal.

The posit floating point address all those issues. It uses an tapered encoding for the exponent field.

thechao5y ago

I'm mixed on Gustafson's posit stuff. For me, the only thing I'd change for fp would be:

1. -0 now encodes NAN.

2. +inf/-inf are all Fs with sign: 0x7FFFFFFF, 0xFFFFFFFF.

3. 0 is the only denorm.

Which does four good things:

1. Gets rid of the utter insanity which is -0.

2. Gets rid of all the redundant NANs.

3. Makes INF "look like" INF.

4. Gets rid of "hard" mixed denorm/norm math.

And one seriously bad thing:

1. Lose a bunch of underflow values in the denorm range.

However, as to the latter: who the fuck cares! Getting down to that range using anything other than divide-by-two completely trashes the error rate anyways, so why bother?

The rest of Gustafson's stuff always sounds like crazy-people talk, to me.

2 more replies

BeetleB5y ago

> If that wasn't bad enough, since the magnitude of the numbers follows a normal distribution (someone whose name I forgot's law), the most significant bits of the exponent field are very rarely used. The IEEE-754 encoding is suboptimal.

But isn't that accounted for by the fact the floating point number distribution is non-uniform? Half of all floating point numbers are between -1 and 1.

1 more reply

jimbob455y ago

This guy's Quake 1/2/3 source code deep dives are well worth checking out as well.

barbs5y ago

Awesome. Maybe now I can finally understand the fast inverse square root hack.

https://en.wikipedia.org/wiki/Fast_inverse_square_root

voz_5y ago

This is beautifully written. Bravo to the author.

augustt5y ago

Didn't everyone learn scientific notation in high school? It's pretty much exactly that, and you could put the coefficient/exponent into whatever bit pattern you'd like.

GuB-425y ago

It is scientific notation with a twist. The twist is the implicit 1. in the mantissa.

There is no easy way to make it work in decimal. So you need to learn how fractional parts work in binary, then move up to scientific notation in binary. Then you probably noticed that all numbers start with 1, so you can knock it off to save space. Oh, and add a special case for zero, because it is the only number that doesn't start with 1, and if you are feeling adventurous, there are subnormal numbers.

That's why I find the explanation in the article absolutely brilliant. By treating the exponent as a window, not only you don't need to bother with scientific notation in binary, but zero and subnormal numbers fit very nicely.

In fact, I've been working for years with floating point numbers and feel stupid for not realizing it works just as described in the article. And feeling stupid after reading something is usually a very good sign. I clicked, expected an article I could dismiss in a typical Hacker News fashion, but no, not this time.

anaphor5y ago

It's an implicit 1 or 0 (because it's binary floating point). The implicit 0 is for subnormal numbers. That's the part people usually don't explain when they're first introducing it. It's literally {1,0}.xxxxxx where x is also a 1 or 0. I.e. a binary floating point number in scientific notation. I've seen a lot of explanations that kind of gloss over that part of it (not saying you don't understand it, just that even the article doesn't make that clear enough IMO).

1 more reply

randyrand5y ago

It's not exactly scientific notation because a leading bit is assumed for normalized numbers.

The window and offset explanation naturally accounts for the leading bit, whereas the scientific notation explanation needs further explanation to explain how the leading bit works.

In short, 10^2 * .001 is not allowed in floating point. You can't have a mantissa that starts with 0. The leading bit means all mantissas must be greater than 1.

Without this understanding you won't have an intuitive understanding of the range and precision of floating point, which is why I think the window+offset explanation is much more natural.

romwell5y ago

I think this is worth saying about the leading bit (why 0.001 is not a valid mantissa):

In binary, we always know the first non-zero digit of any number - so there's no need to write it down. We know it's 1 because, well, binary.

So we don't waste space, and only write down all the other digits, and then use the exponent to put the 'point' into the right place.

We save a bit of space doing this.

1 more reply

codemonkey-zeta5y ago

I think most people do learn scientific notation, but the correspondence with floating point representation is probably not learned. It is not necessarily an obvious connection for a person who uses floats as "decimal numbers", which is the mental model I assume many programmers have. This post is enlightening because it shows how obvious the connection is.

anamexis5y ago

You just repeated the "how everybody hates floating point to be explained to them" section of this article.

j / k navigate · click thread line to collapse

96 comments

Veserv5y ago

[1] The approximation is actually offset in the x direction for the bias. If you want to be more accurate, you are actually graphing 2^(x - 127).

titzer5y ago

Veserv5y ago

I do not understand your statement. I can come up with two possible interpretations, but neither seem true to me. Can you provide an example of what you mean?

1 more reply

commandlinefan5y ago

bogomipz5y ago

Could you elaborate on this? Maybe an example? This sounds interesting but I'm failing to grasp it.

Someone5y ago

https://wiki.c2.com/?IeeeSevenFiftyFour:

2 more replies

saagarjha5y ago

1 more reply

BeetleB5y ago

The exponent is 8 bits. That translates to 0 through 255. This means your "window" begins from [1,2], [2,4], etc.

That sucks. It means I cannot represent any number between 0 and 1.

Now your window starts with [2^-127, 2^-126] and so on.

(I may have an off by one error here).

commandlinefan5y ago

So now, take the 16-bit pattern (0000 0011 1110 0000); you get a 0 sign bit, an exponent of 0, and a mantissa of 1.96875. So, with a bias of -15, that's 1.96875 * 2 ^ 0-15 = 0.00006008148193359375.

1 more reply

bkanber5y ago

edflsafoiewq5y ago

IOW the "windows" are just labeled in order from left to right, starting with 0 for the leftmost window.

SupriseAnxiety5y ago

GolDDranks5y ago

Another visualization that blew my mind; I learnt this from a presentation that was about doing floats with FPGAs:

Then, the 8 bits of the exponent determine where in the bit buffer that window points to, and the 23 (+1) bits are the contents of the window.

saagarjha5y ago

bkanber5y ago

That is great. Also helps remind you that for very large numbers, the precision is in the hundreds-of-thousands.

0-_-05y ago

https://fabiensanglard.net/gebbwolf3d/

https://fabiensanglard.net/gebbdoom/

wernsey5y ago

He also recently had a series of articles about the various ports of Out of this World/Another World to various systems which are well worth reading if you're into this sort of thing

totorovirus5y ago

this looks really interesting! thanks for sharing :)

zoomablemind5y ago

0: https://stackoverflow.com/questions/43965347/ulp-unit-of-lea...

splittingTimes5y ago

petschge5y ago

mschuetz5y ago

swimfar5y ago

I had never heard of this idea before. Do you have any references to this technique? Also, depending on what equations you're using, aren't you constrained to using a consistent system of units?

splittingTimes5y ago

Never anything official. It was just how it was done in the group and wider scientific community. Some googling pulled that up if it helps

https://books.google.de/books?id=uzJbyD6_3DAC&pg=PA4

And yes you have to be consistent and stay in that unit system. The other remaining two we used plain Kelvin for temperature and for the charge Coulomb was measures in electron charge. That's it.

1 more reply

bchociej5y ago

bkanber5y ago

The equation fleshes out the detail that the visualization misses. I do best when I have both.

seanalltogether5y ago

I always found the wikipedia examples for 16bit floating point helpful since the numbers are smaller. You can really see how the exponent and fraction affect each other in a very simple way.

https://en.wikipedia.org/wiki/Half-precision_floating-point_...

gopalv5y ago

BFloat16[1] is an interesting tweak to the original, but built with more modern requirements for ML in mind.

[1] - https://en.wikipedia.org/wiki/Bfloat16_floating-point_format...

enriquto5y ago

The 8-bit floating point is even easier to understand, since you can list them all in a single page, and even visualize the entire addition and multiplication tables.

piadodjanho5y ago

Also a great format to count how many representation are wasted with redundant representation (ZeroS, NaN, +inf, -inf).

1 more reply

dylan6045y ago

"(−1)S∗1.M∗2(E−127) How everybody hates floating point to be explained to them."

Now I know I'm weird, as that formula makes sense to me.

dntbnmpls5y ago

rwallace5y ago

Excellent article! Just one thing surprised me:

> While I was writing a book about Wolfenstein 3D[1], I wanted to vividly demonstrate how much of a handicap it was to work without floating point

1 more reply

jonplackett5y ago

Whenever I read something about how computers _really_ work (as in not just a nice easy to comprehend programming language) I realise just how much smarter some people in the world are than me.

bigdict5y ago

With every thing like that that you read, you get closer to them!

adrianmonk5y ago

Here's how I like to think of it.

Floating point numbers are just fractions but with one extra condition: the denominator is a power of two.

The normal rules of fractions apply. If you want to add them, you have to make sure the denominators match, which would involve scaling the numerators too.

Just like fractions, there are multiple ways of writing the same value. 3/2 is the same as 6/4.

You can't write 1/3 exactly because, hey look at your denominator, it's 3. Which isn't a power of 2, is it? So that can't be a floating point value.

cambalache5y ago

> Just like fractions, there are multiple ways of writing the same value. 3/2 is the same as 6/4.

This is not correct. In binary you only have 2 choices {0,1} for the numerator, so 3/2 is represented 1.1 and 6/4 can only be represented as 1.1 (feel free to add the completing 0s)

adrianmonk5y ago

The numerator is a binary integer, not a bit (and not the part of the number before the "decimal" point).

3 in binary is 11. 6 in binary is 110.

3/2 (decimal) is 11/10 (binary) and 6/4 (decimal) is 110/100 (binary).

Of course the denominators always have trailing zeros, so can be (and are) represented more compactly.

1 more reply

zozbot2345y ago

saagarjha5y ago

Subnormal numbers are a blemish on the otherwise beautiful IEEE 754 specification :(

biddlesby5y ago

I saw a very simple visualisation of floating point by simply plotting points along the number line and zooming out. That gets the idea across very quickly!

wenc5y ago

Yes, I came across something like that in my numerical computation course. This was for me the most compelling visualization of what a floating point number was.

Reals

0 __________________________________________________ 1

Floats

0 .. .... .. ... .. ... ... ... .. ... ... ... .. .... ... ... ... . ... .... ... .. . ... 1

saagarjha5y ago

A more accurate representation might be

  0.... . . . .  .  .  .  .    .    .    .    1

(Hacker News eats consecutive spaces, it seems like :/)

1 more reply

cmrdporcupine5y ago

This is a bit odd:

Likewise, the 68040 (1990) had onboard FP. Macintosh Quadra, Amiga 4000, and various NeXT models had it.

This feels like recent history, am I really that old?

jscheel5y ago

I remember wanting a math co-processor so I could use autocad at home as a kid.

fabiensanglard5y ago

And now you could probably buy an expensive Tesla V workstation dedicated to that but you don't have time anymore. Wait. No. That is me :P !

baby5y ago

Interesting, because I think about it the other way: the offset tells you between which power of two you're in:

* [2^-127, 2^-126] * ... * [1, 2] * [2, 2^2] * [2^2, 2^3] * ... * [2^128, 2^129]

jiveturkey5y ago

https://opencores.org/projects/fpuvhdl

dang5y ago

A thread from 2019: https://news.ycombinator.com/item?id=19084773

Discussed at the time: https://news.ycombinator.com/item?id=15359574

(Reposts are ok after a year or so.)

boomlinde5y ago

rational_indian5y ago

The leading 1. prefix can be omitted from the binary representation thus saving a bit in the process.

boomlinde5y ago

That doesn't answer my question, though. I'm asking why the significand is in the range 1-2 instead of 0-1.

jimmaswell5y ago

piadodjanho5y ago

Ackchyually...

The IEEE-754 has a lot of redundant representation. Not where you would expect though.

Caveat: Those features are invaluable for some niche applications, but not for the average joe.

To start. Every IEEE-754 float has two zero representation: one for positive zero and another negative negative zero (sic).

The posit floating point address all those issues. It uses an tapered encoding for the exponent field.

thechao5y ago

I'm mixed on Gustafson's posit stuff. For me, the only thing I'd change for fp would be:

1. -0 now encodes NAN.

2. +inf/-inf are all Fs with sign: 0x7FFFFFFF, 0xFFFFFFFF.

3. 0 is the only denorm.

Which does four good things:

1. Gets rid of the utter insanity which is -0.

2. Gets rid of all the redundant NANs.

3. Makes INF "look like" INF.

4. Gets rid of "hard" mixed denorm/norm math.

And one seriously bad thing:

1. Lose a bunch of underflow values in the denorm range.

However, as to the latter: who the fuck cares! Getting down to that range using anything other than divide-by-two completely trashes the error rate anyways, so why bother?

The rest of Gustafson's stuff always sounds like crazy-people talk, to me.

2 more replies

BeetleB5y ago

But isn't that accounted for by the fact the floating point number distribution is non-uniform? Half of all floating point numbers are between -1 and 1.

1 more reply

jimbob455y ago

This guy's Quake 1/2/3 source code deep dives are well worth checking out as well.

barbs5y ago

Awesome. Maybe now I can finally understand the fast inverse square root hack.

https://en.wikipedia.org/wiki/Fast_inverse_square_root

voz_5y ago

This is beautifully written. Bravo to the author.

augustt5y ago

Didn't everyone learn scientific notation in high school? It's pretty much exactly that, and you could put the coefficient/exponent into whatever bit pattern you'd like.

GuB-425y ago

It is scientific notation with a twist. The twist is the implicit 1. in the mantissa.

anaphor5y ago

1 more reply

randyrand5y ago

It's not exactly scientific notation because a leading bit is assumed for normalized numbers.

The window and offset explanation naturally accounts for the leading bit, whereas the scientific notation explanation needs further explanation to explain how the leading bit works.

In short, 10^2 * .001 is not allowed in floating point. You can't have a mantissa that starts with 0. The leading bit means all mantissas must be greater than 1.

Without this understanding you won't have an intuitive understanding of the range and precision of floating point, which is why I think the window+offset explanation is much more natural.

romwell5y ago

I think this is worth saying about the leading bit (why 0.001 is not a valid mantissa):

In binary, we always know the first non-zero digit of any number - so there's no need to write it down. We know it's 1 because, well, binary.

So we don't waste space, and only write down all the other digits, and then use the exponent to put the 'point' into the right place.

We save a bit of space doing this.

1 more reply

codemonkey-zeta5y ago

anamexis5y ago

You just repeated the "how everybody hates floating point to be explained to them" section of this article.

j / k navigate · click thread line to collapse