While most GPUs support FP64, unless you pay for the really high-end scientific computing models, you're typically getting 1/32nd rate compared to FP32 performance. Even your shiny new RTX 4090 runs FP64 at 1/64th rate.
Double-single (2xFP32) arithmetic runs at roughly 1/4 the rate of FP32 for most basic operations, which quite often makes it the superior solution compared to the native FP64 support provided in GPU languages.
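For anyone curious what 2xFP32 (double-single, or float-float) arithmetic looks like under the hood: the core building block is an error-free transformation like Knuth's two-sum, which captures the rounding error of an FP32 addition in a second FP32 value. A minimal sketch in Python, emulating FP32 rounding with `struct` (the helper names are mine, not from any particular GPU library):

```python
import struct

def f32(x):
    # Round a Python double to the nearest IEEE-754 binary32 value.
    return struct.unpack('f', struct.pack('f', x))[0]

def two_sum(a, b):
    # Knuth's two-sum: s is the rounded FP32 sum and e the exact
    # rounding error, so s + e equals a + b as real numbers.
    s = f32(a + b)
    v = f32(s - a)
    e = f32(f32(a - f32(s - v)) + f32(b - v))
    return s, e

# 1e-9 is far below FP32 epsilon (~1.2e-7), so a plain FP32 add drops it...
assert f32(f32(1.0) + f32(1e-9)) == 1.0
# ...but the (s, e) pair keeps it around for later operations.
s, e = two_sum(f32(1.0), f32(1e-9))
print(s, e)
```

A full double-single library builds products and quotients out of the same trick, which is where the roughly 4x instruction overhead comes from.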
I wonder whether there is a hardware reason for this, or whether it's just market segmentation by Nvidia.
rot13 to avoid spoilers for people who haven't played the game: Gur fha tbrf abin va gjragl-gjb zvahgrf, fb guvf vfa'g npghnyyl na vffhr va cenpgvpr.
EVE Online had (still has?) a similar issue with its camera being able to zoom in on objects that are very far away. Normally, at those distances, you'd be using your overview or the HUD markers, but if you did zoom in on a far object, the origin would still be on your ship (or maybe the center of the area you were in), and the object would get distorted. Especially fun when it was a floating corpse.
Perhaps such a brilliant idea came to them in a dream. But maybe they forgot how they did it in another dream.
Essentially you translate the world back to origin when the player gets too far away.
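A sketch of that idea (often called a floating origin, or origin rebasing), with a made-up threshold and plain tuples standing in for engine vectors:

```python
REBASE_THRESHOLD = 5_000.0  # hypothetical; engines tune this per game

def maybe_rebase(player, objects):
    """When the player strays too far from the origin, subtract the
    player's position from everything. Coordinates near the player
    stay small, so float32 precision there stays high."""
    if max(abs(c) for c in player) < REBASE_THRESHOLD:
        return player, objects  # still close enough, nothing to do
    offset = player
    rebased = [tuple(c - o for c, o in zip(pos, offset)) for pos in objects]
    return (0.0, 0.0, 0.0), rebased

player, objects = maybe_rebase((12_000.0, 0.0, 0.0), [(12_010.0, 0.0, 3.0)])
print(player, objects)  # player is back at the origin; relative layout unchanged
```

The subtraction itself should be done in doubles (or integers) on the CPU, otherwise the rebase re-introduces the very error it is meant to remove.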
Something like Kerbal space program is an example where I'd probably break out the doubles.
For final projection, clipping, z buffering, etc., single precision is almost certainly enough.
If you do this on the GPU, then you need it to handle double-precision data.
> Overall, we are quite happy with how this solution turned out. We think it is the closest to "just works" that we can get.
I think this is the crux of it. The performance penalty is very small and the convenience factor is very high.
But now I want to know what they do with the positions of lights in the scene... Likely transformed to view space regardless for deferred rendering, I'd guess.
The reason is that it is not enough to extend the precision of the 32-bit FP numbers. The exponent range must also be extended. Standard double-precision numbers have an exponent range large enough to make underflow and overflow very unlikely in most algorithms. With the very small exponent range of FP32 numbers, underflow and overflow are very likely, and this must be corrected in any double-precision implementation.
So it is not enough to use two FP32 numbers to represent one FP64 number. One must use either a third number for the exponent, or at least one of the two 32-bit numbers must be integer and partitioned into exponent and significand parts.
Both approaches lead to much more complex algorithms, and to a much worse speed ratio for FP64 implemented with FP32 than for FP128 implemented with FP64.
In deep learning, this is huge! If you have numbers this big, then something is definitely already wrong. If you have numbers that small, then you definitely don't care.
I wonder if deep learning will save us from poorly conditioned linear algebra too.
Back to Godot, I thought the answer would be to precompute the ModelView matrix on the CPU. Object -> World -> Camera is a “large” transformation. But the final Object -> Camera transform is “small”. I’m sure there’s a reason this doesn’t work, but I forget it.
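That is roughly what camera-relative renderers do, as I understand it: perform the Object -> Camera subtraction in doubles on the CPU, then hand only the small result to the GPU as float32. A toy illustration on positions alone (made-up coordinates; a real engine would do this on full 4x4 matrices):

```python
import struct

def f32(x):
    # Round to the nearest IEEE-754 binary32 value.
    return struct.unpack('f', struct.pack('f', x))[0]

# Hypothetical large-world positions, held as doubles on the CPU.
obj_world = (20_000_000.5, 0.0, 10.0)
cam_world = (20_000_000.0, 0.0, 4.0)

# Naive: cast to float32 first, then subtract. Near 2e7 the FP32
# spacing is 2.0, so the half-metre x offset is gone before we subtract.
naive = tuple(f32(f32(o) - f32(c)) for o, c in zip(obj_world, cam_world))

# Camera-relative: subtract in float64, then cast the small result.
relative = tuple(f32(o - c) for o, c in zip(obj_world, cam_world))

print(naive)     # (0.0, 0.0, 6.0) -- the x offset collapsed
print(relative)  # (0.5, 0.0, 6.0) -- exact
```

One catch I can think of: this means re-uploading per-object transforms whenever the camera moves, which may be why engines prefer to rebase less often than every frame.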
Unreal 5 changes to doubles everywhere for large world coordinates. I wonder what fun issues they had to solve?
That's the core idea here. A bit more detail would help. Is that done in the GPU? Is that extra work for every vertex? Does it slow down rendering because the GPU's 4x4 matrix multiplication hardware can't do it?
I actually have to implement this soon in something I'm doing. So I really want to know.
This is overkill for what I'm doing. They want to zoom way out and see planet-sized objects. I just have a big flat world a few hundred km across. So the usual "offset the render origin" approach will work. I don't have to update on every frame, only when the viewpoint moves a few hundred meters.
Is there a “lossy compression” benefit to describing space with floats?
I was taught that MV/MVP should be calculated CPU-side per-model, and that doing it in the vertex shader is wasteful. Is that advice out of date?
Really half-floats are more interesting, saving 50% memory on the GPU for mesh data. You could imagine using half-floats for animations too!
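Python's `struct` can emulate the binary16 round-trip, which shows both the memory saving and the precision you give up (the rounding below is what happens when a vertex attribute is stored at 16 bits):

```python
import struct

def f16(x):
    # Round a value to IEEE-754 binary16 ("half"), as happens when
    # a vertex attribute is quantized to 16 bits.
    return struct.unpack('e', struct.pack('e', x))[0]

print(struct.calcsize('e'), struct.calcsize('f'))  # 2 4: the 50% saving
print(f16(1.2345))  # 1.234375: only ~3 decimal digits survive
```

With 11 significant bits, half precision is plenty for normals, UVs, or bone weights, but positions usually need a per-mesh scale/offset to stay inside the usable range.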
Then we could have the debate about fixed point vs. floating point. Why we choose a precision that deteriorates with distance says something about our short-sightedness in other domains too, the economy for example (let's just print money now, close to the origin, and we'll deal with the precision problems later, when time moves away from the origin).
What you want is fixed point, preferably with integer math, so you get deterministic behaviour even across hardware. Just as plain float/int arrays give you CPU-cache friendliness and atomic parallelism at the same time, simplicity is often the solution!
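A sketch of that, using a hypothetical Q-format where positions are integers counting 1/65536ths of a metre: the precision is uniform everywhere in the world, and the integer arithmetic is bit-for-bit reproducible across hardware:

```python
SCALE = 1 << 16  # hypothetical Q.16 format: 16 fractional bits, ~15 um steps

def to_fixed(metres: float) -> int:
    # Positions live as plain integers; add/subtract is exact.
    return round(metres * SCALE)

def to_metres(fx: int) -> float:
    return fx / SCALE

# A quarter-metre step stays exact even a thousand kilometres out,
# where float32 could no longer represent it.
far = to_fixed(1_000_000.25)
print(to_metres(far + to_fixed(0.25)))  # 1000000.5, exactly
```

The trade-off is that multiplication and division need explicit rescaling (and overflow checks), which floats handle for you automatically.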
In general, 64-bit is not that interesting, so the idea Acorn had with ARM, that jumping to 32-bit would be enough for good, is pretty much proven by now, even though addressing only jumped from 26-bit to 32-bit with the ARM6.
Which leads me to the next interesting tidbit: back in the 8-bit days, the C64 already had 16-bit addressing.
But really all large worlds need chunking.
The real reason AAA never got into user-generated content is that they have the staff to create linear worlds.
After this economic crisis, linear content will more or less disappear.
Why listen to a hardcoded story when you can make your own just like in real life?
Scarcity is the key, one UGC networked world will make time valuable.