If you're interested in the process, or in exploring your specific approach, then why stop?
Coming out with a method second is much less likely to be rewarded by the community: not being rewarded for completed work isn't sunk cost, it's just a straight-up loss.
The additional interest might actually be helpful.
The time component is super interesting here though!
The paper I think you're referring to made the interesting leap that a 3D radiance field could be re-rendered as a field of Gaussian splats, and that this would probably run faster in modern GPU pipelines for real-time performance. It looks like they also have the nice property of being able to be shifted around in memory quickly, hence the animation property seen here.
I'm curious, would you classify particle effects drawn with quads as 4D gaussian splatting too?
You could "model" 3d objects with the gaussians by just putting a bunch together. It was a way to produce fast rendering 3d images without using a bunch of polygons. The results back then were...left behind by other techniques.
There's a massive back catalog of computer graphics work on the technique. It's usually easiest to use the search tools with the dates restricted to everything up to, say, 2021, and you'll find tons of normal old stuff like CS 302 - Computer Graphics courseware slides or whatever on the technique.
https://www.google.com/search?q=gaussian+splat+-site%3Apinte...
The technique worked well on non-accelerated (CPU only) hardware of the era, with the additive approach saving the pain of needing to keep a z buffer or fragment list.
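If it helps to see why the additive approach sidesteps the z-buffer: addition commutes, so the accumulation order of splats doesn't affect the final pixel, unlike "over" alpha compositing, which needs depth-sorted fragments. A tiny illustration (the numbers are mine, chosen as dyadic fractions so the float sums are exact):

```python
# Additive splatting: pixel += contribution. Addition commutes, so draw
# order doesn't matter and no z-buffer or fragment sorting is needed.
contributions = [0.5, 0.25, 0.125]

additive_a = 0.0
for c in contributions:              # front to back
    additive_a += c
additive_b = 0.0
for c in reversed(contributions):    # back to front
    additive_b += c
assert additive_a == additive_b      # same pixel value either way

def over(dst, src, alpha):           # classic alpha "over" operator
    return src * alpha + dst * (1 - alpha)

over_a = over(over(0.0, 1.0, 0.5), 0.5, 0.5)   # composite in one order
over_b = over(over(0.0, 0.5, 0.5), 1.0, 0.5)   # composite in the other
assert over_a != over_b              # order changes the answer (0.5 vs 0.625)
```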
Gaussian voxel reconstruction is useful in medical and GIS settings, which, if memory serves, is what Kyle Freeman from NovaLogic drew on for his work on Comanche. As far as I know, that was the first commercial game with voxel rendering... It's been a bit since I played it, but the swimming jaggies make me think it was a Manhattan-distance height map offset by planar traversal (kinda like Doom's raycasting) or some similar trick. I don't recall any intersections or overhangs, but, to be fair, I was a middle schooler when Comanche came out.
It also ran fine on my weak sauce PC.
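For the curious, here's a bare-bones sketch of the classic heightmap-column trick that hobbyist write-ups of Comanche-style terrain usually describe. This is not NovaLogic's actual code; `height_at`, `color_at`, and `draw_vline` are hypothetical helpers, and the constants are illustrative:

```python
import math

FOV = math.pi / 3   # illustrative field of view
SCALE = 120.0       # illustrative height-to-pixels projection factor

def render_terrain(cam_x, cam_y, cam_h, horizon, screen_w, screen_h, draw_dist):
    # One ray per screen column, marched front to back across the height map.
    # Vertical line segments only, so no overhangs are possible -- which
    # matches the "no intersections or overhangs" recollection above.
    for col in range(screen_w):
        angle = (col / screen_w - 0.5) * FOV
        dx, dy = math.cos(angle), math.sin(angle)
        max_y = screen_h                    # lowest pixel still unfilled
        for z in range(1, draw_dist):       # march the ray outward
            x, y = cam_x + dx * z, cam_y + dy * z
            # Perspective-project the terrain height onto this column.
            screen_y = int((cam_h - height_at(x, y)) / z * SCALE + horizon)
            if screen_y < max_y:            # only draw what pokes above
                draw_vline(col, screen_y, max_y, color_at(x, y))
                max_y = screen_y
```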
Once hardware acceleration hit, triangle transformation with fixed-function pipelines took over; the ability to push textured triangles with minimal per-pixel value adjustment won out. Slowly but surely we've swung back to a high-ALU balance (albeit via massive stream parallelism), shifting from heavy list/vertex transformers to giant array multiply/add processors.
It's a pretty great time to be a processing nerd.
I realize that a lot has happened since, but this is likely where it all started :)
Essentially they're doing what you do when you train a neural network, only that instead of adjusting weights connecting "neurons", you adjust the shape and position of gaussians, and the coefficients of spherical harmonics for the colors.
This requires the rendering step to be differentiable, so that you can back-propagate the error between the rendering and the ground-truth image.
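As a rough sketch of what that optimization loop might look like (assuming PyTorch, with a hypothetical differentiable `render()` standing in for the paper's custom CUDA tile rasterizer, and `training_views` for a set of posed photos):

```python
import torch

N = 100_000  # number of gaussians (illustrative)
means      = torch.randn(N, 3, requires_grad=True)      # positions
log_scales = torch.zeros(N, 3, requires_grad=True)      # anisotropic sizes
rotations  = torch.randn(N, 4, requires_grad=True)      # quaternions (shape/orientation)
sh_coeffs  = torch.zeros(N, 16, 3, requires_grad=True)  # spherical-harmonic colors
opacities  = torch.zeros(N, 1, requires_grad=True)

opt = torch.optim.Adam([means, log_scales, rotations, sh_coeffs, opacities], lr=1e-3)

for camera, ground_truth in training_views:
    rendered = render(means, log_scales, rotations, sh_coeffs, opacities, camera)
    loss = (rendered - ground_truth).abs().mean()  # the paper mixes L1 with D-SSIM
    opt.zero_grad()
    loss.backward()   # gradients flow back through the differentiable renderer
    opt.step()
```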
The next key step is to adjust the number of gaussians every N iterations: either fill in details by cloning a gaussian in an area which is under-covered, or split a gaussian in an area which is over-covered.
They use the gradient of the view-space position to determine whether more detail is needed; i.e., gaussians that the optimizer wants to move significantly across the screen seem to sit in a region without enough detail.
They then use the covariance of the gaussians to decide whether to split or to clone: gaussians with large variance get split, the others get cloned.
They also remove gaussians which are almost entirely transparent; no point in keeping those around.
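In rough pseudocode, that densify/prune pass might look like this (the container methods and thresholds are my own illustration, not the paper's exact values):

```python
def adjust_density(gaussians, grad_threshold=0.0002, scale_threshold=0.01):
    # Runs every N training iterations.
    for g in list(gaussians):
        # Large accumulated view-space position gradient => region lacks detail.
        if g.viewspace_grad_norm > grad_threshold:
            if g.scale.max() > scale_threshold:
                gaussians.split(g)   # big blob covering too much: split it
            else:
                gaussians.clone(g)   # small blob: duplicate and let it drift
        # Drop blobs that have faded to near-total transparency.
        if g.opacity < 0.005:
            gaussians.remove(g)
```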
That's my understanding at least, after a first time gloss-through.
> Essentially they're doing what you do when you train a neural network, only that instead of adjusting weights connecting "neurons", you adjust the shape and position of gaussians, and the coefficients of spherical harmonics for the colors.
My brain:
> They're providing inverse reactive current to generate unilateral phase detractors, automatically synchronizing cardinal grammeters.
The idea in backpropagation is instead to mathematically relate a change in the output to a change in the parameters: you figure out how much you need to change the parameters to change the output by a desired amount. Hence the "back" in the name: since you want to control the output, "steering" it in the direction you want, you go backwards through the process to figure out how much to change the parameters.
Instead of "if I turn the knob 15 degrees the temperature goes up 20 degrees", you want "in order to increase the temperature 20 degrees the knob must be turned 15 degrees".
By comparing the output with a reference, you get how much the output needs to change to match the reference, and by using the backpropagation technique you can then relate that to how much you need to change the parameters.
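A toy numeric version of that inversion, using the ratio from the knob example above:

```python
# Forward: output change = sensitivity * parameter change.
sensitivity = 20 / 15                 # degrees of temperature per degree of knob
desired_output_change = 20            # we want it 20 degrees warmer
# Backward: invert the relationship to get the parameter change needed.
knob_change = desired_output_change / sensitivity
print(round(knob_change, 6))          # 15.0
```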
In neural nets the parameters are the so-called weights of the connections between the layers in the model. However the idea is quite general so here they've applied it to optimizing the size, shape, position and color of (gaussian) blobs, which when rendered on top of each other blend to form an image.
Changing a blob's position, say, might make it better for one pixel but worse for another. So instead of making one big change to the parameters, you take small iterative steps. This is the so-called training phase. Over time, the hope is that the output error decreases steadily.
edit: while backpropagation is quite general as such, as I alluded to earlier it does require that the operation behaves sufficiently nicely, so to speak. That's one reason for using gaussians over, say, spheres: gaussians have nice smooth properties, while spheres have an edge, the surface, which introduces a sudden change. Backpropagation works best with smooth changes.
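A quick sketch of that difference: finite-difference slopes of a 1D gaussian versus a hard-edged "sphere" indicator (names and numbers are mine):

```python
import math

def gaussian(r, sigma=1.0):
    # Smooth everywhere: small changes in r give small changes in value.
    return math.exp(-r * r / (2 * sigma * sigma))

def hard_sphere(r, radius=1.0):
    # Discontinuous at the surface: inside is 1, outside is 0.
    return 1.0 if r <= radius else 0.0

eps = 1e-4
for r in (0.5, 1.0, 1.5):
    dg = (gaussian(r + eps) - gaussian(r)) / eps        # usable slope everywhere
    ds = (hard_sphere(r + eps) - hard_sphere(r)) / eps  # zero, except a spike at the edge
    print(f"r={r}: gaussian slope {dg:+.3f}, sphere slope {ds:+.1f}")
```

The gaussian gives the optimizer a meaningful gradient at every radius; the hard sphere's slope is zero almost everywhere, so there's nothing to steer with.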
Also, has anyone been working on solving the "blurry" look these splats have up close?
This seems to be a rendering efficiency innovation, not particular to scanning.
That means it applies to artificially generated environments, whether photo realistic or stylized, and whether based on a real environment or a completely fictional one.
But of course, any photorealistic rendering of a real place, faithful down to the smallest detail, is going to involve a lot of scanning. That is true for any kind of rendering.
That said, games don't have to be super realistic to be fun. E.g. I could imagine a game based on GS at "Minecraft resolution".
His editing is hilarious too.
https://lumalabs.ai/capture/ed9d985b-9cc1-49e0-a39c-88afa203...
https://lumalabs.ai/capture/83e9aae8-7023-448e-83a6-53ccb377...
https://lumalabs.ai/capture/7f8df9c9-c548-4a47-9892-e945637c...
https://lumalabs.ai/capture/076fcfdc-ea80-4fdc-8159-c9fed831...
The key drawback that isn't highlighted is that you need a physical space that closely approximates what you want to render. So if you want to make a few Counter-Strike maps based on your workplace (not recommended), then this would be a good technology; but if you want to make an open world on an alien planet, you're likely better off with traditional rendering.
Although actually, on a slightly more innocent (but just as edgy!) note, the thing that immediately popped into my head upon reading "4D Gaussian Splatting" was the music from the 1992 Future Crew demo Unreal, and the image of its inter-scene title screens. ["IYKYK", but basically, that famous old PC demo consists of several short sections, each showcasing a particular coding/graphical technique - each section prefaced by a title screen naming the effect being showcased.]
YT of Unreal demo, as citation for this highly-important observation : https://www.youtube.com/watch?v=InrGJ7C9B3s