0 points · foota · 1y ago · 0 comments

I wonder if they could somehow feed in a trained Gaussian splats model to this to get better images?

Since splats are specifically designed for rendering, it seems like this would be an efficient way for the image model to learn the geometry without having to encode it in the image model itself.

3 comments · 1 top-level

Chance-Device · 1y ago · 2 in thread

I’m not sure how that would help vs just training the model with the conditionings described in the paper.

I’m not very familiar with Gaussian splats models, but aren’t they just a way of constructing images using multiple superimposed parameterized Gaussian distributions, sort of like the Fourier series does with waveforms using sine and cosine waves?
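As a rough illustration of the "superimposed parameterized Gaussians" idea, here is a minimal 2D sketch that builds a grayscale image by summing a few isotropic Gaussian blobs. This is a simplification: actual Gaussian splatting uses 3D Gaussians that are projected onto the screen and alpha-blended, and the function name, tuple layout, and blob values below are made up for illustration.

```python
import math

def render_gaussians(gaussians, width, height):
    """Sum isotropic 2D Gaussians into a grayscale image (a list of rows).

    Each Gaussian is a hypothetical (center_x, center_y, sigma, amplitude) tuple.
    """
    image = [[0.0] * width for _ in range(height)]
    for cx, cy, sigma, amplitude in gaussians:
        for y in range(height):
            for x in range(width):
                # Squared distance from this pixel to the blob center.
                d2 = (x - cx) ** 2 + (y - cy) ** 2
                # Each blob adds its own contribution; overlaps simply sum.
                image[y][x] += amplitude * math.exp(-d2 / (2.0 * sigma ** 2))
    return image

# Two illustrative blobs: a sharp bright one and a wider, dimmer one.
blobs = [(2.0, 2.0, 1.0, 1.0), (5.0, 4.0, 1.5, 0.5)]
image = render_gaussians(blobs, width=8, height=6)
```

The parameters of each blob (position, width, brightness) are what an optimizer would fit, loosely like fitting the coefficients of a Fourier series.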

I’m not seeing how that would apply here but I’d be interested in hearing how you would do it.

foota (OP) · 1y ago

I'm not certain where it would fit in, but my thinking is this.

There's been a bunch of work on making splats efficient and good at representing geometry. Reading more, perhaps NeRFs would be a better fit, since they're an actual neural network.

The idea is that if you trained a NeRF ahead of time to represent the geometry and layout of the levels, and plugged that into the diffusion model (as part of computing the latents, and also on the output side so it can be used to improve the rendering), then the diffusion model could focus on learning how actions manipulate the world without having to learn the geometry representation.

Chance-Device · 1y ago

I don’t know if that would really help. I have a hard time imagining exactly what that model would be doing in practice.

To be honest, none of the stuff in the paper is very practical. You almost certainly do not want a diffusion model trying to be an entire game under any circumstances.

What you might want to do is use a diffusion model to transform a low-poly, low-fidelity game world into something photorealistic. The geometry, player movement, physics, etc. would all make sense, and the model would then paint something over it that looks like reality, based on some primitive texture cues in the low-fidelity render.

I’d bet money that something like that will happen and that it is the future of games and video.
