Sharlin · 1y ago · 0 points

This is not "just" video, however. It's interactive in real time. Sure, you can say that playing is simply video with some extra parameters thrown in to encode player input, but still.

10 comments · 1 top-level

slashdave · 1y ago · 9 in thread

It is just video. There are no external interactions.

Heck, it is far simpler than video, because the point of view and frame are fixed.

SeanAnderson · 1y ago

I think you're mistaken. The abstract says it's interactive: "We present GameNGen, the first game engine powered entirely by a neural model that enables real-time interaction"

Further: "a diffusion model is trained to produce the next frame, conditioned on the sequence of past frames and actions." Note specifically "and actions".

User input is being fed into this system and subsequent frames take that into account. The user is "actually" firing a gun.
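To make the claim concrete, here is a minimal sketch of the kind of loop the quoted abstract describes: each new frame is computed from a window of past frames and actions, and each action arrives only at the step where it is used. Everything in it is hypothetical. `toy_next_frame` is a hand-written stand-in for a trained model, not GameNGen's network, and the frame size, context length, and action names are invented for illustration.

```python
from collections import deque

import numpy as np

ACTIONS = {"noop": 0, "left": -1, "right": 1}  # hypothetical action set
WIDTH = 8    # toy 1-D "frame" width (assumption)
CONTEXT = 4  # how many past frames/actions the model sees (assumption)


def toy_next_frame(past_frames, past_actions):
    """Stand-in for a learned next-frame predictor.

    Shifts the most recent frame by the most recent action. A real model
    would be a neural network; this only mimics the interface.
    """
    return np.roll(past_frames[-1], ACTIONS[past_actions[-1]])


def run_interactive(initial_frame, action_source, steps):
    """Autoregressive rollout: each step reads one action, then emits one frame."""
    frames = deque([initial_frame], maxlen=CONTEXT)
    actions = deque(maxlen=CONTEXT)
    out = []
    for _ in range(steps):
        actions.append(next(action_source))  # input arrives at run time
        frame = toy_next_frame(list(frames), list(actions))
        frames.append(frame)  # the generated frame becomes future context
        out.append(frame)
    return out


start = np.zeros(WIDTH, dtype=int)
start[3] = 1  # a single "player" pixel at position 3

rollout_a = run_interactive(start, iter(["right", "right", "left"]), 3)
rollout_b = run_interactive(start, iter(["left", "left", "left"]), 3)
```

In this toy, the same starting frame produces different rollouts for different action sequences. Whether GameNGen's inference actually works this way is the point under dispute in this thread.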

slashdave · 1y ago

No, I am not. The interaction is part of the training, and is used during inference, but it is not included during the process of generation.

smusamashah · 1y ago

It's interactive, but can it go beyond what it learned from the videos? As in, can the camera break free and roam around the map from different angles? I don't think it will be able to do that at all. There are still a few hallucinations in this rendering; it doesn't look like it understands 3D.

hypertele-Xii · 1y ago

Then why do monsters become blurry, smudgy messes when shot? That looks like a video compression artifact of a neural network attempting to replicate a low-structure image (the source material contains guts exploding, a very unstructured visual).

nopakos · 1y ago

Maybe it's so advanced, it knows the players' next moves, so it is a video!

raincole · 1y ago

I highly suggest you read the paper briefly before commenting on the topic. The whole point is that it's not just generating a video.

slashdave · 1y ago

I did. It is generating a video, using latent information on player actions during the process (which it also predicts). It is not interactive.

Sharlin (OP) · 1y ago

Uff, I guess you’re right. Mea culpa. I misread their diagram to represent inference when it was about training instead. The latter is conditioned on actions, but… how do they generate the actual output frames then? What’s the input? Is it just image-to-image based on the previous frame? The paper doesn’t seem to explain the inference part at all well :(

slashdave · 1y ago

It should be possible to generate an initial image from Gaussian noise, including the latent information on player position.
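As a rough illustration of that idea, the sketch below starts from pure Gaussian noise and refines it step by step toward an image that depends on a conditioning value (here, a player position). It is a simplified iterative-denoising loop, not a faithful DDPM or DDIM sampler and not the paper's method. `toy_denoiser`, `sample`, and the 1-D "image" are all assumptions made up for this example.

```python
import numpy as np

WIDTH = 8  # toy 1-D "image" width (assumption)


def toy_denoiser(noisy, player_pos):
    """Stand-in for a trained denoiser: returns its guess of the clean image.

    Here the guess depends only on the conditioning value; a real network
    would also use the noisy input and the noise level.
    """
    clean = np.zeros_like(noisy)
    clean[player_pos] = 1.0
    return clean


def sample(player_pos, steps=10, seed=0):
    """Start from Gaussian noise and move step by step toward the denoiser's guess."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(WIDTH)  # pure Gaussian noise
    for t in range(steps, 0, -1):
        guess = toy_denoiser(x, player_pos)
        x = x + (guess - x) / t  # at t == 1 this lands on the guess
    return x


image = sample(player_pos=5)
```

The only input besides the random seed is the conditioning value, so changing `player_pos` changes which image the noise is refined into.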
