The training set comprised almost 1 billion frames, roughly 20 days of continuous play-time, covering virtually every inch of the map.
Now you show it N frames as input and ask it "give me frame N+1", and it gives you frame N+1 back, much as that frame originally appeared during training.
But that frame N+1 is not produced by some mysterious intelligence; it is simply recalled from the training dataset.
The drift you mentioned is actually clear (if disappointing) evidence that the model is not inventing new frames; it can only spit back answers from its training dataset.
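A toy sketch of why that drift happens in any autoregressive next-frame setup (the numbers and the one-dimensional "frames" here are purely illustrative, not the actual model): once the model's own slightly-off predictions are fed back in as input, the per-step error compounds over the rollout.

```python
# Toy autoregressive rollout: "frames" are single floats, and the
# "model" has learned the true dynamics slightly wrong.

def true_step(x):
    # Ground-truth dynamics: each frame advances by 1.0.
    return x + 1.0

def model_step(x):
    # Imperfect learned dynamics: consistently off by 0.05 per step.
    return x + 1.05

x_true = x_pred = 0.0
errors = []
for t in range(50):
    x_true = true_step(x_true)
    # The model never sees the ground truth again: its own output is
    # fed back in, so the small per-step error accumulates.
    x_pred = model_step(x_pred)
    errors.append(abs(x_pred - x_true))

print(errors[0], errors[-1])  # error grows as the rollout gets longer
```

Near the training distribution (early steps) the prediction looks fine; dozens of steps later the accumulated error has pushed the rollout far from anything the model saw during training, which is exactly when the output "goes wild".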
It's a bit like training Stable Diffusion on Simpsons episodes: at first it outputs the next frame of an existing episode from the training set, but a few frames later it goes wild and buggy.