I'm guessing it would be like a fever dream to let an LLM be the dungeon master in isolation. Details would change after the fact in weird ways, especially once the context grows too large.
But what about coupling an LLM with a physics simulator and a 3D world model?
You still interact with the LLM through a text interface, but hidden conversations take place with the simulator: the LLM can interrogate the current state of the 3D world to describe it to the player. You could even do this with GPT-4 Vision by having it interpret rendered images. When the player performs an action, it is translated into "physical" actions in the 3D world simulator, which updates its state.
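The hidden loop described above could be sketched roughly like this. Everything here is a toy stand-in: `WorldSim` is a hypothetical state holder (a real setup would use an actual physics engine), and the `llm_*` functions are stubs where real model calls would go; none of these names come from an existing library.

```python
class WorldSim:
    """Toy stand-in for a 3D world simulator: just tracks positions and one door."""
    def __init__(self):
        self.state = {"player": (0, 0), "door": (3, 0), "door_open": False}

    def query(self):
        # The LLM would interrogate this (or a rendered image) to narrate.
        return dict(self.state)

    def apply(self, action):
        # Player intent, already translated into a "physical" action.
        if action == {"verb": "move", "dx": 1, "dy": 0}:
            x, y = self.state["player"]
            self.state["player"] = (x + 1, y)
        elif action == {"verb": "open", "target": "door"}:
            # Physics-style constraint: you must be at the door to open it.
            if self.state["player"] == self.state["door"]:
                self.state["door_open"] = True


def llm_parse_action(player_text):
    # Stub: a real LLM would map free text to a structured action.
    if "east" in player_text:
        return {"verb": "move", "dx": 1, "dy": 0}
    if "open" in player_text:
        return {"verb": "open", "target": "door"}
    return None


def llm_narrate(state):
    # Stub: a real LLM (or GPT-4 Vision on a render) would describe the scene.
    door = "open" if state["door_open"] else "closed"
    return f"You stand at {state['player']}. The door ahead is {door}."


def turn(sim, player_text):
    # One round of the hidden conversation: parse, simulate, narrate.
    action = llm_parse_action(player_text)
    if action is not None:
        sim.apply(action)
    return llm_narrate(sim.query())


sim = WorldSim()
for cmd in ["go east", "go east", "go east", "open the door"]:
    print(turn(sim, cmd))
```

The point of the split is that the simulator, not the LLM's context window, is the source of truth for world state, so the narration can't quietly contradict earlier events.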
It feels like someone should have done this already?