I think this is not nearly as important as most people think it is.
In Hollywood movies, everyone already knows about "continuity errors" - like when the water level in a glass goes up over time because shots were spliced together. Sometimes shots with continuity errors are explicitly chosen by the editor because they had the most emotional resonance for the scene.
These types of things rarely affect our human subjective enjoyment of a video.
In terms of physics errors - current human CGI has physics errors. People just accept it and move on.
We know that Superman can't lift an airplane because the fuselage couldn't support all of that weight at a single point, but like whatever.
There are lots of tools being built to address this, but they're still immature.
https://x.com/get_artcraft/status/1972723816087392450 (This is something we built and are open sourcing - still has a ways to go.)
ComfyUI has a lot of tools for this; they're just hard for most people to use.
This release is clearly capable of generating mind-blowingly realistic short clips, but I don't see any evidence that longer, multi-shot videos can be automated yet. With a professional's time and existing editing techniques, however...