None of those self-driving MLs have a virtual world model in their "head", right? They just react to the latest video frame. If so, that's not even fish-level intelligence; it's more like a house plant.
If you're curious, Cruise has a fairly comprehensive (but long) overview of their tech that covers their world model and world simulation stuff: https://youtu.be/uJWN0K26NxQ
All of the self-driving systems I have intimate knowledge of have some level of "persistent memory". It dramatically improves performance in a lot of common situations. For example, you're coming up to an intersection and your view of stationary cross traffic is occluded by a bus or truck. You enter the intersection, and now the planner has to decide whether that vehicle is on an intersecting path with the self-driving car (SDC). Without persistence you only have a few frames of information, so the velocity estimate may still have a large error. The safety system then has to slam on the brakes to avoid a collision that will never actually happen, potentially causing a real accident if the car behind is following too closely.
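To make the velocity-error point concrete, here's a toy sketch (not how any particular SDC stack works). It assumes a 10 Hz frame rate, ±0.2 m of measurement noise on a truly stationary car, and a plain least-squares fit over a rolling per-object history. Real trackers typically use something like a Kalman filter; `PersistentTrack` and `ls_velocity` are hypothetical names made up for illustration.

```python
from typing import List


def ls_velocity(positions: List[float], dt: float) -> float:
    """Least-squares slope of position vs. time, in m/s (toy estimator)."""
    n = len(positions)
    if n < 2:
        raise ValueError("need at least two observations")
    ts = [i * dt for i in range(n)]
    t_mean = sum(ts) / n
    p_mean = sum(positions) / n
    num = sum((t - t_mean) * (p - p_mean) for t, p in zip(ts, positions))
    den = sum((t - t_mean) ** 2 for t in ts)
    return num / den


class PersistentTrack:
    """Hypothetical per-object memory: a rolling window of observed positions."""

    def __init__(self, max_history: int = 50):
        self.max_history = max_history
        self.history: List[float] = []

    def observe(self, position: float) -> None:
        self.history.append(position)
        if len(self.history) > self.max_history:
            self.history.pop(0)  # drop the oldest observation

    def velocity(self, dt: float) -> float:
        return ls_velocity(self.history, dt)


if __name__ == "__main__":
    dt = 0.1  # assumed 10 Hz frame rate
    track = PersistentTrack()
    for i in range(20):
        noise = 0.2 if i % 2 == 0 else -0.2  # deterministic +/-0.2 m noise
        track.observe(10.0 + noise)  # the car is actually parked at 10 m
        if i == 1:
            print(f"after 2 frames:  {track.velocity(dt):+.2f} m/s")  # -4.00
    print(f"after 20 frames: {track.velocity(dt):+.2f} m/s")  # -0.03
```

With only two frames, the noise alone makes the parked car look like it's moving at 4 m/s. With 20 frames of history, the estimate drops to about 0.03 m/s. That gap is roughly the difference between a false hard-brake and driving through normally.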