As you say, no one has figured out end to end yet. The way AI works, some tasks get figured out quickly and some take a very long time. Elon has been talking about an end-to-end system for a very long time, and I’ve watched every lecture from Andrej Karpathy on their architecture. I like their plan, but they have been working towards end to end for years, and I see no reason to believe anything will “change quickly”. AI is very much a “don’t count your chickens before they’re hatched” kind of technology. People can promise anything about AI, because when it works it feels like magic. But it’s not magic, and we can’t expect every goal to be met just because someone is using AI. Nobody knows what challenges lie ahead for an end-to-end self-driving system precisely because no one has built one. It should also be noted that their “hydra net” architecture is only relatively end to end: it is not a single large network but an engineered assortment of many small networks. Quite a lot of human choice goes into how that is designed, and those choices also lead to edge cases. But even more general systems like GPT-4 suffer from issues with edge cases.
We still don’t even know how feasible a camera-based system is. Human eyes and the human brain are so much more advanced than cameras and a computer that it may not follow that “images are all you need”.