I don't see how Tesla is even a serious contender.
It's possible that there exists some error metric inside Tesla that consistently goes down with more training data and bigger neural nets in their vision-based FSD, whereas switching to LIDAR would only reduce that error by a fixed one-off amount, say 30%.
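A toy way to see that trade-off: if vision error really does follow a power law in training compute, a fixed 30% error reduction from LIDAR is equivalent to some finite compute multiplier, which scaling can eventually exceed. All constants below are made up for illustration, not Tesla data.

```python
# Toy scaling-law comparison (hypothetical constants, not Tesla data).
# Assume vision error follows a power law in compute: E(C) = a * C**(-b).
# A one-off relative error reduction r from LIDAR then equals a compute
# multiplier of (1 / (1 - r))**(1 / b).

b = 0.1          # hypothetical power-law exponent
reduction = 0.30  # fixed relative error reduction attributed to LIDAR

compute_multiplier = (1 / (1 - reduction)) ** (1 / b)
print(f"Equivalent extra compute: {compute_multiplier:.1f}x")
# With b = 0.1 this comes to roughly 35x more compute -- a lot, but unlike
# a sensor swap, scaling can keep going past that point.
```

The point of the sketch is only that a one-time multiplicative gain and a curve that keeps bending down are qualitatively different bets.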
They just assume that vision will eventually work out.
Apple Watch is probably one of the greatest examples. So many of its features are inferred via "basic" sensors.
On a different angle, sports refereeing is increasingly automated thanks to advances in camera-based analysis. We can turn 2D images into a nearly centimeter-accurate representation of a playing field in seconds.
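The core trick behind that kind of pitch mapping is a planar homography: four known pixel-to-field correspondences fix a 3x3 matrix, after which any pixel on the ground plane maps to metric field coordinates. A minimal sketch with made-up coordinates (real systems add multi-camera calibration and lens-distortion correction on top):

```python
# Minimal planar-homography sketch for pixel -> field-coordinate mapping.
# All coordinates are hypothetical, for illustration only.
import numpy as np

def fit_homography(pixels, field):
    """Direct Linear Transform from 4+ point correspondences."""
    rows = []
    for (x, y), (X, Y) in zip(pixels, field):
        rows.append([x, y, 1, 0, 0, 0, -X * x, -X * y, -X])
        rows.append([0, 0, 0, x, y, 1, -Y * x, -Y * y, -Y])
    # The homography is the null vector of the stacked constraint matrix.
    _, _, vt = np.linalg.svd(np.array(rows, dtype=float))
    return vt[-1].reshape(3, 3)

def pixel_to_field(H, x, y):
    X, Y, w = H @ np.array([x, y, 1.0])
    return X / w, Y / w

# Hypothetical image corners of a 105 m x 68 m pitch.
pixels = [(100, 400), (700, 420), (600, 150), (180, 160)]
field = [(0, 0), (105, 0), (105, 68), (0, 68)]

H = fit_homography(pixels, field)
print(pixel_to_field(H, 400, 300))  # some point mid-pitch, in metres
```

This only works because the pitch is a known, static plane viewed by fixed calibrated cameras, which is exactly the contrast the next comment draws with a moving car.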
These cameras sit in a very different and much less dynamic environment than a car moving at 100+ km/h while getting splashed on, shat on, dusted, muddied, struck by bugs, snowed on, etc.
Starting with "basic" sensors is backwards. It is like aspiring to become a chess grandmaster so good you can play with your eyes closed, and starting out as a beginner with your eyes closed.
Whether this is the right call for delivering self-driving cars, we will find out soon enough. Long term, though, it definitely makes sense. We just don't know what the missing pieces of the puzzle are.
This is commonly repeated but very obviously untrue.
We don't only have vision. We have a general intelligence coupled with vision. In the absence of AGI, the base assumption has to be that the sensor apparatus needs to be significantly superior to a human's for an FSD system to drive at a comparable level.
Would you agree then, that if the goal was to develop AGI, just relying on vision is a credible choice?
But they don't. I can't see how anyone could look at modern driving and see an optimal state. Driving isn't being managed at all, it's killing droves of humans.
If we put the same restrictions on airplanes (flying by instrument is a crutch), everyone would rightfully find that ridiculous.
They appear to have bet on the wrong technology. The failure happened back in the design phase.
If a driver doesn't have vision, the right decision is to figure out how to safely stop.
Spend a few million years programming a computer to swing through trees and you'll probably get something that can drive a car.
What we still lack are the fundamental algorithms to learn from video. Tokenization as used in LLMs, and diffusion models, are starting to fall short of that goal.