It seemed like all the "full cost" negatives Andrej mentioned were related to Tesla's ability to execute, and not what would actually produce better results. Tesla would have to be able to reliably procure parts, write reliable firmware, create designs and processes that won't increase unexpected assembly line stops, etc.
Regarding results, the best Andrej can do is, "In this case, we looked at using it and not using it, and the delta was not massive." In other words, the results are better, but not enough to make up for the fact that Tesla can't support additional sensors without incurring a prohibitive amount of additional risk to Tesla. Risk to passengers doesn't appear to be a consideration.
Q: "are less sensors less safe/effective?"
A: "well more sensors are costly to the organization and add more tech debt so safety is orthogonal and not worth answering".
Q: "Does [removing some sensors] make the perception problem harder, or easier?"
(note, this is literally what Lex asked, your restatement is misleading)
A: [paraphrasing] "Well more sensor diversity makes it harder to focus on the thing that I believe really moves the needle, so by narrowing the space of consideration, I think we'll get better results"
Karpathy might not be telling the truth, I don't know. But it's a much more credible pitch than you make it sound, because it's often true that you can deliver better by focusing on a smaller number of things. Engineering has always been about tradeoffs. Nobody is offering Karpathy infinite money plus infinite resources plus infinite time to do the job.
Again, I'm not saying Karpathy is honest or correct. I'm saying that the rephrasings in this comment and this thread are hilariously unfair.
In general the ‘harm to consumers’ is really just making it more likely they damage the car in a parking lot or their garage, which tells you where their priorities are (sales, Automotive gross profit). Assuming occupancy network works, the only real blind spot left is if something in front of the car changes in between it turning off and on (assuming occupancy will 'remember' the map around it when it goes to sleep).
Also, Tesla’s strategy for safety is seemingly “excel in industry standard tests, ie. IIHS and EuroNCAP”, so this might be a case of the measure becoming a target.
The sensors are unreliable and expensive in terms of R&D. Having marginal parts which takes money from a finite R&D budget can easily result in a worse product. “They contribute noise and entropy into everything.” … “you’re investing fully into that [vision] and you can make that extremely good. You only have a finite amount of spend of focus across different facets of the system.”
His standpoint can be summed up as “I think some of the other companies are going to drop it.” Which would be really interesting if true.
Focusing the team on vision, which is by far the highest accuracy and bandwidth sensor, allows for a faster rate of safety innovation given a constant team size.
You may be right about the actual decision process Tesla went through, but Karpathy is right in principle. One of the first things he says is "there can be problems with [the sensors]", and a lot of what he mentions increases the risk of run-time failure, not just cost.
It's easy to cast this as an optimization problem where you're trading off asymptotically improved sensing for linearly or superlinearly increased failure rates. There's certainly a point where the complexity of more sensors or certain types of sensors outweighs any marginal benefit they provide.
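To make that trade-off concrete, here is a toy sketch of the optimization described above. The saturating benefit curve, the cost model, and every parameter value are assumptions invented for illustration, not anything Tesla or Karpathy has published.

```python
import math


def net_benefit(n_sensors, saturation_rate=1.0, failure_cost=0.05, exponent=1.0):
    """Net value of a suite with n_sensors under a toy model.

    Benefit saturates asymptotically: 1 - exp(-saturation_rate * n).
    Failure cost grows as failure_cost * n ** exponent
    (exponent 1.0 is linear, above 1.0 is superlinear).
    All parameter values are hypothetical.
    """
    benefit = 1.0 - math.exp(-saturation_rate * n_sensors)
    cost = failure_cost * n_sensors ** exponent
    return benefit - cost


def best_sensor_count(max_sensors=20, **kwargs):
    """Sensor count in 0..max_sensors with the highest net benefit."""
    return max(range(max_sensors + 1), key=lambda n: net_benefit(n, **kwargs))


linear_optimum = best_sensor_count(exponent=1.0)
superlinear_optimum = best_sensor_count(exponent=2.0)
print(linear_optimum, superlinear_optimum)
```

With these made-up numbers the best count is small in both cases, and it shrinks when the cost grows superlinearly. The real argument is about where the actual curves sit, which this sketch cannot tell you.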
Cameras can also fail at run-time. There can be (and is) variability in how they're mounted, in the lenses, in the sensors. They can get blinded or not get enough light. Their cabling can fail; random components can fail.
Tesla has claimed that vision outperforms vision+radar but anecdotal reports don't seem to support that conclusion. IMHO these technologies are not directly replaceable, but are complementary. It's like you can't replace your ears with your eyes (yeah, you can read lips, if they're visible).
But sure, there is a sweet spot. Is Tesla really optimizing for best performance at any cost or are they optimizing making more money and selling that to us as an improvement? That's really the question and I don't think we got a frank answer there.
We really should be focusing on what is the best solution and trying to solve price issues through existing techniques e.g. economies of scale, competition, miniaturisation. Instead they are trying to build whatever solution they can that fits in a pre-defined cost window.
Except this isn't a new phone or sneakers we are trying to take to market; it's something that will directly impact people's lives.
Why not have a thousand sensors if more is better?
This mindset is something I see a lot, that “best” means the technically optimal (or sometimes just personally most convenient) solution to the specific problem that they personally are working on. If they take a step back and look at the bigger picture, the technical merits are usually only a tiny part of the whole decision.
Ultimately the only requirement is that the system is safer than humans by a margin large enough that people are comfortable buying such a system. If that amount is even as little as 2x safer than humans, we still have a moral obligation to roll that out even if we could be 5x safer if we had another $50k worth of sensors and processors on the car.
I’m not saying this is definitely true, and at the moment we probably can’t verify it either. I’m just “steel manning” his case, as Lex loves to say.
I think you’re probably correct that the business aspect was a significant factor, but perhaps it wasn’t everything.
By getting rid of the extra sensors they eliminate a temporary crutch and focus resources on the simple solution.
Not a new concept by the way. Henry Ford was obsessed with simplifying and eliminating every part that wasn’t necessary on the model T for virtually all the same reasons.
In a vacuum, how can cameras ever be better than cameras + other sensors?
I think this mischaracterizes Andrej's response. If anything he is referring to a holistic view of the vehicle, which includes but doesn't entirely consist of Tesla. For example, 5-10 years down the road, when sensors start going bad, consumers will appreciate fewer things to go wrong with a vehicle--that is one of the advantages of electric over ICE after all.
If anything this is an acknowledgement that George Hotz was right in focusing on optical sensors with Comma.ai.
1) He's not touching on the software cost of integrating different sensor data into the same trained machine learning model; it is likely far simpler to just stick to stereoscopic vision data (the same thing the human genome decided!)
2) That said, it seems at least theoretically advantageous to have a sensory system that exceeds that which humans are limited to; things like LIDAR can work in complete darkness and potentially spot, for example, pedestrians crossing a dark road without any reflective clothing on, where a vision-based system would fail (perhaps add infrared sensation?)
Anyway, doesn't AEB (automatic emergency braking) have to be installed in every car, by law, in the US, around now? And wouldn't that be less reliable if done via vision?
There’s a lot more to perception while driving than just stereoscopic vision.
First, your stereoscopic “cameras” (eyes) are mounted in free-rotating sockets, which are themselves mounted in a rotating and swiveling base (your head/neck). Your eyes can do rapid single-point autofocus better than any existing camera. They also have built-in glare mitigations: squinting, sunglasses, and sun visors. This system is way more advanced than fixed cameras. Yes, even an array of fixed cameras with a 360-degree field of view.
Then you have your sense of touch, your hearing, and your sense of equilibrium. You feel motion in the car. You feel vibrations in the pedals. You hear road noise, other cars, sirens, and the engine (not much in EVs). You smell weird smells and know when you’re driving with your e-brake on or when there’s a skunk nearby. There’s a lot getting fused with the vision to make it all happen, and I think you’d be surprised how “broken” your driving capabilities would be if you took one of these “background” senses out of the equation.
My anecdote: I drive a manual transmission car. A few months back, I woke up with no hearing in my right ear. Spooked, I drove to urgent care. I could not drive well at all; I was holding low gears for way too long. I learned that I use hearing almost exclusively to know when to shift. If you had asked me beforehand, I probably would have said that I’m visually monitoring the tachometer to know when to shift. Not the case. Also, I had a TERRIBLE sense of my surroundings. As I drive, I’m definitely building a model of the environment around me based on road noise, sound from other cars, sirens, and the like. Without hearing in just one ear, I felt very disconnected and unsafe. Living in California where lane splitting is legal, I had several motorcycles catch me completely off guard. I had my hearing restored at urgent care and everything went back to normal immediately on the drive home.
I think Andrej and Tesla massively overestimate vision’s sole ability to solve the problem. Humans are fusing lots of sensation to drive well.
Yeah and till we had reliable and powerful artificial lighting, it was highly unsafe to journey in low visibility/ darkness. We used to finish journeys when darkness fell.
Animals that do require precise movement in low visibility (bats, dolphins) conditions often evolved ultrasound solutions.
So should we license Tesla vehicles to only operate when visibility and weather forecast is good and not drive in the dark at all?
and also i dont understand your assertion that it was some kind of cynical maneuver to re-frame the question. he could have also said "yes, more sensors are always better but you can add an arbitrary number of sensors and so we had to decide where to draw the line. the cameras we use are capable of meeting our goal of full self driving that is significantly safer than a human driver. and this also streamlines the production and software which has a material impact on our ability to actually produce the cars which is of course necessary to meet the goal of making self driving cars. bloat could actually kill tesla."
this is logically the same thing that he said in the interview, so whats cynical about it? how is it underhanded?
also is there some intrinsic limitation of the dynamic range of cameras? people are talking about problems with dynamic range being intrinsic to cameras but im pretty sure that cameras and especially camera suites that do not have more problems with dynamic range than a human eye are possible to make and probably already on the market.
I think it's possible that professional movie cameras (with the appropriate lenses) may have higher dynamic range than human vision. Good luck getting those cheaper than a lidar.
It wasn't a dig. It was calling out a bullshit move that, in my opinion, Andrej deployed out of panic more than strategically. (My evidence for this being Andrej eventually gave a good answer.)
Although he didn't explicitly say so, neither his answer nor Elon's "take it out 'cause you can always put it back in if it turns out you really need it" philosophy absolutely rule out lidar coming back in the future if some remaining edge case just requires it. Clearly he thinks this is quite unlikely, however.
I agree with this assessment. However:
> Tesla can't support additional sensors without incurring a prohibitive amount of additional risk to Tesla. Risk to passengers doesn't appear to be a consideration.
This is a stupidifying take. Of course when you work in a line of business producing gadgets that, as an unintended side-effect, kill a lot of people (napkin math suggests above 2 milli-kills per car in the US), you will need to pick a point at which you say further fatality reduction is no longer justified given the economic cost of achieving it. Even if you are a pure altruist (if you go out of business, less safe cars will replace yours). Conversely, even if you are the embodiment of capitalist evil, risks to passengers will absolutely affect your bottom line and if you are rational you will take them into consideration. Any meaningful criticism needs to be about the trade-offs they make, not that they make them or are loath to explicitly say so on camera.
You're right — the sad truth is that corporations put costs on human lives every day. Where I think we disagree is that you believe they made the decision based primarily on costs. After watching this video, I believe they made the decision because they didn't think they could reliably implement and support a sensor fusion approach.
(BTW, I enjoyed "stupidifying"! I'm sorry I made people stupider.)
perceived, not real, risks to customers. PR matters more than reality
Because of the cost of additional eyes. If Tesla is optimizing for cost against safety, that's sort of the point.
I don't believe that's totally the case. Andrej later makes a better argument regarding limited R&D bandwidth, noise and entropy. But the "I would almost reframe the question" evasion was disconcerting. It's a textbook media trained tactic for avoiding a question to which you have no good answer. That it was deployed here badly against a skilled interviewer such that it backfired is a valid observation.
Cameras have poor dynamic range and can be easily blinded by bright surfaces. While it is true that humans do fine with only eyes, our eyes are significantly better than cameras.
More importantly, expectations are higher when an automated system is driving the car. It is not sufficient if, in aggregate, self-driving cars have fewer accidents. If you lose a loved one in an accident where the accident could have been easily avoided if a human was driving, then you're not going to be mollified to hear that in aggregate, fewer people are being killed by self-driving cars! You'd be outraged to hear such a justification! The expectation therefore is that in each individual injury accident a human clearly could not have handled the situation any better. Self-driving cars have to be significantly better than humans to be accepted by society, and that means it has to have better-than-human levels of vision (which lidars provide).
This approach is not only simpler, as it removes photo processing/encoding, but the result is that the NN can operate with a very high dynamic range similar to the human eye and in many cases can be sensitive on the single-photon level.
That sentence does not make sense. There's no such thing as a count without a corresponding interval that count occurred over. That interval is the exposure.
You can of course do lots of (very) short exposures to avoid sensor saturation. That's "just" a movie at a very high frame rate. And then you can post-process this in lots of exciting ways, align the frames, average them, etc, etc.
The human eye isn't so great on those terms. But humans can raise their hand to block the sun if it's straight at our eyes.
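For anyone who wants to see the short-exposure point in numbers, here is a minimal sketch. It assumes an idealized, noise-free pixel that simply clips at a hypothetical full-well capacity; the photon rate, exposure times, and frame count are invented for illustration and have nothing to do with Tesla's actual sensor.

```python
def expose(photon_rate, exposure_time, full_well=1000):
    """Photon count for one exposure, clipped at the pixel's full-well capacity."""
    return min(photon_rate * exposure_time, full_well)


def stacked_estimate(photon_rate, total_time, n_frames, full_well=1000):
    """Estimate the photon rate by summing n_frames equal short exposures."""
    frame_time = total_time / n_frames
    total = sum(expose(photon_rate, frame_time, full_well) for _ in range(n_frames))
    return total / total_time


bright_rate = 50_000.0  # photons per second, hypothetical bright-sky pixel

# One 0.1 s exposure saturates the pixel, so the estimate is far too low.
single = stacked_estimate(bright_rate, total_time=0.1, n_frames=1)

# Ten 0.01 s exposures each stay below full well, so the sum recovers the rate.
stacked = stacked_estimate(bright_rate, total_time=0.1, n_frames=10)
print(single, stacked)
```

A real sensor adds read noise per frame, which is the price you pay for stacking, and the sketch ignores that entirely.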
"Tesla later said that during the crash, Autopilot’s camera could not distinguish between the white truck and the bright sky."
https://www.nytimes.com/2021/12/06/technology/tesla-autopilo...
https://youtu.be/ODSJsviD_SU?t=4424
he clearly states 16x dynamic range as a result of direct photon processing.
Nice tech and single photons and whatnot but it still runs into things that a radar with some really simple code wouldn't. ¯\_(ツ)_/¯
https://www.nytimes.com/2021/07/05/business/tesla-autopilot-...
Excerpt:
Mr. Rajkumar of Carnegie Mellon, who reviewed the video and data at the request of The Times, said Autopilot might have failed to brake for the Explorer because the Tesla’s cameras were facing the sun or were confused by the truck ahead of the Explorer. The Tesla was also equipped with a radar sensor, but it appears not to have helped.
“A radar would have detected the pickup truck, and it would have prevented the collision,” Mr. Rajkumar said in an email. “So the radar outputs were likely not being used.”
https://www.nytimes.com/2021/12/06/technology/tesla-autopilo...
Excerpt:
Tesla later said that during the crash, Autopilot’s camera could not distinguish between the white truck and the bright sky. Tesla has never publicly explained why the radar did not prevent the accident.
So if fashion changes, pedestrians may suddenly look like road too, as just an example.
Another problem is that state-of-the-art classification networks have an accuracy in the 90% range. Given that a car has to take hundreds of decisions in a single ride, then even if the accuracy was 99%, you see that error rate simply gets too high.
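The arithmetic behind that claim, as a rough sketch. It assumes every decision is independent and that any single error counts as a failure, which is a strong simplification.

```python
def prob_at_least_one_error(per_decision_accuracy, n_decisions):
    """Chance of one or more errors across n independent decisions."""
    return 1.0 - per_decision_accuracy ** n_decisions


p_100 = prob_at_least_one_error(0.99, 100)
p_300 = prob_at_least_one_error(0.99, 300)
print(round(p_100, 3), round(p_300, 3))
```

Under those assumptions, 99% per-decision accuracy gives roughly a 63% chance of at least one error in 100 decisions and about 95% in 300.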
If you're referring to ImageNet SOTA, it has 20000 different classes, including 120 different dog breeds [1]. This is a vastly different task than reliably detecting pedestrians where Tesla can actively curate a dataset of hard examples (from their fleet), whereas ImageNet is fixed, sometimes with low quality labels and as few as a couple of hundred examples. Tesla can also pick a point on the ROC curve to give higher recall but more false positives (which is important for VRUs specifically). Another big factor is that Tesla is using video, not still images, which makes predictions even more robust.
And that's just for pedestrians, Tesla are also using a general ViDAR (visual LiDAR) which is trained to detect obstacles that do not have a specific class. The ViDAR again operates on image sequences, not a single image, and can thus pick out structure from motion.
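To illustrate what picking a point on the ROC curve means in practice, here is a small sketch that chooses a detection threshold to hit a target recall and then counts the false positives that come with it. The scores and labels are made up, and the function names are hypothetical; this is not how any particular production stack does it.

```python
import math


def threshold_for_recall(scores, labels, target_recall):
    """Highest score threshold whose recall on the positives meets target_recall."""
    positives = sorted((s for s, y in zip(scores, labels) if y == 1), reverse=True)
    if not positives:
        raise ValueError("need at least one positive example")
    # Number of positives that must score at or above the threshold.
    needed = max(math.ceil(target_recall * len(positives)), 1)
    return positives[needed - 1]


def recall_and_false_positives(scores, labels, threshold):
    """Recall on positives and count of negatives flagged at this threshold."""
    tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= threshold)
    fp = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= threshold)
    return tp / sum(labels), fp


# Hypothetical detector scores: 1 = pedestrian present, 0 = no pedestrian.
scores = [0.95, 0.9, 0.8, 0.6, 0.4, 0.7, 0.5, 0.3, 0.2, 0.1]
labels = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]

moderate = threshold_for_recall(scores, labels, target_recall=0.8)
cautious = threshold_for_recall(scores, labels, target_recall=1.0)
print(recall_and_false_positives(scores, labels, moderate))
print(recall_and_false_positives(scores, labels, cautious))
```

Lowering the threshold to catch every positive also lets more negatives through, which is the recall versus false-positive trade mentioned above.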
They also have better failure modes and a really sophisticated error management system. They are susceptible to optical illusions, though.
> It is not sufficient if, in aggregate, self-driving cars have fewer accidents.
This is the incorrect analysis anyways. This was always going to be true because a large portion of accidents are single vehicle accidents where the driver was at fault for the crash. Usually due to speeding, alcohol, youth, or a combination of them.
If they didn't have fewer accidents then something is very very wrong with the entire idea. Which may very well be the outcome here. Looking at multi-vehicle accidents where there was no fault of the driver who died, it's not clear that an automated system driving the car would have saved them.
Roads are built right next to cliffs and bodies of water. Semi trucks can completely destroy your vehicle in an instant. Large accidents on snowy or foggy highways happen. Drunk drivers exist and sometimes literally do come out of nowhere. A pickup truck moving at 60mph has enough energy to knock a firetruck onto its side if you hit it side-on, and freeway ramps dump out right onto residential streets. Parts fail, floormats get stuck, people don't wear their seatbelts, and you can get a license to ride on a motorcycle if you want.
It's a guess based on the research I've done, but my expectation is around 20% of fatal accidents can in some way be prevented by automation. You'd honestly prevent more fatalities by putting an ignition interlock on everyone's vehicle or building real barriers between traffic and pedestrians.
Also plenty of suicides in that group, which confuses the stats.
We really need SDCs to have fewer accidents than human drivers, excluding the suicides.
Assuming, of course, that automation does not introduce its own failure modes.
That's a strong assumption.
We do not. Humans are terrible at driving. Traffic accidents are one of the leading causes of death in the developed world. Billions of dollars of property damage occur every year because humans are not up to the task. A self driving system that is as safe as an average human driver would be an absolute failure.
Perhaps leading cause of premature death or leading cause of accidental death or leading cause in demographics who are otherwise unlikely to die, but they are nowhere close to the top of the overall list.
But not because of a lack of visual information.
Most of the time it's a lack of concentration or an overestimation of one's own driving abilities.
AI is nowhere near that.
Not even close, really. A bit under 1%. You are more likely to die from an overdose, or suicide. And much, much, much more likely to die from cancer or heart disease.
And that is without getting into the trade-offs. Cars at least have a significant utility value, which is not true of suicide, opiate addiction, cancer, or heart disease. We should try to reduce traffic deaths, but we should not lose perspective.
But you still have to address his system argument, which was that adding geegaws that added little would actually increase overall risks along the supply chain (plus maintenance) while distracting the team and adding more risk that way, for very little apparent (but only apparent) gain. The team does believe that they'll get to better than human driving, and do that without lidar.
Do you? It's his argument that he needs to substantiate... it's not my burden to confirm his conjecture. And even looking at it on its face... it's clearly self-serving bs. It doesn't seem to be a problem for any other car company, so I'm a bit confused as to why it's such a problem for Tesla. Of course, the obvious answer is that Tesla is cheap and doesn't want to pay to have a team that would have sufficient bandwidth to do what every other car company and self-driving system is doing...
It's about how more information is not always better. It can instead muddy the waters. It can create confusion.
It would be sufficient if that were the case. With actual proof.
Reality is that in limited abstract situations, self-driving cars may have some advantages. But that is all that we can claim. And when self-driving fails, somehow the human is always the cause.
From a public standpoint I don’t think it’s sufficient because there’s inherent trust lacking in an automated system. With ape-driven systems we have a certain amount of trust because we can more accurately intuit what the other ape is reasonably thinking. This is not the case with autonomous driving which leads to a greater amount of uncertainty. Not unlike how we are intuitively less trusting of someone who is legitimately “crazy” even if statistically we can’t say they are shown to be more dangerous.
I personally think they should use as much data inputs as possible: radar, IR, LIDAR, mesh networks, fixed route information.
Where tesla went particularly wrong IMO is ignoring some sort of route-based chunk information which is how humans navigate. IIRC Elon said something to the effect of just having an algorithm to work everywhere.
Humans use the basic algorithm "stay in lane, drive forward" and then decorate with signs, knowledge of curves, locations of potholes, dangerous low-viz corners, likelihood of surprise stopped traffic, obscured driveways, general character of neighborhoods, road purpose. Weather. Windy sections, icy sections, light availability anomalies. What type of vehicle. Repair state of vehicle.
A general AI algorithm will never be able to properly account for flavors/tags/chunk info on routes. Especially since cloud precomputation is so available these days.
Anyway, while recognizing that Tesla's "Fully Self Driving" is not as advertised, and we are a ways from self driving for any statistical measure of superiority to a healthy aware adult, it is still damn impressive what FSD vids show.
Do AI driving systems try to make "subsystems" of AI networks to reduce inputs to various higher-level inputs, or do others just throw a ton of inputs at a big ass network and just let the entire system rise from the soup of information?
[1]https://www.youtube.com/watch?v=j0z4FweCy4M (2021), https://www.youtube.com/watch?v=ODSJsviD_SU (2022)
If you've ever driven in Vietnam, that is so not true.
No, I think this argument is largely correct. And frankly settled: anyone who's driven recent FSD beta versions knows very well that the cars "see just fine". They don't hit anything, they see and avoid obstacles. Frankly they're much more observant than humans are, my car will twitch when pedestrians turn as if they're going to enter the road (where human drivers mostly don't notice, and if they do they ignore it). What problems still exist are in planning: things like sign reading, lane selection, etc... still need some work. But collision avoidance just isn't an issue. It isn't. The LIDAR folks were wrong, basically.
(I will admit that I'm a little sad about the removal of the ultrasound sensors though. It's true the autonomy probably doesn't need them, but I really like having the chimes to guide parking and garage maneuvering.)
Only if you ignore times where intervention stopped it from hitting something, times where it did actually hit something, massive amounts of jitter and popping in the visual output, phantom braking, etc.
Unless of course "recent" means n+1 where n is the version that crashed into something.
Collision with bollard in Feb 2022: https://www.youtube.com/watch?v=sbSDsbDQjSU
attempts to plow through cyclist Feb 2022: https://www.youtube.com/watch?v=a5wkENwrp_k
almost crashes into tram (can't gauge speed or direction?) Jun 2022: https://www.youtube.com/watch?v=yxX4tDkSc_g
Crashes into curb Aug 2022: https://youtube.com/shorts/8Mh1GjejdsI
Phantom brake Sep 2022: https://www.youtube.com/shorts/5v6j_oL7S-g
Almost colliding with bridge pillar 2 weeks ago: https://www.youtube.com/watch?v=5CMYkDWaqn0
Crashes into various objects in testing 2 weeks ago: https://www.youtube.com/watch?v=yyDxqEzV5Zc
I think your mistake is thinking LiDAR exists to solve the happy day scenario. It doesn't.
Vision is sufficient for the majority of use cases. Where LiDAR comes into its own is in the edge cases because it almost guarantees accurate bounding box detection. Which is where vision is at its weakest.
So I want to know what does FSD do when it sees a billboard of a person or when it is seeing a new object for the first time.
This is far, far from settled at this point.
According to who? Tesla? Because Tesla has a vested interest in trying to prove that they're right even if they're obviously wrong. That's why they constantly try to downplay failures, software issues, device issues etc.
I'm very confused by the attempts to discredit the usefulness of LIDAR. It's another tool you can use to improve the accuracy of your model. Sure, you can use a screwdriver, flip it around and use it as a hammer. But if you need to deal with nails, it's better to grab a hammer instead.
As long as those pedestrians DO NOT actually enter the road after those turns, any "twitching" of your car in response is an ADDITIONAL SAFETY PROBLEM, because other drivers might notice the erratic movements of your car and do erratic things as well, which in the end might result in accidents that wouldn't have happened had your car not "twitched".
Especially "twitchy" AIs like that of your car might very well "re-twitch" on noticing your car doing small, but erratic and rapid changes in behavior, thereby initiating a "twitch escalation spiral".
It's like being driven around by a drunk person - the reaction happens loooooong after the action that causes it has started.
Humans can’t really turn senses off, so they have coffee when driving. Touch and hearing are quite important to “read the road”. Equilibrium too.
Tesla should aim for parking first. Teslas do poorly at self parking:
I think of parking and I'm reminded of "the camry dent"
https://duckduckgo.com/?q=the+camry+dent&iax=images&ia=image...
Ideally cars will be self-driving using only passive sensors - but I do think that Musk/Tesla completely missed the value of active sensors in training.
Tesla does use Lidar on a small number of test vehicles for assessing ground truth. However, they have built enough of a data pipeline and fleet data acquisition to use repeat clips to determine ground truth better than human labelers.
But why. Because LIDAR doesn't help much in general or because the Tesla engineers aren't good at using the sensor data?
Same with the manufacturing.
Sounds to me like Tesla can't handle complexity. And if they can't handle the complexity of manufacturing, they surely can't handle the complexity of full autonomous driving.
Is that really what the problem boils down to? Or how it was decided? Or are you just questioning a common meme that comes up in internet debates about car AI?
I'm not following the news, but I haven't seen any videos set in what Canada looks like 4 months per year.
For many (MANY) years airbags were fought by the auto industry even though people wanted them.
Adding more sensors slows his team now more than it improves system performance
I'll take his word on this. It is a lot of work to incorporate multiple sensors.
All necessary information is already in the pixel-space.
I hate to disagree with someone as distinguished as Karpathy, but this is simply not what I have observed from all of that data that we have access to. Given my knowledge of the various stacks deployed today, I would never ever ever get into a vehicle using a vision only stack and expect it to perform in some of the challenging environments encountered during testing.
The fact that (most) humans manage to drive around safely and successfully on current roads proves that the information needed exists in the pixel-space (not just current image, but say current + history). We don't yet have stacks that can successfully map everything needed from this information but I don't think Dr. Karpathy ever claimed that.
(I am not a principal engineer but a mere PhD student who argues daily with people on how RGB information is underappreciated and under utilized)
Doesn’t mean it’s better or easier
Given the progress of the FSD "beta" to date, and the fact that Andrej _left_ Tesla, I'd wager that he knows that this approach is a dead end, but he won't say that because he'd get himself in hot water with Elon.
Most tech startups have 10x+ more problems with the engineering part than the infrastructure/ops part.
Also this is one person's perspective from a large team. His answer might be biased because he's an engineer and I doubt his was the only voice in the debate.
No. He makes it clear that he is very convinced about it. There is no relativism, no weasel words or couching in maybes. He could be wrong, of course, but he believes in what he is saying.
The video starts with him reframing the question instead of answering it
The problem is that so far, Tesla has yet to demonstrate that the fleet _is_ sufficient. IMO, if the fleet was enough to get to L5 autonomous driving, then they would already be there.
Reading sensor data is not the same as feeding that data to a neural network and asking it to form a worldview composed of possibly conflicting sensor data streams (i.e. lidar vs vision vs ultrasonic).
You are somewhat correct that it is quite trivial to read sensor data. For many sensors, there is some work which needs to be done to denoise or cleanup the input data. That's not where the story ends, however.
So it seems like a totally ridiculous argument that ultrasonic sensors create some kind of data processing overload.
An ultrasonic sensor makes it possible to implement incredibly simple and reliable safety features with well known performance characteristics. Processing an image with ML to produce the same effect has tons of edge cases where it might not work, and nobody knows when it won't work, and every update to the system could introduce regressions.
It's why they had to disable certain features when they got rid of the ultrasonic sensors. Those features may come back some day, but I bet they'll never be as reliable, and certainly won't be as predictable.
https://www.pcmag.com/news/tesla-removes-ultrasonic-sensors-...
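To show how little logic an ultrasonic proximity feature needs, here is a minimal sketch of a time-of-flight distance check with fixed thresholds. The warn and stop distances and the function names are hypothetical, picked only for illustration; this is not Tesla's park-assist code.

```python
SPEED_OF_SOUND_M_PER_S = 343.0  # approximate value in dry air at about 20 C


def echo_to_distance_m(echo_round_trip_s):
    """Distance to the obstacle from an ultrasonic echo's round-trip time."""
    return SPEED_OF_SOUND_M_PER_S * echo_round_trip_s / 2.0


def parking_alert(distance_m, warn_at_m=1.0, stop_at_m=0.3):
    """Map a measured distance to an alert level using fixed thresholds."""
    if distance_m <= stop_at_m:
        return "stop"
    if distance_m <= warn_at_m:
        return "chime"
    return "clear"


for echo_s in (0.02, 0.004, 0.001):
    distance = echo_to_distance_m(echo_s)
    print(f"{distance:.2f} m -> {parking_alert(distance)}")
```

The whole behavior is two comparisons, so its failure modes are easy to enumerate and test, which is the predictability point made above.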
And even with all these advantages, tens of thousands of people are killed in car crashes every year. Some people make a compelling argument that this is evidence that human vision doesn't have all the information you need for driving. While I don't go that far, I do think autonomous driving has a long way to go.
I use far more than just vision driving:
- sound, for emergency vehicles, detecting vehicles outside of my field of view if my windows are down or the vehicle is loud, tire sound (especially in snow and rain), engine sound (more feedback in snow or ice about what my tires are doing)
- touch (steering feedback, gives information about grip in some circumstances)
- acceleration (can feel if the rear tires break loose in a turn on snow or ice, or if I’m sliding while braking)
And probably many more
But on second thought this doesn't bother me that much because Tesla FSD is absolute garbage even with radar (and I don't think Tesla will get away with selling the FSD snake oil for much longer), so if vision-only is good enough for the base-level lane-keeping autopilot functionality and it makes the cars cheaper, maybe that's a good thing.
This isn't like Facebook continually releasing a product that sucks but people will use anyway.
Tesla is constantly working against the clock and everything they do has real world consequences. There are multiple gov agencies watching over it at all times. Of course there's lots of people with far higher risk tolerance than is being exhibited but if it does turn out badly IRL this will get shut down pretty quickly.
The good news is Tesla has the ability to cripple this feature remotely without a costly/lengthy recall if that does happen.
Regulators have keen noses for very particular types of issues and rely heavily on manufacturer judgements on a lot of the rest. Issues that aren't in any of those fairly narrow categories need to be extremely public or extremely egregious to attract their notice.
Elon removed the radar and ultrasonics for the simple fact that its supply chain logjam was screwing up the manufacturing schedule. They also realized that the profit margin can be sustained in an inflationary environment by simply removing these parts [1]. “Oh, we were going to remove them anyway because humans can see fine with just eyes and no radar, why can’t cars?” Tesla then turned up the marketing of the AI/vision hype lever once more to toss another shiny tech object and get buyers to ignore the fact that there is a regression of features in the newer cars going forward.
- Costs money: the physical sensors (a dozen of them), wiring them up, assembling them, maintaining inventory, writing code for them, etc.
- Time spent maintaining and improving the software stack for the non-vision sensors, plus the effort needed to fuse their data with vision, takes away from focusing on vision alone. It also holds back vision in relevant areas.
- Existing non-vision sensors used by Tesla are orders of magnitude lower fidelity than vision. Historically (as was the case with radar) this has led to vision essentially having to override radar because vision just performed much better (see AI day 2021).
My take:
As with any new tech, it likely sucks at the start (think HDDs and SSDs, and how a mechanical thing with lots of moving parts was way more reliable than SSDs at the start). However, by essentially moving past the local maximum, you get to innovate better and faster in the future.
In the case of ultrasonic sensors, they are for low-speed cases anyway and most people are fine without them. The majority of fatalities and injuries happen at higher speeds.
Used to be that Tesla was blazing a trail and if you wanted a good EV, that was what you got. Now, if you want the best EV, it's usually not going to be a Tesla. And I don't see that they're making any decisions that will regain them that title. The incumbent manufacturers are quickly proceeding to eat their lunch, just like many of us predicted would happen. Turns out the hard part of making a successful car isn't the drivetrain.
Would love to hear what you consider "good" and what specific EV ticks the most good features that a Tesla Model 3 doesn't.
they have a non-smooth capability curve, where they can demonstrate proficiency in activities that in regular computer programs or people would imply a complete and continuous path of capability that has been mastered to achieve the demonstration, but ai systems are weird in that they can do amazing things, but have loads of little holes and failure modes along the way.
for example: gpt-3 can write you a shell script that will emit a c program that prints a poem about people you know, but will fail at very basic logic, sometimes.
in light of that, having additional support data like radar or lidar seems like the right move for plugging all those little holes in capability that turn up in real ai systems.
because at the end of the day, when you're driving a car in the real world and lives are at stake, simply interpolating or averaging over uncertainty seems awfully deadly and the only way to ameliorate that uncertainty seems to have multiple redundant sensory systems that can stand in for each other as conditions change. just like us!
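a rough sketch of what "stand in for each other" could look like, as opposed to averaging: take the most cautious reading among whichever sensors currently have one. the sensor names and the `None`-means-no-reading convention are assumptions for illustration only, not how any real stack does it.

```python
from typing import Optional


def nearest_obstacle_m(readings: dict[str, Optional[float]]) -> Optional[float]:
    """Most conservative distance among the sensors that currently report one.

    A value of None means that sensor has no usable reading right now
    (e.g. a camera blinded by glare), so the remaining sensors stand in for it.
    """
    valid = [d for d in readings.values() if d is not None]
    # Returning None when nothing is valid lets the caller treat
    # "no information" as its own case instead of guessing.
    return min(valid) if valid else None
```

e.g. `nearest_obstacle_m({"camera": None, "radar": 12.0, "ultrasonic": 11.5})` falls back to the two sensors that still work and uses the closer of their readings.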
So how would this work for parking?
A: Add more cameras so there are no dead areas in front of the car
B: Build a model in vector space when driving towards a parking spot and assume blind spots don't change. (still sucks)
Think about how a human driver does it, given his/her even worse vantage point. They model what's in front/behind the car from afar and remember what's where as they approach it. There are other signals as well, such as continuation of a kerb, etc.
I think people keep forgetting that Teslas run hundreds of ML prediction tasks all the time. Watch recent AI day and their talks about "occupancy network" to get a sense of the car's ability to:
1. Construct a 3D model of its surroundings in real time; 2. Remember occluded sections based on what it's seen previously.
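The "remember occluded sections" idea can be shown with a toy 2D grid. To be clear, Tesla's occupancy network is a learned model and this is not it; the class and method names here are hypothetical and only illustrate the principle that a cell keeps its last observed state once it drops out of view.

```python
from enum import Enum


class Cell(Enum):
    UNKNOWN = 0
    FREE = 1
    OCCUPIED = 2


class OccupancyMemory:
    """Toy grid that keeps the last observed state of cells no longer visible."""

    def __init__(self, width: int, height: int) -> None:
        self.grid = [[Cell.UNKNOWN] * width for _ in range(height)]

    def update(self, observations: dict) -> None:
        """Apply observations mapping (row, col) -> Cell for visible cells only."""
        for (row, col), state in observations.items():
            self.grid[row][col] = state
        # Cells missing from `observations` are occluded and keep their old state.

    def is_blocked(self, row: int, col: int) -> bool:
        """Treat anything not positively known to be free as blocked."""
        return self.grid[row][col] is not Cell.FREE
```

The weakness is the one raised above for parking: a remembered cell is only as good as the assumption that nothing moved into it while it was out of view.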
Where are you getting that from? Tesla has always seemed pie in the sky, and hardly a down to earth company at all throughout its history.
I'm basing this on both their public record and their reputation within the auto industry.
But the biggest thing that comes to mind is what happens at night. Are they only going to enable self-driving during the day?
Snow and ice may be another challenge but night sounds easy.
I use them as well!
Cost cutting.
Seriously? This is a major technical challenge?
Tesla doesn’t know how to do change management.
"We removed them because they cost money. And we are trying to make money ... at least right now.
Listen, this pure autonomous self-driving car stuff is never going to work, so who cares if we have these gadgets or not ..."
It's a Doppler radar, so you don't get any info from things stationary relative to the radar, but you do get range and range rate. And the quality of that data is independent of distance. We used it mainly as a backup system for the world model built with LIDAR and (to a very limited extent) vision. The VORAD data could lower the speed limit for the rest of the system, and if a collision was about to happen, it would slam on the brakes independently of the world model.
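For anyone wondering why range plus range rate is enough for an independent brake trigger: the two together give a time-to-collision estimate. This is a minimal sketch of that arithmetic, not our actual code; the sign convention (negative range rate means closing) and the 1.5 s threshold are assumptions for illustration.

```python
import math


def time_to_collision_s(range_m: float, range_rate_m_s: float) -> float:
    """Seconds until range reaches zero at constant closing speed.

    Assumes a negative range rate means the target is getting closer.
    Returns infinity when the target is not closing.
    """
    if range_rate_m_s >= 0.0:
        return math.inf
    return range_m / -range_rate_m_s


def should_emergency_brake(
    range_m: float, range_rate_m_s: float, ttc_threshold_s: float = 1.5
) -> bool:
    """True when the estimated time to collision is under the threshold."""
    return time_to_collision_s(range_m, range_rate_m_s) < ttc_threshold_s
```

Note that nothing here depends on knowing what the target is, which is both why it works as a backup and why it brakes for trash cans.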
The big problem with coarse automotive radar is that it can detect targets, but doesn't tell you much about them. Cars, trash cans, and metal road debris all look about the same. There's also a lot of trouble from big flat metal surfaces being mirrors for radar. We were willing to accept slowing down for ambiguous cases until the other sensors could get a good look. Drivers hate that if road-oriented systems do it.
Modern units are up around 70-80GHz and often have 2D scanning, which is a big help. I haven't seen the output from a modern automotive radar. I was expecting that by now, low cost millimeter microwave systems (200-300GHz) would be available, providing detailed images somewhat coarser than you can get with light. You get range and range rate, and you can usually steer the beam electronically rather than mechanically. The technology exists to get high-resolution radar images, but is mostly used for scanning people for weapons at checkpoints. It hasn't become cheap yet.
Presumably this is a matter of working out if you are at a local maximum or not, and thinking about what properties the ideal solution will have. It also matters if you have other competitors that might be racing towards the ideal solution faster than you, potentially patenting their progress along the way.
Radar/Lidar/Ultrasonic is going to give you information that your camera systems will not give you. It does not matter if the delta of information is little. If this little is required because you can't obtain it otherwise, you still need it.
If you just rely on the fleet, you rely on the things you have seen. What about the objects that you have not yet seen?
a. Windshields that clean the inside as well as the outside.
b. Better eyeglasses[5].
c. User controllable hi-res HUD thermal IR overlay.
d. Headlights with adaptive notch filters so the oncoming vehicle can pick an empty spectral range... without the source being monochromatic (with required adaptive filters on the receiving end)... and/or really good coronagraphs.
e. Brake control[6].
Any entity capable of driving[7] in a population of humans (including adversarial humans) is sentient[8], and has real skin in the game. It would be unethical to lock one in a car:
[1] https://news.ycombinator.com/item?id=33213860 (analog FPGA)
[2] https://news.ycombinator.com/item?id=21106367 (general AI)
[3] https://news.ycombinator.com/item?id=16646112 (2018)
[4] https://www.tesla.com/blog/all-our-patent-are-belong-you (2014)
[5] https://patents.google.com/patent/US7744217 (2007)
[6] https://news.ycombinator.com/item?id=18013388 (2018)
[7] no human behind the wheel, no human to correct impending mistakes, but (critically) with one or more humans in the car.
[8] The idea that non-biological machines can have 'self' is a window into modern mass transformation. Please check out the analog FPGA experiments linked above.
I didn't like his line of logic about how vision is necessary and sufficient, because that's how humans drive. Okay sure, but if some combinations of non-human sensors could drive better and/or cheaper than a vision only driving system, surely he would not argue for sticking with vision only? Maybe adding non-vision sensors lets you save hardware and software resources on the vision part of the system.