Rivian software update bricks infotainment system, fix not obvious

(electrek.co)

277 points | carlivar | 2y ago | 379 comments

I built a whole remote software update mechanism for a control binary that ran on 25k+ servers across multiple data centers.

Rest assured that after the first time I messed it up (which required ssh into each box individually), I wrote a lot of unit and integration tests to make sure that it never failed to deploy again. One of the integration tests ensured that the app started up and could always go through the internal auto update process. This ran in CI and would fail the build if it didn't pass.

While I fully understand that this is hard to get right 100% of the time, a mess up of this level by a car manufacturer is pretty amazing to me.

foobiekr2y ago

Rivian is an embedded use case, though, which is not at all like a fleet of servers.

Having worked for companies that produce network devices - including devices that are unreachable for example for 6 months of the year - and on software installation and upgrade, I am baffled how this bricking is possible. For one thing, you generally use some kind of confirmed boot mechanism - you upgrade a standby partition, set an ephemeral boot value that causes the device to boot the alternate image, and reboot - only when the image is declared "up" does that get persisted (and then the alternate is upgraded, in order to prevent rollback in the event of a media error). You use watchdogs that are tied to actual forward progress (and not just some daemon that the kernel schedules and bangs on the watchdog even if the rest of the system is hung) and if they fail, the WD reboots you. (This is one of the reasons that event driven programming is somewhat preferred - actually processing events from a single dispatch thread makes it easier to reason about the system.)
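The watchdog point above can be sketched roughly as follows. This is a simulation in Python, not real firmware: `ProgressWatchdog` and the `pet_hardware` callable are hypothetical names, and the "work counter" stands in for whatever forward-progress signal the dispatch thread exposes.

```python
from typing import Callable


class ProgressWatchdog:
    """Pets the hardware watchdog only if the work counter advanced since the last check."""

    def __init__(self, pet_hardware: Callable[[], None]) -> None:
        self._pet_hardware = pet_hardware   # hypothetical: kicks the real hardware timer
        self._last_seen = -1

    def check(self, events_processed: int) -> bool:
        """Call periodically with the dispatch thread's running event count."""
        if events_processed > self._last_seen:
            self._last_seen = events_processed
            self._pet_hardware()
            return True
        # No forward progress: do nothing, so the hardware timer expires and resets us.
        return False


pets = []
watchdog = ProgressWatchdog(lambda: pets.append("pet"))
# The event counter stalls at 1 for two checks, then moves again.
results = [watchdog.check(n) for n in (0, 1, 1, 1, 2)]
```

The design choice is that a merely scheduled process is not enough to keep the timer alive; only evidence that events are actually being handled does.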

On top of that, you make sure that the core system is an immutable filesystem so that you can validate the _offline_ alternate image before rebooting (write-and-read-back-uncached) and periodically scrub the alternate image (same).

Like.. this is all embedded 101, stuff people have been widely doing since the mid 1990s and I think I can find examples going back to the 70s. Sometimes you get a little more sophisticated (allow sub-packages or overlays and use a manifest to check the ensemble instead of just a single image), but it's very standard.
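A minimal sketch of the confirmed-boot flow described in this comment, written as a Python simulation rather than bootloader code. `BootState` and its method names are assumptions made for illustration; a real device would keep this state in NVRAM or a bootloader environment.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class BootState:
    """Simulated bootloader state for a two-slot (A/B) device."""
    persistent_slot: str = "A"        # slot known to be good
    try_slot: Optional[str] = None    # ephemeral: honored for one boot only

    def stage_update(self) -> str:
        """Pretend the standby slot was written and verified, then arm a one-shot boot."""
        standby = "B" if self.persistent_slot == "A" else "A"
        self.try_slot = standby
        return standby

    def select_boot_slot(self) -> str:
        """Bootloader side: use the trial slot once, then forget it."""
        if self.try_slot is not None:
            slot, self.try_slot = self.try_slot, None
            return slot
        return self.persistent_slot

    def confirm(self, running_slot: str) -> None:
        """Called only after the new image has declared itself 'up'."""
        self.persistent_slot = running_slot


state = BootState()
state.stage_update()
first = state.select_boot_slot()     # boots the trial slot
# Suppose the new image hangs and never calls confirm(); the watchdog resets the device.
second = state.select_boot_slot()    # falls back to the last confirmed slot
```

Because the trial value is consumed by the boot that uses it, an image that never confirms itself cannot be booted twice in a row.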

dcow2y ago

Assuming Rivian does know embedded 101, my guess is that the infotainment system is running Android and the watchdog reported all green once the system services all came online and that it doesn't actually check whether the application layer is really working because, as you know, that would require the watchdog to run a full regression suite before giving the okay, which isn’t practical. Since the update swapped the system to an internal dev cert, they can't push an immediate update to change the boot args because the management plane daemon won’t connect to the C&C server, or it can but the blob they push wouldn’t pass signature validation, or the TEE won’t unlock the device keys because the roots changed. Whatever the case, someone has to go blow a fuse and re-flash the thing, or at least rewrite the boot args via serial. Just a guess.

If it is the most likely “management plane TLS certs” issue, I bet the watchdog won’t confirm the new boot args until the command dispatch daemon gets a pong from the C&C server moving forward (:
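A sketch of that kind of gate, assuming the boot is confirmed only when every health check passes, including a round trip to the management server. The check functions here are stand-ins invented for the example; the failing one simulates the certificate mismatch guessed at above.

```python
from typing import Callable, Dict


def should_confirm_boot(checks: Dict[str, Callable[[], bool]]) -> bool:
    """Confirm the new image only if every check passes; a check that raises counts as failed."""
    for check in checks.values():
        try:
            if not check():
                return False
        except Exception:
            return False
    return True


def services_online() -> bool:
    return True   # stand-in: system services started


def management_plane_pong() -> bool:
    # Stand-in for a TLS round trip to the update server; simulates a cert mismatch.
    raise ConnectionError("certificate verify failed")


confirmed = should_confirm_boot({
    "services": services_online,
    "management_plane": management_plane_pong,
})
```

With the management-plane check in the set, the new boot args are never confirmed, so a confirmed-boot scheme would return to the previous image on the next reset.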

ikiris2y ago

That sounds out of scope for the MVP. We can worry about redundancies later after we ship.

roland352y ago

Hey now, preventing SEVs doesn't lead to impact. If we all collectively let this become a raging dumpster fire we can all heroically fix it and greatly exceed expectations for the half.

KingMachiavelli2y ago

Did you just use standard Yocto or similar tools to build such images? Are there standard daemons for managing hardware watchdogs (besides systemd since that's too simple as you say)? I think there's a lot of niche knowledge in the embedded space and many programmers are used to cloud systems and at most target. The most embedded experience most programmers have is likely iOS/Android development where all of the actual embedded concerns are handled for you. Even Google (soft)bricked a bunch of phones with the latest Android 14 update [1].

IMO there's not a lot of regular OSS for building embedded systems that comes with A/B partitioning, watchdogs, secure and verified boot - it's all custom at every org and tailored for individual products.

[1] https://arstechnica.com/gadgets/2023/11/android-14-patches-r...

MarkSweep2y ago

I quit my job before I got to deploy this, but RAUC looked like it would handle this for Yocto:

https://github.com/rauc/rauc https://github.com/rauc/meta-rauc

For microcontrollers, Memfault had a good article:

https://interrupt.memfault.com/blog/device-firmware-update-c...

neuralRiot2y ago

> including devices that are unreachable for example for 6 months of the year

That made me think, imagine NASA bricking up the voyager with a SW update.

aaronbeekay2y ago

As somebody currently working at an automaker on software systems, the amazing thing to me is that a mess up of this level doesn’t happen weekly. It’s rough out here.

jacquesm2y ago

Thank you. At least you're honest about it, the other day someone was trying real hard to convince me that software developers at automakers are made of magic fairy dust.

kalleboo2y ago

I'm amazed anyone would argue that after the Toyota firmware analysis.

bozhark2y ago

What's the priority then, telemetry data? Why is it rough out there?

jacquesm2y ago

Relatively crappy pay, complex toolchains, long build times, layer upon layer of (really bad) legacy code, badly specified (if they're specified) protocols between subsystems, subsystems that are completely opaque (no source code provided), homegrown OS's or older RTOS's, subset-of-C to keep it safe(r), tricky debugging environments and if you're really unlucky anemic hardware.

I hope I didn't miss anything but I wouldn't be surprised if I did.

reactordev2y ago

Probably due to fires, failures, and fatigue.

foobiekr2y ago

do you guys not have confirmed boot and swizzling to fallback images?

AlotOfReading2y ago

Automotive varies widely between "basically modern Linux systems with proper updates" and the most janky, home-grown update systems imaginable, sometimes even within the same components and teams.

cjbprime2y ago

> This ran in CI and would fail the build if it didn't pass.

I don't mean to be pedantic, but since we're talking about what should happen instead, this is insufficient. It works until the day you realize you made some kind of manual change to your CI infra, or that CI has some non-standard configuration that makes it work for you but not some significant fraction of the fleet.

People should do what you described in CI, but as well as that, you need phased rollout, where e.g. the build can only be rolled out to the next percentage point of randomly selected users in a specific segment (e.g. each hardware revision and country as independent segments) after meeting a ratio of successful check-ins, in the field, from the new build by production customers in that segment. That's the actual metric for proceeding with the rollout: actual customers are successfully checking in from the new version of the software.
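The gating rule described above might look something like this. It is only a sketch: the thresholds (`min_ratio`, `min_sample`), the `SegmentStats` fields, and the hash-based device selection are all assumptions chosen for illustration, not anything the comment specifies.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class SegmentStats:
    """Field results for one segment, e.g. one hardware revision in one country."""
    offered: int       # devices in the segment that were offered the new build
    checked_in: int    # of those, devices that later checked in from the new build
    rollout_pct: int   # current rollout percentage for the segment


def next_rollout_pct(stats: SegmentStats, min_ratio: float = 0.99, min_sample: int = 50) -> int:
    """Advance one percentage point only when enough field check-ins have succeeded."""
    if stats.offered < min_sample:
        return stats.rollout_pct            # not enough evidence yet
    if stats.checked_in / stats.offered < min_ratio:
        return stats.rollout_pct            # hold; a real system might also halt or roll back
    return min(100, stats.rollout_pct + 1)


def in_rollout(device_id: str, build: str, pct: int) -> bool:
    """Stable pseudo-random selection: hash the device into one of 100 buckets per build."""
    digest = hashlib.sha256(f"{build}:{device_id}".encode()).digest()
    return int.from_bytes(digest[:4], "big") % 100 < pct
```

The input to the decision is check-ins from production devices in each segment, not a CI result, which is the distinction the comment is drawing.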

Except, that's actually not sufficient either. What if the new build is good, but it contains an update to the updater which bricks the updater? Now you're getting successful check-ins from the new version in the field, but none of those customers will ever successfully auto-update again. So, test the new updater's ability to go forwards successfully, too.

quailfarmer2y ago

A good way to handle the who-updates-the-updater issue is to use a triple partition updater. A updates B, and then B updates C, then C updates A. If anything about the new version prevents it from properly updating its neighbor, that neighbor won't be able to close the loop, and you'll fall back to A. This simplifies the FSBL, because it just boots the three partitions in a loop, no failure detection required. You don't need to triplicate the full application either, just the minimum system needed to perform an update, and then have the "application" in its own partition to be called by the updater.

latchkey2y ago

> It works until the day you realize you made some kind of manual change to your CI infra, or that CI has some non-standard configuration that makes it work for you but not some significant fraction of the fleet.

Nah, my CI process was solid. This was proven in the field over the course of years.

> I don't mean to be pedantic... you need phased rollout

You don't need to be pedantic, but better to ask the question rather than assume that was all that I did. =) You have to realize that what I built, worked flawlessly. It wasn't easy either, took a lot of trial and error.

I did have a CIDR based rollout. I could specify down to the individual box that it would run a specific version. Or I could write "latest" to always keep certain boxes running on the latest build. This was another part of my testing, but ended up not being fully necessary because I had enough automated testing in CI that "latest" always worked.
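For readers unfamiliar with the idea, a CIDR-based version pin could be sketched as below. The rule table, addresses, and version strings are made up for the example, and "most specific block wins" is an assumption about how overlapping rules would be resolved.

```python
import ipaddress
from typing import List, Tuple

# Hypothetical rule table: (CIDR block, version). "latest" tracks the newest build.
RULES: List[Tuple[str, str]] = [
    ("10.0.0.0/8", "1.4.2"),
    ("10.20.0.0/16", "1.5.0"),
    ("10.20.30.40/32", "latest"),   # a single box pinned to whatever is newest
]


def version_for(host_ip: str, rules: List[Tuple[str, str]], latest: str) -> str:
    """Return the version pinned by the most specific block containing host_ip."""
    addr = ipaddress.ip_address(host_ip)
    matches = [
        (ipaddress.ip_network(cidr), version)
        for cidr, version in rules
        if addr in ipaddress.ip_network(cidr)
    ]
    if not matches:
        raise LookupError(f"no rollout rule covers {host_ip}")
    _, version = max(matches, key=lambda match: match[0].prefixlen)
    return latest if version == "latest" else version
```

A `/32` entry gives the per-box control mentioned above, while wider blocks cover whole racks or data centers.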

> but it contains an update to the updater which bricks the updater?

This happened, so I wrote a lot of test code to make sure that would never happen again. My CI would catch that since I was E2E testing that it could actually run the upgrade process.

Once I implemented all of this, I never had a single failure and would routinely, several times a day, deploy to the entire cluster, over the course of a couple years.

It was all eventually consistent as I could also control the "check for update" frequency as well.

cjbprime2y ago

I think there's a minor confusion here, where you think the purpose of my response involves doubting whether your system was successful. I understand it was successful. My response is to the sense in which your comment can be interpreted as advice to other people on what they should build.

I think the fact that you were able to survive with CI-only doesn't mean that we should encourage others to skip implementing a phased rollout based on verified customer successes, including testing of their new updaters before the first time they accidentally brick all the updaters, rather than afterwards. That's what I was hoping to help avoid, through my comment.

jacquesm2y ago

And you need to verify the vehicle is not in motion.

psychlops2y ago

Having worked on 25K machines, I can assure you that it never deployed to every single machine and failed to do so in interesting ways all the time.

latchkey2y ago

It always deployed. It was eventually consistent. Any failure would automatically be resolved after a period of time.

psychlops2y ago

Interesting. At any point in time, I had errors from hardware, software and networking. Even the racks would be getting overwhelmed at certain times. Simply being able to ssh into every host wasn't guaranteed. I'm not sure how you did it.

postalrat2y ago

As a frontend web developer I'm constantly deploying software to many thousands of machines. And you know what? It's pretty damn simple.

drdaeman2y ago

I used to wear your shoes in IE6/7 ages (no longer, I gave up during the "framework of the week" race and went all-backend), and it wasn't simple at all. Browser compatibility with all their rendering nuances, individual system oddities and all sort of fragile stuff.

And fortunately, no one bats an eye at a slightly broken site, but everyone hates even a slightly broken vehicle.

jrumbut2y ago

It's simple because we tolerate certain limitations in the web platform.

If you had a hard requirement that a page load could never take more than 100ms, regardless of network conditions, you'd have quite a challenge on your hands.

onion2k2y ago

I'm not really a frontend dev any more but I was for a long time. I can assure you that the only reason you think your code works is because no one tells you it's broken. If you use an error logging or telemetry service (Sentry, Rollbar, New Relic, etc) you will be aware that errors happen in frontend code all the time. It's just that most of the time bugs don't crash the app, and the user doesn't know what to expect so they see a broken feature and think it's meant to be like that.

uw_rob2y ago

I don't think it's fair to consider the updaters for either Chrome or the OS to be simple.

donmcronald2y ago

> While I fully understand that this is hard to get right 100% of the time, a mess up of this level by a car manufacturer is pretty amazing to me.

I feel like it's going to happen to someone that makes network devices eventually. I'm always scared to update my (several hundred) UniFi devices. Their update process isn't foolproof and they push auto-updates via the UI pretty hard.

Several years ago they caused some people's devices to disconnect from the management controller when they enabled 'https' communication. Prior to that, if you were pointing devices at 'https://example.com:8080...' they would ignore the 'https' part and do an 'http' request to port '8080'. Then they pushed their 'https' update which expected an 'https' connection and didn't fall back to the old behavior for anyone that was mistakenly using 'https' in their URL initially. Some people on their forums complained about having to manually SSH to every device to fix the issue.

It was caused by an end-user mistake, but they knew it was a potential issue. AFAIK, their attitude on it hasn't changed a lot, and at the time their response was that they knew it would break some people, but that it wouldn't be that many (lol).

IMO, the issue with those systems is that basic communication back to the update / config server is part of the total package which is too complex (ie: a full Debian install). I'd rather see something like Mender (mender.io) where the core communications / updates come from a hardened system with watchdog, recovery, rollback logic.

Think of how crazy it is to have something like pfSense doing package based updates rather than slice based updates. At least with boot environments they could add some watchdog and rollback type logic, but it'll still be part of the total system instead of something like a hardened slice based setup where the most critical logic is isolated from everything else and treated like a princess.

Do you have any insight on package vs slice based systems for updates? Did you isolate update logic from the rest of the system or am I out of touch with that opinion?

vGPU2y ago

Reminds me of my (far less critical) update process for home assistant. Every time something breaks. Currently my hvac automations are going haywire.

akira25012y ago

When possible, I used a fail back mechanism. If the update failed to fully come up, then the watchdog timer would catch it, the bootloader would notice the incomplete boot, and attempt to boot from the previous known working image in that case.

code_runner2y ago

out of morbid curiosity.... how long did it take to ssh into and fix all of those servers? I imagine even automating a fix (if possible) would still take a good amount of time.

latchkey2y ago

GNU parallel and sshpass are your friends.

The way I built my app was that I could install it cleanly via a curl | bash.

So, I just had a simple shell script that iterated through the list of IP addresses (from the DHCP leases), ran curl | bash and that cleaned up the mess pretty quickly.

jdechko2y ago

As a non-developer, the whole situation with a bad software update to the Voyager spacecraft really puts things into perspective as far as how bad remote updates can be.

It’s also a testament to the way that the system was designed that they were able to get it back online.

sixtram2y ago

you ssh-d into 25K servers one by one? I mean, manually?

latchkey2y ago

https://news.ycombinator.com/item?id=38270986

ugh1232y ago

Please tell me you scripted that ssh across your 25k servers!

latchkey2y ago

https://news.ycombinator.com/item?id=38270986

One thing my little control process did on the box was to always set the password to be the same... user/1.

None of these boxes needed inbound connections, so it wasn't a big deal to do that.

gravitronic2y ago

I used to work for a company that built satellite receivers that would be installed in all sorts of weird remote environments in order to pull radio or tv from satellite and rebroadcast locally.

If we pushed a broken update it might mean someone from the radio company would have to make a trip to go pull the device and send it to us physically.

Our upgrader did not run as root, but one time we had to move a file as root.. so I had to figure out a way to exploit our machine reliably from a local user, gain root, and move the file out of the way. We'd then deploy this over the satellite head end and N remote units would receive and run the upgrade autonomously. Fun stuff.

Turns out we had a separate process running that listened on a local socket and would run any command it received as root. Nobody remembered building or releasing it but it made my work quick.

singleshot_2y ago

The person who built and released this might not have ever worked for your company, which might be why no one remembers building or releasing it.

gravitronic2y ago

No no, I figured that out afterwards, in a past development iteration someone added it on purpose and then forgot all about it - "oh yeah we needed that to <solve some mundane problem>".

So... worse than subterfuge? That being said it only listened on the local socket, so it's slightly less bad, and I don't want to get into the myriad of correct ways that original problem could have been solved, but let's just say that company doesn't exist anymore.

cjbprime2y ago

I admire your restraint in writing this comment. :)

ThePowerOfFuet2y ago

This is one of the very finest comments I have ever seen on HN (or anywhere else, for that matter).

nomel2y ago

> Turns out we had a separate process running that listened on a local socket and would run any command it received as root. Nobody remembered building or releasing it but it made my work quick.

No offense, but what a shit show. It makes me assume no source control, and a really good chance that state actors made their way into your network/product. This almost happened at a communication startup I know, with three letter agencies helping resolve it. State actors really like infiltrating communication stuffs.

gravitronic2y ago

oh, yeah, this place was a total shit show. BUT we were ISO9001 certified!! So we had source control (CVS) and a Process (with a capital P) to follow. In this case that code was added in a previous development iteration because someone needed to run something as root when a user pressed a certain button on the LCD panel in front and this was the decoupled solution they wrote intentionally. Somehow I feel like that makes it worse than if it was a malicious three letter agency lol.

nomel2y ago

Package this up and send it to https://thedailywtf.com

It's beautiful.

qmarchi2y ago

It's crazy to me that this is possible in the first place. Standard practice is to have a fleet of test vehicles that are effectively production except in an early release group.

Or, you know, having an A/B boot partition scheme with a watchdog. Things that have been around for decades at this point.

Disclaimer: Former Googler, Worked closely with Automotive.

michaelt2y ago

To me it's all-too-understandable how this is possible.

Maybe they've got a test fleet, but it accepts code signed with the test build key.

Maybe they've got a watchdog timer, but it doesn't get configured until later in the boot process.

Maybe they've got A/B boot partitions, but trouble counting their boot attempts - maybe they don't have any writable storage that early in the boot process.

I wouldn't be surprised if, as a newer company, they'd made a 'Minimum Viable Product' secure boot setup & release procedure, and the auto-fallback and fat-finger-protection were waiting to get to the top of the backlog.

qmarchi2y ago

So, using Polestar as a reference as it's both a vehicle that I've worked on, and one that I personally drive.

> Maybe they've got a test fleet, but it accepts code signed with the test build key.

Polestar solves this by only delivering signed updates to their vehicles. The vehicle headunit will refuse to flash a partition that isn't signed by the private key held by Polestar. Pulls double duty to prevent someone from flashing a malicious update, as well as corruption detection.

> Maybe they've got a watchdog timer, but it doesn't get configured until later in the boot process.

Based on what the Rivian reports are showing (Speedometer, cameras, safety systems are working), they likely are running their infotainment as a "virtual machine" within their systems. Again, something that Polestar does.

Implementation of a watchdog with a "sub-system" like this is relatively braindead simple.

> Maybe they've got A/B boot partitions, but trouble counting their boot attempts - maybe they don't have any writable storage that early in the boot process.

Generally, A/B partitioning is part of the bootloader, the first program that executes after the reset pin is released (on many modern processors). This also leads to reboot counters and such being stored as part of the NVRAM that is available at boot.

Opinion: Maybe I'm biased, but maybe if you can't develop something yourself, there's reason for you to get an off the shelf option that handles a lot of these things.

Disclaimer: Former Googler, Worked closely with Automotive.

Gud2y ago

To be honest, I don't think Polestar set a very high bar for software quality. I am currently renting a Polestar 2 from Hertz, and sometimes the HUD doesn't work (it's 50/50 if it will turn on). That means I don't see speed, battery charge, etc. while driving. Infotainment system is working though.

refulgentis2y ago

Opinion: I'm a little confused as to how you're confused as to how test might not match prod sometimes.

Observation: "[if you write buggy software], there's reason for you to get an off the shelf option"

Question: Are you saying if they used Android Automotive this could never have happened?

Reference: similar event for Android, last week: https://linustechtips.com/topic/1538248-pixel-phones-using-m...

Disclaimer: Former Googler, did not Work closely with Automotive, Worked closely with Android.

paledot2y ago

> The vehicle headunit will refuse to flash a partition that isn't signed by the private key held by Polestar. Pulls double duty to prevent someone from flashing a malicious update, as well as corruption detection.

And of course preventing people from modifying and controlling hardware that they own, having paid 6 figures for (in the case of the Polestar 3 anyway). But that's table stakes for embedded systems in this day and age. Security for me, not for thee.

LoganDark2y ago

> Maybe they've got A/B boot partitions, but trouble counting their boot attempts - maybe they don't have any writable storage that early in the boot process.

You do not report a successful boot until and unless the entire system loads up successfully. You will definitely have writable storage by then.

psunavy032y ago

Exhibit A of why a Minimum Viable Product still needs a proper Definition of Done which includes quality standards.

worik2y ago

What amazes me is that any grown up person thinks it is a good idea to update vehicles as if they were telephones

Owners should have to bring the vehicle into a shop to have changes made, and they should be very rare.

This is lazy control freakery of the worst kind.

Something very bad is going to happen and people will die before we realize that it is a stupid, dangerous practice.

qmarchi2y ago

I understand the sentiment, but think about the alternatives.

There are a few different kinds of updates that can be applied, each with their own protective layers.

Infotainment updates, like what happened to Rivian, aren't that dangerous. You lose "convenience features" like maps, air con, etc, but generally nothing that could kill you or someone else.

Then there's system updates, which is where danger noodle things happen. Automotive manufacturers are significantly more risk averse to updating these components, and generally, if _anything_ within the system looks wonky, it's an immediate revert.

If I, as a Polestar owner, wanted to get an update for my vehicle, the nearest service center is 1.5h away. If I lived in Montana (United States), it would be realistically impossible for me to update my car. Thus, if we want to enable competition within the markets, we shouldn't have regulations that force a new manufacturer to have a global network just to add CarPlay to a screen.

mulmen2y ago

Dad has a 1966 Oldsmobile with air conditioning. In the last 57 years General Motors has never found a need to update the switch. It still works flawlessly.

It’s stupid that we invented a way to not only remotely break an on/off switch but also a culture that rolls the dice on that until the inevitable happens.

neoromantique2y ago

>Infotainment updates, like what happened to Rivian aren't that dangerous. You lose "convienience features" like maps, air con, etc, but generally nothing that could kill you or someone else.

Also speedometer, which is hardly a convenience feature.

LastMuel2y ago

On the other hand, we update irreplaceable spacecraft billions of miles away with new software.

It should be fine to push software updates out, as long as the correct safety and fallback procedures are in place. It simply has to be designed to handle failure and procedures need to be in place to mitigate risks.

It sounds like that wasn't the case here. Also, why wouldn't you have a small initial release pool when you have such a large potential for disruption?

brewdad2y ago

NASA's philosophy is the polar opposite of "move fast and break things".

worik2y ago

If Ford or Nissan want to invest as much in ms as NASA does in Voyager....

bradleyjg2y ago

The art of shipping software—like on a disk, where once it’s out the door, it’s out the door and you may never get another shot—is dead or dying. Even in some embedded areas of the industry now.

fargle2y ago

> What amazes me is that any grown up person thinks it is a good idea to update vehicles as if they were telephones

What amazes me is that any grown up person thinks it is a good idea to update telephones as if they were software and not phones.

Or rather that it is a good idea to have phones that need updates? Either way, we're all one 1/2 assed push update to a fridge, vacuum, washing machine, phone or car away from a really annoying day.

worik2y ago

Touché

vore2y ago

As the update only affects infotainment and not critical systems, it seems like a reasonable tradeoff to me. Just because a car can fail in ways that kill people doesn't mean all parts of a car are equally critical.

windexh8er2y ago

This isn't true. If you look at the release notes for any of Rivian's updates they all include vehicle related firmware changes. This is not simply infotainment.

Beyond that "infotainment" includes driver critical information - like the speedometer which, for many affected, means there's no working driver screen.

spaceywilly2y ago

Yeah... I worked on an embedded project with literally 2 engineers, and we had an A/B partitioning scheme, and a recovery partition (we fully qualified the recovery image and it was flashed to the units on day 1, it was guaranteed to boot and it would just sit and wait for the user to initiate a firmware load). The app on the device would reset a U-boot variable once it was successfully loaded, so U-boot could check the number of failed boot attempts. If it was >= 5 reboot attempts without booting successfully, it would go into the recovery partition.
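The boot-counter logic in this comment can be simulated in a few lines. This is a Python sketch, not U-Boot scripting: the `env` dictionary stands in for the persistent bootloader environment, and the variable names `bootcount` and `active_slot` are illustrative assumptions.

```python
BOOT_LIMIT = 5  # from the comment: five failed attempts before recovery


def choose_image(env: dict) -> str:
    """Bootloader side: pick recovery if the limit is hit, else count this attempt and boot."""
    if env.get("bootcount", 0) >= BOOT_LIMIT:
        return "recovery"
    env["bootcount"] = env.get("bootcount", 0) + 1
    return env.get("active_slot", "A")


def app_started_ok(env: dict) -> None:
    """Application side: clear the counter once the app has fully loaded."""
    env["bootcount"] = 0


env = {"active_slot": "B"}
# The app never comes up, so nothing ever resets the counter.
attempts = [choose_image(env) for _ in range(6)]
```

Here the sixth attempt lands in the recovery image, which in the setup described simply waits for the user to initiate a firmware load.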

There's really no excuse from Rivian on this, this is shoddy

LargeTomato2y ago

I interviewed at Rivian. They told me about how they needed to grant users access to things like keys, AC, ignition, etc. So they built a hierarchical, recursive group checking IAM system.

That just felt like a massive product to build and maintain for what really could have been backed by AWS IAM. GCP IAM if they really really needed hierarchy. I guess I'm not surprised at this outage.

DannyBee2y ago

Rivian does have a test fleet, and they test it for weeks before releasing. This particular issue is because they apparently distributed the firmware signed with the wrong cert.

Not a bug in the software itself.

That is independent of testing the software, but still a distribution issue.

mytailorisrich2y ago

My 2c based on your comment:

* "signed with the wrong cert" should mean the software package is rejected before it is installed.

* software upgrades are tricky and there should be at least 2 versions available so that fallback to the previous is possible and automatic in case of issues.

DannyBee2y ago

The software package probably is signed right but contains multiple signed binaries of which one is signed wrong.

Or is multiple signed packages and one is wrong.

Or the test cars accept prod and test certs.

Or some combo of the above.

There are lots of ways this could have broken that don't amount to Rivian not being able to write software.

jandrese2y ago

Yeah, but how did the vehicle not just reject the wrong cert and refuse to flash the update?

mkipper2y ago

I've never worked in automotive but it's pretty easy to imagine how this might play out in a car, where a single update might bundle updates for several programmable devices.

It's easy to imagine a central SoC receiving the update, verifying its signature against a local key and then reprogramming some MCU over an internal interface. But then after resetting the MCU, you realize that the image you just flashed isn't compatible with the boot security keys burned into that MCU. It's not uncommon for a device performing the OTA update to not have access to the "source of truth" keys / certificates used to verify the updated image at boot time.

Not that this is a great excuse. If you add OTA updates to a product that has this design, you should really be confident in your recovery solution.

DannyBee2y ago

The firmware is probably not just a signed package but signed binaries in the package as well. One is probably signed with the wrong cert.

This would not cause the updater to fail unless it verified the certs of all the binaries in the package, which most don't
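A sketch of what checking every component would look like, as opposed to checking only the outer package. To stay within the Python standard library it uses HMAC-SHA256 as a stand-in for real asymmetric code signing, which is a deliberate simplification; the key values and file names are placeholders invented for the example.

```python
import hashlib
import hmac
from typing import Dict, List, Tuple

PROD_KEY = b"prod-key-placeholder"   # stand-in for the production signing key
TEST_KEY = b"test-key-placeholder"   # stand-in for an internal dev/test key


def sign(key: bytes, blob: bytes) -> bytes:
    """HMAC used here only as a stand-in for a real signature scheme."""
    return hmac.new(key, blob, hashlib.sha256).digest()


def badly_signed(components: Dict[str, Tuple[bytes, bytes]], key: bytes) -> List[str]:
    """Return the names of components whose signature does not verify against key."""
    return [
        name
        for name, (blob, signature) in components.items()
        if not hmac.compare_digest(sign(key, blob), signature)
    ]


package = {
    "gateway.bin": (b"gateway image", sign(PROD_KEY, b"gateway image")),
    "infotainment.bin": (b"ivi image", sign(TEST_KEY, b"ivi image")),  # wrong key
}
rejected = badly_signed(package, PROD_KEY)
```

An updater that refuses to start flashing unless this list is empty would catch the single mis-signed binary before anything is written.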

mlyle2y ago

The code went through early release tests successfully; the problem came with how it was more broadly released.

They should have had further staging of the rollout (randomizing when it is offered to users).

whalesalad2y ago

A/B partitions tend to solve that. You will only switch to the new partition when the update is 100% verified installed. If it doesn't complete in an atomic manner, your device will just boot into the previous healthy partition.

AlotOfReading2y ago

A/B gets complicated in the real world. BL1 may not support A/B for example, so to implement A/B bootloaders you may need a shim that can read/write NVM to handle that. Your HSM may not have slots for multiple keys to have different signatures, so upgrading one may trample the other if your update code doesn't check that.

Lots of ways to screw this up, especially in automotive where you're likely to be dealing with TI and their (in)secure boot.

I've solved this problem god only knows how many times now and I've rarely found an automotive board that doesn't introduce fun, new edge cases. OTA can't exceed x kilobytes of memory, the processor isn't fast enough to verify signatures and write the image in < x seconds, can't write the image to flash unless the signature is verified, but the image doesn't fit in RAM, the server delivering the update is 3+ networks away from the device receiving the update, etc.
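One common way around the "image doesn't fit in RAM" case, assuming the board has an inactive slot that is allowed to hold unverified data until it is marked bootable, is to hash the image while streaming it. This is a swapped-in illustration rather than anything the comment describes; `stream_to_slot` is a hypothetical helper, in-memory buffers stand in for the download and the flash, and the expected digest is assumed to come from a signed manifest.

```python
import hashlib
import io
from typing import BinaryIO


def stream_to_slot(src: BinaryIO, slot: BinaryIO, expected_sha256: str,
                   chunk_size: int = 4096) -> bool:
    """Copy src into the inactive slot chunk by chunk; report whether the digest matched."""
    digest = hashlib.sha256()
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        digest.update(chunk)   # only one chunk is ever held in memory
        slot.write(chunk)
    return digest.hexdigest() == expected_sha256


image = b"\x7fFIRMWARE" * 10_000
expected = hashlib.sha256(image).hexdigest()   # would come from a signed manifest
slot_b = io.BytesIO()
ok = stream_to_slot(io.BytesIO(image), slot_b, expected)
```

The slot would be marked bootable only when the function returns true; on a board where flash may not hold unverified data at all, this approach does not apply.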

mlyle2y ago

If the comment I replied to originally contained a mention of A/B partitions, I missed it.

MichaelZuo2y ago

The 'early release tests' weren't testing an identical copy of the actual update?

DannyBee2y ago

It's probably closer to:

The test vehicles accept test/prod signed versions

Regular vehicles only accept prod signed versions

They are otherwise identical.

The test vehicles were sent test signed versions

The prod vehicles were sent the exact same update, signed with test.

This would not be uncommon since the test vehicles probably occasionally run test releases for debugging.

Further, the update is probably multiple signed pieces, and the only part accidentally signed with test was likely infotainment software.

Or something like this.

It's hard to believe they wouldn't test sending badly signed updates, so I have to imagine it's a particularly weird badly signed update.

In other words, I would not assume they are idiots.
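
The guessed scenario above can be sketched roughly like this. It is purely illustrative: nothing here is known about Rivian's actual update format, the key names and component names are made up, and HMAC stands in for real asymmetric signatures only to keep the example stdlib-only. What it shows is checking every component against the trust set of the target vehicle, not just the outer package.

```python
import hashlib
import hmac

# Hypothetical trust stores: key id -> secret.
PROD_KEYS = {"prod-2023": b"prod-secret"}
TEST_KEYS = {"test-2023": b"test-secret"}


def sign(key: bytes, payload: bytes) -> str:
    return hmac.new(key, payload, hashlib.sha256).hexdigest()


def verify_package(package: dict, trusted: dict) -> list:
    """Return the names of components that fail verification (empty means OK)."""
    failures = []
    for name, part in package["components"].items():
        key = trusted.get(part["key_id"])
        ok = key is not None and hmac.compare_digest(
            sign(key, part["payload"]), part["signature"]
        )
        if not ok:
            failures.append(name)
    return failures


def make_part(key_id: str, key: bytes, payload: bytes) -> dict:
    return {"key_id": key_id, "payload": payload, "signature": sign(key, payload)}


package = {
    "components": {
        "gateway": make_part("prod-2023", PROD_KEYS["prod-2023"], b"gw-image"),
        "infotainment": make_part("test-2023", TEST_KEYS["test-2023"], b"ivi-image"),
    }
}

# A production vehicle trusts only prod keys; a test vehicle trusts both.
print(verify_package(package, PROD_KEYS))                    # ['infotainment']
print(verify_package(package, {**PROD_KEYS, **TEST_KEYS}))   # []
```

Run with the production trust set before release, a check like this would flag the one component that only test vehicles accept.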

hef198982y ago

I am still not sure why I would update software on a car, a piece of hardware that, IMHO, should be able to run air gapped 24/7. Exceptions: recurring bugs, GPS maps and security updates. All of which can be done either during service (preferred, if they brick it, they are liable) or by plugging in something. OTA updates just seem completely pointless...

Edit: Also, why the heck isn't the entertainment system completely air gapped from the software running the car?

enragedcacti2y ago

Rivian consistently ships a lot of new features and improvements, you can see the changelogs here [1]. I think you can pretty fairly critique a lot of them with: They are just solving a problem that they created by making it too techy, or they are shipping stuff they should have completed before releasing the product. I do think it's hard to argue that the updates aren't adding anything of value though. There's convenience stuff like pet mode or bird's eye camera view that were added after release, but there are also things like new driving modes (soft sand and snow) or improved DC charging curves and smarter battery conditioning that legitimately improve the quality of the product as a vehicle.

> Edit: Also, why the heck isn't the entertainment system completely air gapped from the software running the car?

As for this, the entertainment system can control basically every feature of the car and is often the primary or only way to accomplish certain things. Even in much much dumber cars the infotainment is still part of the CAN bus and is able to interact with the rest of the vehicle.

[1] https://rivian.software/category/public/

jacquesm2y ago

You may not have a say in the matter. Most cars are on some kind of IoT private network with their own cell modems and if the manufacturer decides to push an update there isn't much you can do about it short of ripping out the cell modem. Which may well have unpredictable consequences.

refulgentis2y ago

Rollouts don't solve problems, they limit who they affect.

mlyle2y ago

Is not reducing the effective cost of a bad update by 10x or more worthwhile?

Sure, but if you are rolling out to 1% of users per hour, you detect the problem in a couple of hours and much fewer than 2% of users will have applied the update. This is a relatively small support problem.

While if you roll out to everyone at once, you'll detect the problem sooner (within an hour) but have 10x as many affected.
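
A rough sketch of what a 1%-per-hour rollout could look like, assuming each vehicle is placed in a stable pseudo-random bucket. The `VIN00000`-style IDs, the `bucket` helper and the 100-bucket scheme are made up for illustration and say nothing about how Rivian actually staged this release.

```python
import hashlib


def bucket(vehicle_id: str, buckets: int = 100) -> int:
    """Stable pseudo-random bucket for a vehicle (hypothetical ID scheme)."""
    digest = hashlib.sha256(vehicle_id.encode()).digest()
    return int.from_bytes(digest[:8], "big") % buckets


def is_offered(vehicle_id: str, hours_since_start: int, pct_per_hour: int = 1) -> bool:
    """Offer the update to pct_per_hour more of the fleet each hour."""
    return bucket(vehicle_id) < min(100, hours_since_start * pct_per_hour)


fleet = [f"VIN{i:05d}" for i in range(10_000)]
for hours in (0, 2, 100):
    offered = sum(is_offered(v, hours) for v in fleet)
    print(f"after {hours} h: {offered} of {len(fleet)} vehicles offered")
```

Halting the rollout two hours in leaves roughly 2% of the fleet with the offer, and only the subset that actually installed it is affected.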

xyst2y ago

When a car company is losing money on every car sale, C-level execs are going to cut corners.

dewski2y ago

This is a bad take.

xyst2y ago

Rivian layoffs earlier this year [2] combined with reports of $33K loss per sale [1]. Rivian is hemorrhaging money right now.

RIVN IPO’d at $150/share. Now it’s trading at $16/share.

All of these are indicators of poor leadership to me. No sustainability. Burning cash. Poor company outlook. Poor products.

[1] https://tfltruck.com/2023/10/rivian-financial-results-losses...

[2] https://www.theverge.com/2023/2/1/23581642/rivian-layoff-ev-...

OneLeggedCat2y ago

This is an inadequate comment.

worik2y ago

Why?

cs7022y ago

It's easy to underestimate how hard and expensive it is to build, deploy, and remotely upgrade software that runs reliably on a fleet of diverse cars (different models, different years, slightly different components from batch to batch, etc.). It makes updating a mobile phone OS look trivial in comparison.

So far, only Tesla seems to be able to update car software remotely, regularly and reliably. I'm certain it's neither easy nor cheap.

All things considered, physical buttons and dials are probably easier and cheaper, because they don't require software updates!

VyseofArcadia2y ago

Forget updates entirely. My car is one of the few places I expect to get software that works the first time.

If you absolutely must have updates, then at least not OTA updates. Have them done at the dealership or service center so any issues can be dealt with immediately.

Come on, is this engineering or hacking? This is a car, not a CRUD app. Get. It. Right.

dagmx2y ago

That’s how things used to be and it resulted in lots of long standing bugs because the update rates were low, and so manufacturers didn’t push updates. Many people don’t live near dealers or service centers, or can’t afford the continued cost (it’s usually not free unless it’s a recall).

OTA is better for consumers when done properly. Other manufacturers manage it fine, and one bad example shouldn’t be what we base things on. It’s what we should learn from and improve on.

jacquesm2y ago

But average quality was a bit higher because nobody thought 'oh, we'll fix that next week in the OTA update'.

dalyons2y ago

eh i guess i disagree. We had that (& still do for some cars) for decades, and it universally resulted in terrible software that you were stuck with for the life of the car. Hard to update == hard to iterate == bad software.

ClumsyPilot2y ago

bad software is the one that kills people. Ugly software that works is fine

w0m2y ago

random new features via OTA updates was one of the deciding factors when i bought my car ... :)

I also mostly WFH so... yea. lol.

matrss2y ago

> All things considered, physical buttons and dials are probably easier and cheaper, because they don't require software updates!

I am pretty sure there is a market for a dumb modern car, but no one is building it. I am thinking of an electric car without anything "smart" in it. Modern safety features can stay, if they work completely self contained and without requiring an external connection ever over the lifespan of the car.

iso8859-12y ago

I wonder if it is somehow possible to use an open source battery management system to build a car like this. See https://foxbms.org/

jacquesm2y ago

Regulatory pressure may well get you to do stuff you wouldn't want to do.

NotYourLawyer2y ago

I’d buy that car today.

wannacboatmovie2y ago

This isn't a bunch of Windows PCs home-built from a hodgepodge of components.

They designed, built, and shipped all the hardware. There is ABSOLUTELY NO excuse for not having a database of the exact hardware configs by serial number. They have the ability to test every single shipped configuration.

If they don't, they have already failed as a car company.

AlotOfReading2y ago

I guarantee they have a database with the hardware configs. It's required by NHTSA to do recalls and notices. They'll undoubtedly be using that to inform the right people to come in.

The update servers almost certainly don't talk to that system though.

wil4212y ago

> So far, only Tesla seems to be able to update car software remotely, regularly and reliably. I'm certain it's neither easy nor cheap.

My Jeep Grand Cherokee has had OTA updates for 5+ years. BMW has been doing it since 2018.

I’m almost positive a family member had it with GMC OnStar back in the late 2000s.

willio582y ago

I don't think the Jeep or BMW infotainment systems are nearly as fleshed out or complex as Rivian's, especially not Tesla's. Maybe I'm wrong!

phpisthebest2y ago

Well then we need to ask: why are their infotainment systems so complex? And do they need to be?

I want my infotainment system to connect to Android Auto. That is it.

Make it do that, and only that.

This drive to make EVs as complex as possible is one of the reasons I am not planning on buying one.

EVs are supposed to be SIMPLER than ICE. Make me a simple car with simple controls, and just replace the ICE with a battery and electric motor. Give me an app for my phone that can do the charging trip calculations and interface with other systems.

I do not want a complex SaaS app on wheels.

wil4212y ago

That’s a huge plus for me. CarPlay or nothing. BMW is becoming closer to a Tesla like screen. GM is supposed to drop CarPlay in favor of whatever they are doing on their EVs.

I don’t want my ICE/EV to become a SaaS app where I’m paying $500 a year to use my own car.

jacquesm2y ago

BMW in particular is an interesting case, a (late) friend of mine drove just about every model the day after it came out (BMW fan) and they spent more time in the shop for software issues than they did driving. To the point that he'd get attached to some of the loaners, it really was that bad.

duped2y ago

Updating software is orthogonal to the complexity of the software application being updated, unless you have horribly designed your architecture. I know, because I've made that mistake.

bri3d2y ago

> All things considered, physical buttons and dials are probably easier and cheaper, because they don't require software updates!

Almost all automotive control modules have firmware, whether that firmware is parsing touchscreen inputs or a rotary encoder.

NotYourLawyer2y ago

Well sure, but the rotary encoder can’t get moved to a different menu tree by a software update, and I can use it without taking my eyes off the road. I know which I prefer.

xyst2y ago

Tech junk shouldn’t go in cars, period. Cars shouldn’t be as pervasive and prevalent in society (at least in USA). Yet here we are. Car manufacturers have spent an insane amount of money over decades to get to this point (buying legislators, forcing highway infra, subsidies, profit driven strategy over sustainability)

treesknees2y ago

Decades? Try almost a century. For better or worse, our cities and various economies were built around the automobile.

It's still a free market - these companies could choose not to put tech into their product. But look at the backlash against GM when they announced they wouldn't support Apple CarPlay or Android Auto. Consumers want it.

FireBeyond2y ago

> So far, only Tesla seems to be able to update car software remotely, regularly and reliably. I'm certain it's neither easy nor cheap.

Tesla, whose computer systems quite regularly need to be hard rebooted while the car is driving? That Tesla?

code_runner2y ago

I had to do this once or twice (it's very very infrequent in my experience) and one time it was genuinely terrifying, as I had lost blinkers etc. where a few interstates all intersect and merge.

I still do love the car though.... but a very sketchy moment that I shouldn't have brought on myself while driving in that situation.

xienze2y ago

> I had to do this once or twice (it's very very infrequent in my experience)

This is something that’s _never_ supposed to happen.

> but a very sketchy moment that I shouldn't have brought on myself while driving in that situation.

How on earth can you rationalize a Tesla performing an update/hard reset while driving as _your_ fault? It should never be allowed to happen!

Aurornis2y ago

> (different models, different years, slightly different components from batch to batch, etc.). It makes updating a mobile phone OS look trivial in comparison.

Not really. Vehicle computers aren’t vastly different on every model year and every trim level or option package. These parts are standardized, tested, and carried across model years.

Even with changes, the teams would be expected to have the different variants in their development and test cycles. The 2020, 2021, and 2022 model infotainment systems likely share a lot more in common than an iPhone 13, iPhone 14, and iPhone 15 with all of the non-Pro, Pro, and Max variants.

etchalon2y ago

My Volvo XC90 gets regular OTA updates without issue, and so did my Land Rover Discovery before it.

tech_ken2y ago

> All things considered, physical buttons and dials are probably easier and cheaper, because they don't require software updates!

If it ain't broke it's ripe for disruption

dylan6042y ago

if (cpu == A) do code

else if (cpu == B) do other code

They invited the multiple combination vampire into their house. They know what devices are being used. If you don't want a dedicated update per piece of equipment, it'll be a large binary with lots of branching. Saying they don't know what device is where is just lazy. Ask the device what it is, and have a branch for it. If the device IDs itself as something unknown, don't do anything.
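
The branching described above could be sketched as a lookup table instead of an if/else chain. The device IDs and handler functions are hypothetical placeholders, and the only behaviour being illustrated is "ask the device what it is, and do nothing if it is unknown".

```python
def update_cpu_a(image: bytes) -> str:
    return "flashed A"   # placeholder for the real flashing routine


def update_cpu_b(image: bytes) -> str:
    return "flashed B"   # placeholder for the real flashing routine


# Hypothetical mapping from self-reported device ID to its update routine.
HANDLERS = {"cpu-a": update_cpu_a, "cpu-b": update_cpu_b}


def apply_update(device_id: str, image: bytes) -> str:
    handler = HANDLERS.get(device_id)
    if handler is None:
        return "skipped"   # unknown hardware: don't touch it
    return handler(image)


print(apply_update("cpu-a", b"..."))     # flashed A
print(apply_update("cpu-zzz", b"..."))   # skipped
```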

xgbi2y ago

From what I read somewhere, Tesla was able to do that because they have remote ssh capability.

In at least one instance, they fixed the cars manually by running a massive remote command on all cars after a messed up update: https://lobste.rs/s/v42zil/former_tesla_employee_ssh_d_as_ma...

I wouldn’t call that very reliable , but they indeed do it regularly

FireBeyond2y ago

And it's not like they'd ever abuse that ability, like when someone pokes around in their car and discovers references to a new unannounced model, and then Tesla reaches in, force downgrades the vehicle to older software with no references, and then disables the ethernet port on the vehicle, and for a final fuck you disables its ability to ever get another update.

They'd never do that, except when they did do that.

kccqzy2y ago

The Twitter thread linked from the link the GP posted actually contained a reasonable explanation of why that happened. https://x.com/atomicthumbs/status/1032939644621545473

SoftTalker2y ago

It sounds like, in this case, the updates clobbered the ssh authorized keys (or equivalent in their system) and so now they cannot access the cars remotely. So they are going to have to go into the shop and have the authorized keys restored.

scardycat2y ago

Bringing CI/CD mindset to cars is probably not a great idea. Software updates to commuter vehicles should have a high bar for operational standards, and a simple thing such as an expired certificate should have never been deployed. Having isolated networks in vehicles helps but doesn't prevent broken updates from, eventually, bricking the cars.

nomel2y ago

I think this shows more of a fundamental flaw in their update mechanism, than anything.

I don't think a botched update is a big deal. It happens, and should be expected, in a sane design. The fact that the customer noticed is a big deal.

There are many implementations that could be used for an "auto rollback" feature. They either failed to implement that in a sane way, or they were goobers, and assumed things would always be rosy.

babypuncher2y ago

I would be pretty pissed if I went out to my garage to head to work one morning and found that a damn software update bricked my car overnight. This shouldn't even be a thing, why does a car need regular software updates to keep functioning?

bink2y ago

It doesn't need regular updates to keep functioning. It offers regular updates as they add new features. For instance, in this update a new feature was added to allow for proximity locking at home but disable proximity unlocking. That would lessen the number of times the car would lock and unlock accidentally as you walk in and out of the garage. No one was forced to install the update.

theandrewbailey2y ago

Cars 20 years ago, even most of them 10 years ago, never got any updates unless they got recalled. Nothing broke, nothing got hacked, and most are probably still working fine.

What happened to cars today? I refuse to believe that it's solely because these are electric cars, as if the way the car stores and uses energy dictates that it must be part of the internet of things.

Edit: there were electric cars over 100 years ago. I bet they never got software updates.

jdminhbg2y ago

There's never been any car that 100% will work in the morning when you go to the garage. It's all tradeoffs.

BuckRogers2y ago

It doesn’t. People and these tech companies are tools. And do it largely in search of ways to take more of your money. It’s not a favor.

gitfan862y ago

The Tesla update is slow probably for this reason. It is probably verifying that it can rollback at any point of failure.

liminalsunset2y ago

I believe one of the reasons it is slow is that it is also updating the firmware on any number of connected ECUs over the CAN bus. This typically means the image has to be sent over a 500 kbit/s bus, so there is a lower bound on how long it takes.
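
A back-of-the-envelope version of that bound, assuming a 4 MiB ECU image (a made-up size) and treating framing overhead as a single `payload_efficiency` factor. The 0.5 figure is an illustrative guess, not a measured value for any particular bus.

```python
def min_transfer_seconds(image_bytes: int, bus_bits_per_s: int = 500_000,
                         payload_efficiency: float = 1.0) -> float:
    """Lower bound on transfer time; efficiency < 1 models framing overhead."""
    return image_bytes * 8 / (bus_bits_per_s * payload_efficiency)


image = 4 * 1024 * 1024   # hypothetical 4 MiB ECU image

print(f"{min_transfer_seconds(image):.1f} s")                           # 67.1 s
print(f"{min_transfer_seconds(image, payload_efficiency=0.5):.1f} s")   # 134.2 s
```

That is per ECU, before any erase, write or verify time on the receiving side.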

yjftsjthsd-h2y ago

I would naively expect it to just do A/B updates, which unless I'm forgetting something shouldn't incur a speed penalty? (Other than that the update doesn't get applied until restart)

1234letshaveatw2y ago

From a few days back- Its software has been a “key differentiator” https://electrek.co/2023/11/10/rivian-using-software-to-scal... kind of humorous in hindsight

wannacboatmovie2y ago

Interesting to note that Ford's approach of updating software is far more conservative and car-like. It can be done fully offline via USB, but requests that you kindly upload the log files written to the memory stick back to them when complete, in the instructions as a necessary step. Presumably so they can track and stop incidents like this before they happen fleet-wide.

Rivian seems more like a "ship it and we'll fix it in the next sprint!" company.

How do other manufacturers handle updates?

post_break2y ago

Ford's approach is flawed, however. You can still update Sync with a bad update and bork it over USB. Ask me how I know.

r00fus2y ago

Pray tell, how painful was your discovery?

post_break2y ago

Thankfully I was able to repackage it again and flash an upgrade and it unborked itself.

sturza2y ago

A/B partitions

barryrandall2y ago

The last time I built something like that, it used partition 1 for the current version, 1 for the last version, 1 with the as-shipped version, and 1 that could restore A or B from the internet or USB.

reneberlin2y ago

When will humans be crazy enough to update the firmware of artificial hearts OTA?

Updating cars with new features OTA, even "just" an Infotainment can possibly cost lives, because the driver might get confused and isn't putting eyes on the streets.

It should be forbidden and every change should be made clear to the driver, shown in detail, and should need verification twice before being accepted. There must not be any kind of surprise in a car for the driver.

It should even be possible to skip an update or stop updating at all.

rekoil2y ago

Not updating cars OTA (yes, even "just" the infotainment) can potentially cost lives as well, as security holes would not get patched until the next service appointment.

qudat2y ago

What a nightmare. This is where software engineering meets "real" engineering, where a "bug" has potentially life threatening consequences.

nomel2y ago

> where a "bug" has potentially life threatening consequences.

What are you referring to? That is not relevant to this story, and would require a deep understanding of the system to make such a claim of negligence.

“The issue impacts the infotainment system. In most cases, the rest of the vehicle systems are still operational ...”

Also, you can't do an update while driving.

jawns2y ago

Based on the photo included in the article, what they're calling an infotainment system is actually two separate components, one of which appears to be taking the place of a traditional dashboard. If that's the case and there's no other way to monitor speed, fuel levels, engine temperature, warning lights, etc., I'd say that's quite a bit more worrisome than just not being able to play your favorite music while driving.

nomel2y ago

I understand, but the risk of life wouldn't be from the bug, it would be from conscious choice of driving without a speedo. There's a critical distinction there.

In this case, mileage/battery are still present, and I would assume safety critical warnings would still be displayed.

ct02y ago

You've never been to Death Valley without air conditioning, or Russia without heat. I think the infotainment system in this case has a broken climate control function. There are workarounds, but what if you don't have your phone?

nomel2y ago

This is a great point. I would claim that it would be a bad choice to initiate the update (it's a manual process, requiring intent) when you're in these conditions. But, a less tech savvy person may not understand that updates can be risky, and give it a shot at a remote charging station.

BlueTemplar2y ago

I have been talking recently to someone whose job involves sometimes driving to other continents, and they mentioned that cars more recent than ~2003 were out of the question because outside of the EU you cannot expect random mechanics to have the computers required to interface with the car's computers - required for repairs.

qudat2y ago

> What are you referring to?

Not the specifics of this article, but more generally the gravity of the situation car makers (and their software engineers) operate under. The very idea is that an OTA software update that causes a bug within more critical features of a car could be life threatening. So my point isn't about the specifics of this particular bug, rather the capacity for a bug that could kill.

nunez2y ago

critical safety systems/functions appear to be unaffected by this outage.

nicholasjarnold2y ago

Is it possible, as a licensee of the Rivian vehicle system, to disable the automatic OTA updates without having expert-level knowledge or tooling?

Also, yes, I'm specifically avoiding using the word "owner" above for obvious reasons.

558734452161112y ago

Rivian "licensee" here. So far all updates have required you to press a button (in the car or on the app) to launch the update installer. Not sure how many weeks you can ignore it for as I never tried.

bo10242y ago

Confirming that updates are not automatic, and can be ignored indefinitely. For now.

martin84122y ago

Stuff like this is why I don't want OTA updates in my cars. Let the car dealership deal with it during regular maintenance. They'll be on the hook for fixing it before handing the car back to me.

galangalalgol2y ago

Don't even need OTA. A Seattle radio station bricked a bunch of Mazdas.

https://www.autoblog.com/2022/02/09/seattle-radio-station-br...

cozzyd2y ago

Amazing. Can't wait for some car software stack to be so poorly designed that an FM transmitter can remote takeover.

galangalalgol2y ago

It quite possibly could have with a well formed digital fm payload.

evanelias2y ago

Regardless of OTA vs dealership-only updates, software bugs can have problematic effects long after the update occurred.

So far I've had to take my Chevy Bolt to the dealership twice due to major software problems causing the "service needed" indicator to be lit (equivalent to "check engine"), and I've owned it for barely over a year.

The first time, some random bug made the car think there was something wrong with the transmission under some extremely specific set of circumstances, and as a safety precaution it would refuse to shift into drive if not serviced within 100 key cycles.

The second time, it was a bug with the software that manages battery health making the car think the battery had a severe problem. In that situation, as a safety precaution, the car refuses to charge above 40%, disables regenerative braking, limits the HVAC usage, and slightly limits max acceleration.

This is getting very irritating. I bought an EV because I thought it would require fewer maintenance visits to the dealer!

SoftTalker2y ago

Also worth considering that a manufacturer like Rivian is pretty small. Every town has a Ford dealer. There are many states, however, that don't even have a single Rivian service center.

0xffff22y ago

Coverage is even thinner than I would have guessed. California has 6 Rivian service centers, but they're strongly clustered in the Bay area and LA/OC/SAN. Even in California if you live in Fresno/Bakersfield/Santa Barbara etc you're looking at several hours round trip to visit an official service center.

ben_jones2y ago

But what if your car doesn’t have the latest emojiset or social sharing functionality within the notes app?

dilyevsky2y ago

There is nearly zero regular maintenance to be done on EVs though. No oil, no belts, no fuel filter, spark plugs etc. Even the brakes will likely last entire lifetime of the car

martin84122y ago

Bullshit. BEVs eat through tires because they're heavy. The air cabin filter and pollen filter need frequent replacement. In the case of Tesla, you better check the undercarriage so you can potentially spot control arms soon to fail. The shield for the battery should be inspected.

dilyevsky2y ago

You can’t rotate/change tires at local tire shop? It’s not under warranty anyway.

> The air cabin filter and pollen filter need frequent replacement

Yeah “regularly” like every two years lol, you gon wait for that long to update your software and pay $500 to a dealer to do it?

vmladenov2y ago

BEVs don't eat through tires because they're heavy. They eat through tires because they have fast acceleration and their drivers make use of said acceleration. A Tesla Model 3 weighs around what a BMW M4 weighs and less than all trims of an M5.

wannacboatmovie2y ago

EVs should be subject to mandatory German-style inspection by law to counteract this delusion.

dilyevsky2y ago

Hey if you want to pay a dealer 300-500 euro to inspect your tires and swap air filter I’m not against that - you do you. Also if you buy a german car (ev or not) then yes a mandatory inspection is warranted

rconti2y ago

Just EVs?

nkingsy2y ago

Hertz seemed to find Teslas cost double their ICE counterparts to maintain.

Maybe it’s auto company smoke but source: https://fortune.com/2023/10/27/tesla-elon-musk-hertz-evs-ren...

aftbit2y ago

I'm not sure maintenance costs are really the relevant part. It seems like the problem is that Teslas are cheaper now and thus Hertz's fleet is worth less than it was before. Additionally, they find that Teslas suffer more damage, likely from collisions or similar. Routine maintenance costs are not mentioned in that article at all.

>In short, the declining value of the Tesla cars in Hertz’s fleet—a decline directly caused by Musk’s price cuts—has hit Hertz squarely in its profits.

>Without explaining precisely why, Scherr said Hertz is suffering a higher incidence of damage specifically with its EV fleet, where the repair costs are roughly double that of a comparable gas-fueled car.

>“Studies of current EV ownership evidence lower incidence of damage and collision than for ICE vehicles, not higher as we are experiencing,” he revealed. Musk’s price cuts then become an acute problem when one of the Hertz EVs sustains so much damage that the cost of repair is more than the asset itself.

>“Where a car is salvaged, we must crystallize at once any difference between our carrying value and the market value of that car,” Scherr explained. “The [price] declines in EVs over the course of 2023, driven primarily by Tesla, have driven the fair market value of our EVs lower as compared to last year, such that a salvage creates a larger loss and, therefore, greater burden.”

>In short, Hertz then needs to book a noncash accounting charge. Together with the higher repair costs this led to significant profit margin headwinds.

https://archive.ph/leFdf

dilyevsky2y ago

That’s because it is run by idiots who ran it into bankruptcy

gunapologist992y ago

This is why I don't really want my car to have any antenna (that receives/interprets code) or receive OTA updates, ever.

I'd like to please force any attackers to at least be within 50 feet of my TPMS, instead of being literally anywhere on the planet.

A car doesn't need data updates, and definitely not code updates[1]

1. source: every car built in previous century.

sbehere2y ago

> A car doesn't need data updates, and definitely not code updates

I don't think this is accurate. Many advanced driving assistance capabilities need access to updated map tiles, which is a data update. They may need code updates to fix errors or shortcomings that can be detected only after deployment on extensive fleets or in response to changes to the environment/infrastructure. This is just one example for why data and code updates are needed.

I think it is more accurate to say that a "dumb" car with mostly electro-mechanical systems doesn't need data updates and definitely not code updates. But that isn't true for vehicles built within the last few years and definitely untrue for vehicles that will be built in the coming years.

gunapologist992y ago

> Many advanced driving assistance capabilities need access to updated map tiles

Your phone (or GPS or even a paper map) can guide you; none of the following need access to map tiles:

* forward collision warning

* automatic emergency braking

* lane departure warning

* adaptive cruise control

* blind spot detection

* stability control

> code updates to fix errors or shortcomings

That's what recalls and TSBs have traditionally been for, and the driver can refuse them if desired. I mean, actual lives are at stake here. Would we (or should we) allow 737s to get OTA updates? Of course not. The target is too valuable and the surface area too vast to adequately protect it.

pard682y ago

My insistence on only driving cars made prior to 2005 keeps making more and more sense.

(2005 is just an arbitrary date I settled on, nothing significant about it)

eschneider2y ago

This is a bit of a nightmare scenario and why when remote updating, you always test update to your own fleet first. Always.

toddmorey2y ago

It sounds like it was tested on their own fleet but they accidentally pushed the wrong bits when deploying the update more widely out to customers.

eschneider2y ago

The usual "best practice" thing for IoT deploys is to deploy to "your" devices, wait for everyone to go green, then allow that build to deploy more widely. In a well-functioning system, it shouldn't be possible to swap bits between those stages.

But who knows what these guys were doing. :/
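
For what "shouldn't be possible to swap bits" might mean in practice, here is a minimal sketch that pins a release to the content digest of the build that was first tested. The `Release` class and the stage names are hypothetical, not any real deployment tool.

```python
import hashlib


class Release:
    """Tracks one build by content digest as it moves through rollout stages."""

    STAGES = ["internal", "employees", "customers"]   # hypothetical stage names

    def __init__(self, artifact: bytes):
        self.digest = hashlib.sha256(artifact).hexdigest()
        self.stage_index = 0

    def promote(self, artifact: bytes, all_green: bool) -> str:
        """Move to the next stage only for the exact bytes that were tested."""
        if hashlib.sha256(artifact).hexdigest() != self.digest:
            raise ValueError("artifact differs from the build that was tested")
        if not all_green:
            raise RuntimeError("current stage is not fully green")
        if self.stage_index < len(self.STAGES) - 1:
            self.stage_index += 1
        return self.STAGES[self.stage_index]


release = Release(b"build-123")
print(release.promote(b"build-123", all_green=True))   # employees
try:
    release.promote(b"build-124", all_green=True)      # a different build
except ValueError as err:
    print("blocked:", err)
```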

bink2y ago

Reading between the lines of their public comments it sounds like they did run a full test through their test fleet, their employees, and then were rolling out to customers when the promotion process was "fat fingered". Maybe someone accidentally promoted the wrong release version.

carlivarOP2y ago

Maybe they should have an additional phase between test deploy and customers such as "employee personal vehicles".

kevin_nisbet2y ago

Yes. And also things like rolling out the update in batches, and then also things like golden images, where if there are two crashes or failures in the first 24 hours of the update, change to the last known good software version.
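
The "two failures in the first 24 hours" rule can be sketched as below. The thresholds come from the comment above; the `Installed` record, the function name and the version strings are made up for illustration.

```python
from dataclasses import dataclass, field

PROBATION_S = 24 * 3600   # first 24 hours after the update
MAX_FAILURES = 2          # crashes or failures tolerated before reverting


@dataclass
class Installed:
    version: str
    last_known_good: str
    installed_at: float
    failures: list = field(default_factory=list)


def record_failure(state: Installed, now: float) -> str:
    """Record a crash; return the version that should run afterwards."""
    if now - state.installed_at <= PROBATION_S:
        state.failures.append(now)
    if len(state.failures) >= MAX_FAILURES:
        state.version = state.last_known_good
    return state.version


state = Installed(version="new", last_known_good="old", installed_at=0.0)
print(record_failure(state, now=3600.0))   # new  (first crash, still on probation)
print(record_failure(state, now=7200.0))   # old  (second crash, revert)
```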

ralmidani2y ago

Move fast and break things that move fast…

I don’t really like or trust most (if not all) of the established automakers, but there is something to be said for having several decades (over a century in some cases) of experience building potential killing machines vs. a company that’s not even 15 years old. The established players have put out cars which suffered freak malfunctions, but Rivian (and Tesla) seem to be struggling more with QA.

Non-rhetorical question: do companies have safeguards for critical components like braking systems, or are they also prone to catastrophic failure if a software engineer pushes a bad commit?

ezfe2y ago

The moving fast components were unaffected by this issue…

ralmidani2y ago

I know, I just thought it was a decent pun.

baz002y ago

This is why I have a Dumbcar connected to a Smartphone via bluetooth.

toddmorey2y ago

Just a counterpoint: my dumb car has been undrivable way more often than my electric car.

They never deployed bad software updates but they sure have designed & deployed bad fuel pumps.

In some ways it’s all engineering and quality control.

akira25012y ago

Which is also why there is a huge non-OEM market for those types of parts. Can you even replace the rivian "infotainment" system?

baz002y ago

I can go nearly anywhere and replace the fuel pump though.

barbazoo2y ago

That's why _I_ have a Dumbcar connected to a smartphone via FM for audio only :)

thatguy09002y ago

I've tried that before, but it sounded terrible. What dongle do you use?

bonton892y ago

These all sounded like crap to me as well. I've heard earlier models had much higher radio power (that was possibly illegal) and worked better. They were an archaic tech by the time I was buying the adapters off amazon.

barbazoo2y ago

I mean, totally, it doesn't sound great but that's all I can do, my car doesn't have bluetooth :)

Lightbody2y ago

I have preorders in for the R1S, the Volvo EX90, and the Kia EV9. I passed once already on buying the R1S when they had one in town available for immediate purchase, simply because they refuse to adopt CarPlay.

This incident does NOT give me confidence that Rivian is likely to offer a better alternative to CarPlay, despite their statements otherwise.

I suspect the EX90 will be what I land on eventually.

p_j_w2y ago

>This incident does NOT give me confidence that Rivian is likely to offer a better alternative to CarPlay,

I have complete faith that, 5 and maybe even 10 years from now, no auto maker will have delivered anything that can compete with either CarPlay or Android Auto. The fact that an auto maker thinks they can do better is a sign of a really high level of either arrogance or outright greed. Complete deal breaker.

AndrewKemendo2y ago

Whoever makes the first affordable, tight-tolerance electric car that doesn’t spy on you and doesn’t need special care will win the market.

samsquire2y ago

This is actually a topic that I think about from time to time: how to do aggressive changes to software while they are running. In Ruby world you have monkeypatching. And Linux kernel has livepatching.

For example, if you have a distributed system and you want to upgrade a component that every caller uses: you have a large exercise on your hands where you might have to roll out a change over time and then clean up your incremental branches where you have to handle two control flow paths through the code. It reminds me of Google's protobuf required field discussions.

It reminds me of repository-per-microservice and a Java library that other microservices use and updating a dependency and having to deploy the change to every service.

It's like trying to change wheels on a car while the car is moving or refueling a jet in flight.

Unison lang is trying to solve this problem I think, by allowing multiple versions of a function to be available.

https://www.unison-lang.org/

Migrations in databases are painful too.

One solution I've thought of, which is probably overengineered, is that API call sites are abstract objects whose schema and arguments are centrally deployed. I called this a "protocol manager".

The idea is you write all your code to use a "span" and have contextual data in a span, and you can include or exclude data in a span with a non-software rollout. Your communication schema of RPC and API calls is a runtime decided thing, not hardcoded.

If you have N deployed versions of code and you want to upgrade to X, you have to test 1..N to X versions. So nobody does that.

fabianlindfors2y ago

The database aspect of this problem is particularly interesting to me. I’ve previously built Reshape [0], a zero-downtime migration tool for Postgres, and am now working on ReshapeDB [1], a full database designed from the ground up to tackle this problem.

[0] https://github.com/fabianlindfors/reshape [1] https://reshapedb.com

jbott2y ago

You might be interested in learning about Erlang – it supports hot code reloads natively: https://oozou.com/blog/understanding-elixir-otp-applications...

Someone12342y ago

I wonder if the way Microsoft's XBox is designed may be something to look towards in terms of hardware reliability/fallback. Specifically they utilize a Hypervisor which rarely needs updates, running different operating environments which need frequent updates.

- Better isolation of different parts of the system (e.g. infotainment unit, instrument cluster, et al).

- Better isolation for updates (e.g. run a "beta" update, and a "stable" update side-by-side).

- Automatic error detection and rollback (e.g. if a VM keeps restarting after an update).

- Ease of offering features like rollbacks to end-users.

- Rare hypervisor updates can be held to a much higher standard relative to other VM updates.

The only downside of hypervisor-based systems is slightly higher hardware costs. But even that is largely mitigated by modern architectures that natively support virtualization.

PS - You can also look to any containerization. I specifically brought up the XBox because it is a hardware product, just like a vehicle.

kevinventullo2y ago

My 2019 car is not connected to the internet. Instead, I use Apple CarPlay for everything.

Is there any reason not to do it this way?

aftbit2y ago

My cars are from 2019 and 2001. I don't use CarPlay or any internet features in any of them. Instead, I just use my Android's screen itself for navigation and bluetooth for phone calls and music.

Perhaps there are advantages to tighter integration with my car (at least the newer one) but IMO they are outweighed by the risks of things like this, or even just getting a software update that borks a small feature that I like.

lotsofpulp2y ago

CarPlay and Android Auto have no risk like this. It is great because it is just a protocol for your phone to be able to use the screen in the car. Your data stays on your phone, and there is no risk of intrusion.

kevinventullo2y ago

CarPlay is not an internet feature. It’s basically a screen mirror, and I plug my phone in via USB.

ChadMoran2y ago

Quite a few reasons but primarily CarPlay doesn't know about your vehicle's charge and road trip navigation requires external inputs to determine your charge route.

Apple could potentially offer an API to have "reverse" CarPlay where the car's app can feed information into iOS. I recently rented a Mercedes EV which had Apple CarPlay and it was a weird experience having to manage two sets of experiences.

callalex2y ago

The exact API you describe already exists in CarPlay.

ChadMoran2y ago

Today I Learned... that's great to hear!

kyleee2y ago

Manufacturers should be required to expose an agreed upon spec API that provides range estimate, battery state, etc. So CarPlay and other apps can access the info in a standardized way

callalex2y ago

Apple holds up their end of the deal by supporting the EV Routing feature in maps. It’s entirely up to manufacturers to expose the information at this point.

ChadMoran2y ago

100%. I've owned Teslas since 2017 and always found the infotainment to be very good. Though I've really preferred Apple Maps routing recently. The descriptions of when to turn are much more human.

darknavi2y ago

Well for one Rivian (and Tesla) don't support CarPlay or Android Auto.

kevinventullo2y ago

Sounds more like a reason not to buy Rivian and Tesla to me

callalex2y ago

More potential for anticompetitive vendor lock-in (sugar-coated as differentiation), and more opportunities to profiteer from stalking behavior (sugar-coated as telemetry to improve user experience).

gmane2y ago

Then automakers can’t sell your information for fat stacks of cash.

pookha2y ago

..And authoritarian governments can't take complete custody of your movements. Can't shut down your car either. Seems to be a win-win for both sides (private and public industry).

https://qz.com/1522309/how-chinas-electric-car-surveillance-...

https://apnews.com/article/4a749a4211904784826b45e812cff4ca

lotsofpulp2y ago

This, and automakers do not want to become dumb appliances (even though they are for 90% of people).

antoniuschan992y ago

Wondering why there isn’t an option for a factory reset (eg. press and hold with a paperclip for 10 seconds)

1970-01-012y ago

Lexus did the very same thing about 8 years ago:

https://www.consumerreports.org/lexus/what-to-do-if-your-lex...

sarchertech2y ago

Miku baby monitors deployed an automatic firmware update that bricked nearly every monitor in use, but not for nearly a month after the update.

It forced the company into bankruptcy because they had to replace all of them.

fsckboy2y ago

I wish the economics of mass production didn't turn pennies into millions that need to be eliminated, because I've always thought the "don't disconnect from power" and "update bricks it" type problems could be solved by having extra EPROM to download into, the way linux keeps the previous kernel around after an update.

Or at least the ability to re-init/download from scratch, like a borked macbook disk. And hey, not the extra ability to do that, make it "the way it works" so you're always testing it.

wnevets2y ago

This may be crazy, but if you're writing software for hardware that costs tens of thousands of dollars, it should be impossible to brick it with an update, especially if that update is OTA.

The future is going great.

thumbsup-_-2y ago

This is the new world we will be living in, where you enter your car only to find that something is broken because of an OTA update. Updates causing some bugs are OK on my phone, but I don't want any bugs in my car. What happens if an update messes with safety systems? Or what happens if an OTA update breaks my car after it is out of warranty? Maybe I'm the only one who misses stable software in cars that, once vetted, just keeps running as-is if nothing around it ever changes (the ideal scenario for an offline car).

teeray2y ago

An interesting thought experiment: what happens when these vehicles are out of warranty, and automakers accidentally send a vehicle-bricking OTA update? Isn’t that property damage?

jacquesm2y ago

This has happened to some Apple hardware, they fixed it for free in some cases but stiffed others:

https://discussions.apple.com/thread/253315438

With the mandatory mobile phone updates for a few years you're definitely going to see a lot more cases like that.

A thread about Tesla directly related to your question:

https://teslamotorsclub.com/tmc/threads/wholl-be-responsible...

karaterobot2y ago

What kinds of changes are generally included in these over the air updates? I have this sudden urge to shake my fist at a cloud and tell the gods that cars shouldn't need updates in the first place, if the car was ever deemed ready for production and then sold to customers for money. But, maybe I'm wrong, and it makes perfect sense. All I can think of would be something like a periodic update to navigation data, is that it?

ezfe2y ago

It’s possible to deem software ready to sell but find improvements later.

Simple example: my Subaru was sold to me with an interesting design decision that caused the radio to come on whenever the car was started. This was not a bug. Every Subaru worked this way for years. A year into ownership I received an OTA update that added a “not playing” state on startup.

This was never a safety issue and was likely not a defect. It was, however, stupid and needed to be changed.

karaterobot2y ago

I wish my Mazda had this option! But I would still say that I'd expect them to have included this option before selling the car, especially since radios and user preferences around radio UI are pretty well established.

ezfe2y ago

Sure, but they didn't - and it's not because the software is complicated. The Subaru headunits are very basic (and more-so a few years ago).

When they make an improvement, I like getting the trivial things on my older vehicle. It's better for everyone involved, so long as they do it responsibly.

bfrog2y ago

It’s funny, I was just talking to someone the other day about A/B image slots and boots, and how they had written this test suite because there were so many potential places where partial updates could be interrupted.

Thousands of test points having to be verified was my understanding. That’s before even getting to the confirmed boot/watchdog aspect.

What a hassle, hope they like spending money on labor because it sounds like they are going to need to.

adolph2y ago

The vehicles are drivable but software and displays go black. It appears that the 2023.42 software update hangs at 90% on the vehicle screen or 50% on the app screen and then the vehicle screens black out. All systems appear to still work except for the displays.

This is what I do with my Prius to get a comfortably distraction-free driving environment. Sounds like a feature not a bug.

altairprime2y ago

Technically, NHTSA could order an immediate recall for all Rivian vehicles due to this issue, as the disabled defroster controls are a critical safety issue in cold and/or humid environments. Tesla was forced to issue a recall notice over the controls being buried in a menu; Rivian’s “defroster unavailable during driving due to manufacturer error” is far worse, especially given the mass and torque of their vehicles relative to unarmored road users.

sturza2y ago

Instrument cluster display going black is a functional safety/QM issue. No blinker, transmission direction, speed etc confirmations.

bri3d2y ago

It looks like they correctly isolated the safety critical components on the instrument cluster and they are still functional without infotainment: https://twitter.com/RivianSoftware/status/172443804967573962...

eigenvalue2y ago

Can’t imagine how much it would suck to be the engineer who fat fingered it and caused a huge crisis for the company, inconveniencing tons of customers and costing millions. Even if there should be processes in place to prevent it in the first place, you’d still know you were the “but for cause” of the problem.

nicolaslem2y ago

This is the kind of thing that keeps me awake at night.

Does anyone here have some practical tips to turn an embedded Linux machine into an appliance? The kind of system that a botched update cannot brick but only momentarily disable until a non-technical user presses a factory reset button of some sort.

elitepleb2y ago

A/B updates as implemented in android, https://bootlin.com/pub/conferences/2022/elce/opdenacker-imp...

hospitalJail2y ago

>Does anyone here have some practical tips to turn an embedded Linux machine into an appliance?

Lol

I suppose this is the negative about having sensors that make sure water gets hot enough to be sanitizing, but not so hot that it wastes energy. And I'm sure you can imagine 100 other uses of having a microcontroller/CPU process data and do feedback. (I'm sure there are EE only ways of doing it, but theoretically possible and useful are two different things)

nunez2y ago

/r/Rivian is a class act. I expected a wall of screaming, but instead entered a relatively calm room. People are upset, but there's no seething or flamewars, which is kind-of surprising given the cost of these trucks ($80k+, Range Rover territory).

M3L0NM4N2y ago

I think the reason is because they're $80k trucks, not $400/month Tesla leases. Also, they're first generation and I think most of the buyers understand that.

Havoc2y ago

> the vehicle is not bricked

What a time to be alive. Software updates (almost) turning cars into paper weights lol

ct02y ago

Will insurance carriers cover damages due to botched updates? Imagine 10 years from now the power/control that electric delivery companies would have over retailers like amazon. One botched update away from a complete backup for delivery vans.

cryptoegorophy2y ago

Tesla updates are sent in batches and you can opt in for advanced updates I guess to be earlier. Normally when I see that there is an update on Reddit then it takes 1-2 weeks at least to get to my car with the “advanced” updates on.
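
For anyone wondering how batching like this is commonly done, a typical approach is to hash each device into a stable bucket and raise a percentage over time. The sketch below is generic and is not a description of Tesla's system; the function names and the `early_access` flag are assumptions.

```python
import hashlib


def bucket(device_id: str, release: str) -> int:
    """Map a device to a stable bucket in [0, 100) for this release."""
    h = hashlib.sha256(f"{release}:{device_id}".encode()).digest()
    return int.from_bytes(h[:4], "big") % 100


def eligible(device_id: str, release: str, percent: int, early_access: bool = False) -> bool:
    """True if this device may install the release at the current rollout percentage."""
    if percent <= 0:
        return False          # rollout not started, or halted
    if early_access:
        return True           # opted-in devices go in the first batch
    return bucket(device_id, release) < percent
```

Setting the percentage back to 0 halts the rollout for everyone who has not installed yet, which is the property you want when a bad build slips out.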

glonq2y ago

As somebody who has spent many years doing embedded+iot related to remote fleet firmware updates, this is the kind of thing that lurks in my nightmares.

I'd love to be a fly on the wall at Rivian engineering/operations this week!

easylion2y ago

Need an easy way to restore the previous version offline. Take 100 bucks extra if required for a backup SSD. I don’t want to be camping and then realize I’m stuck because some junior dev wasn’t competent enough.

seattle_spring2y ago

Why would you intentionally upgrade your vehicle software while camping? It’s not like this stuff installs automatically, you have to explicitly accept the installation. Waiting a few days or even a few weeks before hitting “install” is completely normal.

avereveard2y ago

> In most cases, the rest of the vehicle systems are still operational

Like, what do you mean "in most cases"? I can understand a broken infotainment system needing a reset, but imagine if you had to tow your truck. I'd be furious.

MisterTea2y ago

Can I please just buy a car with a motor and battery? Why does every god damn vehicle have to come littered with screens and chips all together like some tentacle monster?

All I need is a gauge cluster screen that can display the normal info like speed and heading while also letting me configure the car's performance and safety features. Then let me mount a double DIN radio that isn't dog shit. I've not seen a single new car with these dumb screens with a sound system that's not tinny muddy garbage with zero adjustment save for "bass" and "treble" settings. I mean all that technology and you can't be assed to put an eq in there. HVAC never needed more than two or three knobs anyway.

fhub2y ago

I'm going to have a chuckle next time I pass the Databricks billboard on 101 in San Francisco "Rivian powered by Databricks" or something to that tune.

WirelessGigabit2y ago

What's the impact on your insurance should you get into an accident?

The speedometer screen is gone, so does that not imply the vehicle is inherently unsafe to drive?

Am4TIfIsER0ppos2y ago

Look at all these commenters saying "code signing was done wrong" when the wrong part is code signing at all.

j452y ago

As long as they are good for fixing it, this might be what being a Pioneer or Early Adopter is about.

emmelaich2y ago

Poor title; physical repair is not required. Physical presence is required.

Someone12342y ago

The article doesn't really state what is required to repair the vehicle. I'd assume if it was as simple as loading a flash drive and plugging it in, then Rivian would have provided a way for customers to self-fix. The second a single body panel is removed to gain access to the headunit, it is a physical repair.

So without more info we cannot know if it is accurate or not.

emmelaich2y ago

I don't think many people would consider removing a body panel to be a physical repair. I think the term is 'back to base' or similar.

Physical repair suggests e.g. a burnt out capacitor

immy2y ago

That’s funny, I just saw a job posting for Rivian Infotainment team

b200002y ago

“we use leetcode to filter out hires because it works for us”

whoopsie2y ago

Ah this is why CarPlay isn’t worth adding, right?

FireBeyond2y ago

As annoying as this is, I find this laughable, too. Rivian updated users on the situation. Then, whines Electrek:

> That’s the last update we had over 10 hours after Rivian customer vehicles were fed the bad software update.

"Over 10 hours"!

I suppose it isn't Tesla, who yeets updates over the fence, that break new things, yeets another update that fixes that problem but introduces another one, then reverts back to two versions prior, before the issue. The Tesla that gets firmware fixes from vendors that have a test harness that should take 36+ hours to run, but says YOLO and flashes it onto a random car they have lying around and emails the vendor back 3 hours later saying "LGTM, WFM, thanks!"

shoelessone2y ago

Honestly this makes me feel good, just because it always worries me that I don't see this type of issue being resolved more often. Having to physically bring in a car seems like a near worst-case situation, but it's good to keep this in our minds as a possibility.

sitzkrieg2y ago

i cant believe this sort of stuff is acceptable. what a clown industry

thrill2y ago

Inexcusable, really.

collsni2y ago

OTA on a car. What could go wrong?

janitor612y ago

This is tangentially Rivian related, but does anyone else see the inherent danger of stylized tail lights that are just a single red bar across the back of the car? Travelling on the freeway at night I can't really gauge the distance to the car in front of me if it's far ahead and if there's no discernable left and right brake lights. I'd believe Rivians and other cars like that are more at risk of high speed rear-end collisions.

rurp2y ago

This reminds me of the terrible turn signals Mini used, which look like flashing arrows pointing in the opposite direction of the turn[0].

Getting cute with basic stuff like tail lights is forgettable or annoying at best, and absolutely can be dangerous.

[0]https://jalopnik.com/congratulations-mini-you-made-the-stupi...

xyst2y ago

Looks like this car brand is circling the drain. Glad I never bought into the hype.

seattle_spring2y ago

It’s circling the drain because of one bad software update?

Sounds more like you’ve just bought into the doom and gloom that a few specific news outlets have been pushing.

xyst2y ago

Tesla. Rivian. All cut from the same cloth. A car should be simple. Yet we are stuffing all of this tech junk into it and trying to repackage it as something else to pump the numbers.

Car companies suck at tech. Let’s be realistic. They should stay in their lane and focus on improving the car and physical aspects (safety, reducing carbon output, longevity, ease of repairability, reducing supply chain issues).

bhauer2y ago

> Tesla. Rivian. All cut from the same cloth.

I'm not aware of any Tesla OTA updates bricking the infotainment system. At least since I've been paying attention. I don't see them quite as similar as you suggest.

margalabargala2y ago

There have been plenty.

https://www.reddit.com/r/TeslaLounge/comments/112oqln/new_te...

https://teslamotorsclub.com/tmc/threads/failed-software-upda...

Two examples of many.

I'm not aware of any fleet-wide issues that accidentally bricked Teslas, but as one-offs they do happen; and unlike this Rivian update, a botched Tesla OTA generally leaves the car undriveable and needing to be towed. These Rivians will at least still drive, as long as you don't need fancy extraneous luxury features like a...speedometer.

latchkey2y ago

I built a whole remote software update mechanism for a control binary that ran on 25k+ servers across multiple data centers.

While I fully understand that this is hard to get right 100% of the time, a mess up of this level by a car manufacturer is pretty amazing to me.

foobiekr2y ago

Rivian is an embedded use case, though, which is not at all like a fleet of servers.

dcow2y ago

ikiris2y ago

That sounds out of scope for the MVP. We can worry about redundancies later after we ship.

roland352y ago

Hey now, preventing SEVs doesn't lead to impact. If we all collectively let this become a raging dumpster fire we can all heroically fix it and greatly exceed expectations for the half.

KingMachiavelli2y ago

[1] https://arstechnica.com/gadgets/2023/11/android-14-patches-r...

MarkSweep2y ago

I quit my job before I got to deploy this, but RAUC looked like it would handle this for Yocto:

https://github.com/rauc/rauc https://github.com/rauc/meta-rauc

For microcontrollers, Memfault had a good article:

https://interrupt.memfault.com/blog/device-firmware-update-c...

neuralRiot2y ago

> including devices that are unreachable for example for 6 months of the year

That made me think, imagine NASA bricking up the voyager with a SW update.

aaronbeekay2y ago

As somebody currently working at an automaker on software systems, the amazing thing to me is that a mess up of this level doesn’t happen weekly. It’s rough out here.

jacquesm2y ago

Thank you. At least you're honest about it, the other day someone was trying real hard to convince me that software developers at automakers are made of magic fairy dust.

kalleboo2y ago

I'm amazed anyone would argue that after the Toyota firmware analysis.

bozhark2y ago

What's the priority then, telemetry data? Why is it rough out there?

jacquesm2y ago

I hope I didn't miss anything but I wouldn't be surprised if I did.

reactordev2y ago

Probably due to fires, failures, and fatigue.

foobiekr2y ago

do you guys not have confirmed boot and swizzling to fallback images?

AlotOfReading2y ago

Automotive varies widely between "basically modern Linux systems with proper updates" and the most janky, home-grown update systems imaginable, sometimes even within the same components and teams.

cjbprime2y ago

> This ran in CI and would fail the build if it didn't pass.

quailfarmer2y ago

latchkey2y ago

Nah, my CI process was solid. This was proven in the field over the course of years.

> I don't mean to be pedantic... you need phased rollout

> but it contains an update to the updater which bricks the updater?

This happened, so I wrote a lot of test code to make sure that would never happen again. My CI would catch that since I was E2E testing that it could actually run the upgrade process.
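
To illustrate the idea (this is not my actual test code, just a toy model of it): the gate has two steps. The previous build must be able to install the candidate, and then the candidate must be able to install something after itself. `Build`, `working_updater`, and `broken_updater` are made-up names for the sketch.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Build:
    version: str
    updater: Callable[["Build"], "Build"]  # installs a target build, returns what is now running


def working_updater(target: Build) -> Build:
    return target


def broken_updater(target: Build) -> Build:
    raise RuntimeError("updater crashed")


def gate(previous: Build, candidate: Build) -> bool:
    """CI gate: fail the build if the candidate could strand devices on itself."""
    probe = Build(candidate.version + "+probe", working_updater)
    try:
        running = previous.updater(candidate)  # step 1: old build installs the candidate
        running = running.updater(probe)       # step 2: candidate installs a follow-up build
    except Exception:
        return False
    return running.version == probe.version
```

Step 2 is the one that catches an update to the updater which bricks the updater.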

Once I implemented all of this, I never had a single failure and would routinely, several times a day, deploy to the entire cluster, over the course of a couple years.

It was all eventually consistent as I could also control the "check for update" frequency as well.

cjbprime2y ago

jacquesm2y ago

And you need to verify the vehicle is not in motion.

psychlops2y ago

Having worked on 25K machines, I can assure you that it never deployed to every single machine and failed to do so in interesting ways all the time.

latchkey2y ago

It always deployed. It was eventually consistent. Any failure would automatically be resolved after a period of time.

psychlops2y ago

postalrat2y ago

As a frontend web developer I'm constantly deploying software to many thousands of machines. And you know what? It's pretty damn simple.

drdaeman2y ago

And fortunately, no one bats an eye at a slightly broken site, but everyone hates even a slightly broken vehicle.

jrumbut2y ago

It's simple because we tolerate certain limitations in the web platform.

If you had a hard requirement that a page load could never take more than 100ms, regardless of network conditions, you'd have quite a challenge on your hands.

onion2k2y ago

uw_rob2y ago

I don't think it's fair to consider the updaters for either Chrome or the OS to be simple.

donmcronald2y ago

> While I fully understand that this is hard to get right 100% of the time, a mess up of this level by a car manufacturer is pretty amazing to me.

Do you have any insight on package vs slice based systems for updates? Did you isolate update logic from the rest of the system or am I out of touch with that opinion?

vGPU2y ago

Reminds me of my (far less critical) update process for home assistant. Every time something breaks. Currently my hvac automations are going haywire.

akira25012y ago

code_runner2y ago

out of morbid curiosity.... how long did it take to ssh into and fix all of those servers? I imagine even automating a fix (if possible) would still take a good amount of time.

latchkey2y ago

GNU parallel and sshpass are your friends.

The way I built my app was that I could install it cleanly via a curl | bash.

So, I just had a simple shell script that iterated through the list of IP addresses (from the DHCP leases), ran curl | bash and that cleaned up the mess pretty quickly.

jdechko2y ago

As a non-developer, the whole situation with a bad software update to the Voyager spacecraft really puts things into perspective as far as how bad remote updates can be.

It’s also a testament to the way that the system was designed that they were able to get it back online.

sixtram2y ago

you ssh-d into 25K servers one by one? I mean, manually?

latchkey2y ago

https://news.ycombinator.com/item?id=38270986

ugh1232y ago

Please tell me you scripted that ssh into across your 25k servers!

latchkey2y ago

https://news.ycombinator.com/item?id=38270986

One thing my little control process did on the box was to always set the password to be the same... user/1.

None of these boxes needed inbound connections, so it wasn't a big deal to do that.

gravitronic2y ago

I used to work for a company that built satellite receivers that would be installed in all sorts of weird remote environments in order to pull radio or tv from satellite and rebroadcast locally.

If we pushed a broken update it might mean someone from the radio company would have to make a trip to go pull the device and send it to us physically.

Turns out we had a separate process running that listened on a local socket and would run any command it received as root. Nobody remembered building or releasing it but it made my work quick.

singleshot_2y ago

The person who built and released this might not have ever worked for your company, which might be why no one remembers building or releasing it.

gravitronic2y ago

No no, I figured that out afterwards, in a past development iteration someone added it on purpose and then forgot all about it - "oh yeah we needed that to <solve some mundane problem>".

cjbprime2y ago

I admire your restraint in writing this comment. :)

ThePowerOfFuet2y ago

This is one of the very finest comments I have ever seen on HN (or anywhere else, for that matter).

nomel2y ago

> Turns out we had a separate process running that listened on a local socket and would run any command it received as root. Nobody remembered building or releasing it but it made my work quick.

gravitronic2y ago

nomel2y ago

Package this up and send it to https://thedailywtf.com

It's beautiful.

qmarchi2y ago

It's crazy to me that this is possible in the first place. Standard practice is to have a fleet of test vehicles that are effectively production except in an early release group.

Or, you know, having an A/B boot partition scheme with a watchdog. Things that have been around for decades at this point.

Disclaimer: Former Googler, Worked closely with Automotive.

michaelt2y ago

To me it's all-too-understandable how this is possible.

Maybe they've got a test fleet, but it accepts code signed with the test build key.

Maybe they've got a watchdog timer, but it doesn't get configured until later in the boot process.

Maybe they've got A/B boot partitions, but trouble counting their boot attempts - maybe they don't have any writable storage that early in the boot process.

qmarchi2y ago

So, using Polestar as a reference as it's both a vehicle that I've worked on, and one that I personally drive.

> Maybe they've got a test fleet, but it accepts code signed with the test build key.

> Maybe they've got a watchdog timer, but it doesn't get configured until later in the boot process.

Implementation of a watchdog with a "sub-system" like this is relatively braindead simple.

> Maybe they've got A/B boot partitions, but trouble counting their boot attempts - maybe they don't have any writable storage that early in the boot process.

Opinion: Maybe I'm biased, but maybe if you can't develop something yourself, there's reason for you to get an off the shelf option that handles a lot of these things.

Disclaimer: Former Googler, Worked closely with Automotive.

Gud2y ago

refulgentis2y ago

Opinion: I'm a little confused as to how you're confused as to how test might not match prod sometimes.

Observation: "[if you write buggy software], there's reason for you to get an off the shelf option"

Question: Are you saying if they used Android Automotive this could never have happened?

Reference: similar event for Android, last week: https://linustechtips.com/topic/1538248-pixel-phones-using-m...

Disclaimer: Former Googler, did not Work closely with Automotive, Worked closely with Android.

paledot2y ago

LoganDark2y ago

> Maybe they've got A/B boot partitions, but trouble counting their boot attempts - maybe they don't have any writable storage that early in the boot process.

You do not report a successful boot until and unless the entire system loads up successfully. You will definitely have writable storage by then.
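A minimal sketch of that rule, assuming a hypothetical bootloader that keeps a tiny state record (active slot, pending slot, tries left). `BootState`, `stage_update`, `choose_slot` and `confirm_boot` are made-up names for illustration, not any real bootloader's API:

```python
from dataclasses import dataclass
from typing import Optional

MAX_TRIES = 3  # assumed retry budget for a freshly flashed slot


@dataclass
class BootState:
    """Hypothetical persistent bootloader record; field names are illustrative."""
    active: str = "A"              # last slot known to boot all the way up
    pending: Optional[str] = None  # freshly flashed slot awaiting confirmation
    tries_left: int = 0


def stage_update(state: BootState, slot: str) -> None:
    """Called by the updater after the standby slot has been written."""
    state.pending = slot
    state.tries_left = MAX_TRIES


def choose_slot(state: BootState) -> str:
    """Called by the bootloader on every boot to pick the slot to start."""
    if state.pending is None:
        return state.active
    if state.tries_left == 0:
        # The new image never confirmed itself: abandon it and fall back.
        state.pending = None
        return state.active
    state.tries_left -= 1
    return state.pending


def confirm_boot(state: BootState) -> None:
    """Called only once the whole system is up; promotes the pending slot."""
    if state.pending is not None:
        state.active = state.pending
        state.pending = None
        state.tries_left = 0
```

The only write that has to happen early is the `tries_left` decrement, which in practice needs a few bits the bootloader can persist. The "success" write happens late, from the fully booted system.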

psunavy032y ago

Exhibit A of why a Minimum Viable Product still needs a proper Definition of Done which includes quality standards.

worik2y ago

What amazes me is that any grown up person thinks it is a good idea to update vehicles as if they were telephones

Owners should have to bring the vehicle into a shop to have changes made, and they should be very rare.

This is lazy control freakery of the worst kind.

Something very bad is going to happen, and people will die before we realize that it is a stupid, dangerous practice.

qmarchi2y ago

I understand the sentiment, but think about the alternatives.

There are a few different kinds of updates that can be applied, each with their own protective layers.

Infotainment updates, like what happened to Rivian, aren't that dangerous. You lose "convenience features" like maps, air con, etc., but generally nothing that could kill you or someone else.

mulmen2y ago

Dad has a 1966 Oldsmobile with air conditioning. In the last 57 years General Motors has never found a need to update the switch. It still works flawlessly.

It’s stupid that we invented a way to not only remotely break an on/off switch but also a culture that rolls the dice on that until the inevitable happens.

neoromantique2y ago

>Infotainment updates, like what happened to Rivian, aren't that dangerous. You lose "convenience features" like maps, air con, etc., but generally nothing that could kill you or someone else.

Also speedometer, which is hardly a convenience feature.

LastMuel2y ago

On the other hand, we update irreplaceable spacecraft billions of miles away with new software.

It sounds like that wasn't the case here. Also, why wouldn't you have a small initial release pool when you have such a large potential for disruption?

brewdad2y ago

NASA's philosophy is the polar opposite of "move fast and break things".

worik2y ago

If Ford or Nissan want to invest as much in ms as NASA does in Voyager....

bradleyjg2y ago

fargle2y ago

> What amazes me is that any grown up person thinks it is a good idea to update vehicles as if they were telephones

What amazes me is that any grown up person thinks it is a good idea to update telephones as if they were software and not phones.

Or rather, that it is a good idea to have phones that need updates? Either way, we're all one half-assed push update to a fridge, vacuum, washing machine, phone or car away from a really annoying day.

worik2y ago

Touché

vore2y ago

windexh8er2y ago

This isn't true. If you look at the release notes for any of Rivian's updates they all include vehicle related firmware changes. This is not simply infotainment.

Beyond that "infotainment" includes driver critical information - like the speedometer which, for many affected, means there's no working driver screen.

spaceywilly2y ago

There's really no excuse from Rivian on this, this is shoddy

LargeTomato2y ago

I interviewed at Rivian. They told me about how they needed to grant users access to things like keys, AC, ignition, etc. So they built a hierarchical, recursive group checking IAM system.

DannyBee2y ago

Rivian does have a test fleet, and they test it for weeks before releasing. This particular issue is because they apparently distributed the firmware signed with the wrong cert.

Not a bug in the software itself.

That is independent of testing the software, but still a distribution issue.

mytailorisrich2y ago

My 2c based on your comment:

* "signed with the wrong cert" should mean the software package is rejected before it is installed.

* software upgrades are tricky and there should be at least 2 versions available so that fallback to the previous is possible and automatic in case of issues.

DannyBee2y ago

The software package probably is signed right but contains multiple signed binaries of which one is signed wrong.

Or is multiple signed packages and one is wrong.

Or the test cars accept prod and test certs.

Or some combo of the above.

There are lots of ways this could have broken that don't amount to Rivian not being able to write software.

jandrese2y ago

Yeah, but how did the vehicle not just reject the wrong cert and refuse to flash the update?

mkipper2y ago

I've never worked in automotive but it's pretty easy to imagine how this might play out in a car, where a single update might bundle updates for several programmable devices.

Not that this is a great excuse. If you add OTA updates to a product that has this design, you should really be confident in your recovery solution.

DannyBee2y ago

The firmware is probably not just a signed package but signed binaries in the package as well. One is probably signed with the wrong cert.

This would not cause the updater to fail unless it verified the certs of all the binaries in the package, which most don't
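A rough sketch of what "verify every binary before flashing" could look like. HMAC is only a stdlib stand-in for a real signature scheme here, and the keys, component names and `verify_package` helper are all made up for illustration:

```python
import hashlib
import hmac


def sign(key: bytes, payload: bytes) -> bytes:
    """Stand-in for a real signature; HMAC keeps the sketch stdlib-only."""
    return hmac.new(key, payload, hashlib.sha256).digest()


def verify_package(components, trusted_keys):
    """Return the names of components whose signature matches no trusted key.

    components maps a name to (payload, signature). An empty result means
    every inner binary checked out; only then should flashing begin.
    """
    bad = []
    for name, (payload, signature) in components.items():
        ok = any(hmac.compare_digest(sign(key, payload), signature)
                 for key in trusted_keys)
        if not ok:
            bad.append(name)
    return bad


PROD_KEY, TEST_KEY = b"prod-key", b"test-key"  # made-up keys
package = {
    "gateway": (b"gw-image", sign(PROD_KEY, b"gw-image")),
    # The hypothetical mistake: one inner binary signed with the test key.
    "infotainment": (b"ivi-image", sign(TEST_KEY, b"ivi-image")),
}

print(verify_package(package, [PROD_KEY]))            # customer car: ['infotainment']
print(verify_package(package, [PROD_KEY, TEST_KEY]))  # test-fleet car: []
```

A test vehicle that trusts both keys sees nothing wrong, while a customer vehicle that checks every component would refuse the package up front instead of discovering the bad binary after flashing.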

mlyle2y ago

The code went through early release tests successfully; the problem came with how it was more broadly released.

They should have had further staging of the rollout (randomizing when it is offered to users).
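One common way to randomize the offer is to hash each vehicle into a stable bucket and only offer the update to buckets below the current rollout percentage. A minimal sketch, where the vehicle IDs and release name are hypothetical:

```python
import hashlib


def rollout_bucket(vehicle_id: str, release: str) -> int:
    """Map a vehicle to a stable bucket in 0..99 for a given release."""
    digest = hashlib.sha256(f"{release}:{vehicle_id}".encode()).digest()
    return int.from_bytes(digest[:8], "big") % 100


def is_offered(vehicle_id: str, release: str, percent: int) -> bool:
    """True if this vehicle falls inside the first `percent` buckets."""
    return rollout_bucket(vehicle_id, release) < percent


fleet = [f"VIN{n:04d}" for n in range(1000)]  # made-up vehicle IDs
first_wave = [v for v in fleet if is_offered(v, "release-x", 5)]
print(len(first_wave), "of", len(fleet), "vehicles offered in the first wave")
```

Because the bucket depends only on the vehicle and the release, raising the percentage only ever adds vehicles, and salting with the release name means the same cars are not always first in line.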

whalesalad2y ago

AlotOfReading2y ago

Lots of ways to screw this up, especially in automotive where you're likely to be dealing with TI and their (in)secure boot.

mlyle2y ago

If the comment I replied to originally contained a mention of A/B partitions, I missed it.

MichaelZuo2y ago

The 'early release tests' weren't testing an identical copy of the actual update?

DannyBee2y ago

It's probably closer to:

The test vehicles accept test/prod signed versions

Regular vehicles only accept prod signed versions

They are otherwise identical.

The test vehicles were sent test signed versions

The prod vehicles were sent the exact same update, signed with test.

This would not be uncommon since the test vehicles probably occasionally run test releases for debugging.

Further, the update is probably multiple signed pieces, and the only part accidentally signed with test was likely infotainment software.

Or something like this.

It's hard to believe they wouldn't test sending badly signed updates, so i have to imagine it's a particularly weird badly signed update.

In other words, i would not assume they are idiots.

hef198982y ago

Edit: Also, why the heck isn't the entertainment system completely air gapped from the software running the car?

enragedcacti2y ago

> Edit: Also, why the heck isn't the entertainment system completely air gapped from the software running the car?

https://rivian.software/category/public/

jacquesm2y ago

refulgentis2y ago

Rollouts don't solve problems, they limit who they affect.

mlyle2y ago

Is not reducing the effective cost of a bad update by 10x or more worthwhile?

While if you roll out to everyone at once, you'll detect the problem sooner (within an hour) but have 10x as many affected.

xyst2y ago

When a car company is losing money on every car sale, C-level execs are going to cut corners.

dewski2y ago

This is a bad take.

xyst2y ago

Rivian layoffs earlier this year [2] combined with reports of $33K loss per sale [1]. Rivian is hemorrhaging money right now.

RIVN IPO’d at $150/share. Now it’s trading at $16/share.

All of these are indicators of poor leadership to me. No sustainability. Burning cash. Poor company outlook. Poor products.

[1] https://tfltruck.com/2023/10/rivian-financial-results-losses...

[2] https://www.theverge.com/2023/2/1/23581642/rivian-layoff-ev-...

OneLeggedCat2y ago

This is an inadequate comment.

worik2y ago

Why?

cs7022y ago

So far, only Tesla seems to be able to update car software remotely, regularly and reliably. I'm certain it's neither easy nor cheap.

All things considered, physical buttons and dials are probably easier and cheaper, because they don't require software updates!

VyseofArcadia2y ago

Forget updates entirely. My car is one of the few places I expect to get software that works the first time.

If you absolutely must have updates, then at least not OTA updates. Have them done at the dealership or service center so any issues can be dealt with immediately.

Come on, is this engineering or hacking? This is a car, not a CRUD app. Get. It. Right.

dagmx2y ago

OTA is better for consumer when done properly. Other manufacturers manage it fine, and one bad example shouldn’t be what we base things on. It’s what we should learn from and improve on.

jacquesm2y ago

But average quality was a bit higher because nobody thought 'oh, we'll fix that next week in the OTA update'.

dalyons2y ago

ClumsyPilot2y ago

bad software is the one that kills people. Ugly software that works is fine

w0m2y ago

random new features via OTA updates was one of the deciding factors when i bought my car ... :)

I also mostly WFH so... yea. lol.

matrss2y ago

> All things considered, physical buttons and dials are probably easier and cheaper, because they don't require software updates!

iso8859-12y ago

I wonder if it is somehow possible to use an open source battery management system to build a car like this. See https://foxbms.org/

jacquesm2y ago

Regulatory pressure may well get you to do stuff you wouldn't want to do.

NotYourLawyer2y ago

I’d buy that car today.

wannacboatmovie2y ago

This isn't a bunch of Windows PCs home-built from a hodgepodge of components.

If they don't, they have already failed as a car company.

AlotOfReading2y ago

I guarantee they have a database with the hardware configs. It's required by NHTSA to do recalls and notices. They'll undoubtedly be using that to inform the right people to come in.

The update servers almost certainly don't talk to that system though.

wil4212y ago

> So far, only Tesla seems to be able to update car software remotely, regularly and reliably. I'm certain it's neither easy nor cheap.

My Jeep Grand Cherokee has had OTA for 5+ years. BMW has been doing it since 2018.

I’m almost positive a family member had it with GMC OnStar back in the late 2000s.

willio582y ago

I don't think the Jeep or BMW infotainment systems are nearly as fleshed out or complex as Rivian's, especially not Tesla's. Maybe I'm wrong!

phpisthebest2y ago

Well then we need to ask: why are their infotainment systems so complex? And do they need to be?

I want my infotainment system to connect to Android Auto. That is it.

Make it do that, and only that.

This drive to make EVs as complex as possible is one of the reasons I am not planning on buying one.

I do not want a complex SaaS app on wheels.

wil4212y ago

That’s a huge plus for me. CarPlay or nothing. BMW is becoming closer to a Tesla like screen. GM is supposed to drop CarPlay in favor of whatever they are doing on their EVs.

I don’t want my ICE/EV to become a SaaS app where I’m paying $500 a year to use my own car.

jacquesm2y ago

duped2y ago

Updating software is orthogonal to the complexity of the software application being updated, unless you have horribly designed your architecture. I know, because I've made that mistake.

bri3d2y ago

> All things considered, physical buttons and dials are probably easier and cheaper, because they don't require software updates!

Almost all automotive control modules have firmware, whether that firmware is parsing touchscreen inputs or a rotary encoder.

NotYourLawyer2y ago

Well sure, but the rotary encoder can’t get moved to a different menu tree by a software update, and I can use it without taking my eyes off the road. I know which I prefer.

xyst2y ago

treesknees2y ago

Decades? Try almost a century. For better or worse, our cities and various economies were built around the automobile.

FireBeyond2y ago

> So far, only Tesla seems to be able to update car software remotely, regularly and reliably. I'm certain it's neither easy nor cheap.

Tesla, whose computer systems quite regularly need to be hard rebooted while the car is driving? That Tesla?

code_runner2y ago

I had to do this once or twice (it's very, very infrequent in my experience), and one time it was genuinely terrifying, as I had lost blinkers etc. where a few interstates all intersect and merge.

I still do love the car though.... but a very sketchy moment that I shouldn't have brought on myself while driving in that situation.

xienze2y ago

> I had to do this once or twice (it's very, very infrequent in my experience)

This is something that’s _never_ supposed to happen.

> but a very sketchy moment that I shouldn't have brought on myself while driving in that situation.

How on earth can you rationalize a Tesla performing an update/hard reset while driving as _your_ fault? It should never be allowed to happen!

Aurornis2y ago

> (different models, different years, slightly different components from batch to batch, etc.). It makes updating a mobile phone OS look trivial in comparison.

Not really. Vehicle computers aren’t vastly different on every model year and every trim level or option package. These parts are standardized, tested, and carried across model years.

etchalon2y ago

My Volvo XC90 gets regular OTA updates without issue, and so did my Land Rover Discovery before it.

tech_ken2y ago

> All things considered, physical buttons and dials are probably easier and cheaper, because they don't require software updates!

If it ain't broke it's ripe for disruption

dylan6042y ago

if (cpu == A) do code

else if (cpu == B) do other code

xgbi2y ago

From what I read somewhere, Tesla was able to do that because they have remote ssh capability.

In at least one instance, they fixed the cars manually by running a massive remote command on all cars after a messed up update: https://lobste.rs/s/v42zil/former_tesla_employee_ssh_d_as_ma...

I wouldn’t call that very reliable , but they indeed do it regularly

FireBeyond2y ago

They'd never do that, except when they did do that.

kccqzy2y ago

The Twitter thread linked from the link the GP posted actually contains a reasonable explanation of why that happened. https://x.com/atomicthumbs/status/1032939644621545473

SoftTalker2y ago

scardycat2y ago

nomel2y ago

I think this shows more of a fundamental flaw in their update mechanism, than anything.

I don't think a botched update is a big deal. It happens, and should be expected, in a sane design. The fact that the customer noticed is a big deal.

There are many implementations that could be used for an "auto rollback" feature. They either failed to implement that in a sane way, or they were goobers, and assumed things would always be rosy.

babypuncher2y ago

bink2y ago

theandrewbailey2y ago

Cars 20 years ago, even most of them 10 years ago, never got any updates unless they got recalled. Nothing broke, nothing got hacked, and most are probably still working fine.

Edit: there were electric cars over 100 years ago. I bet they never got software updates.

jdminhbg2y ago

There's never been any car that 100% will work in the morning when you go to the garage. It's all tradeoffs.

BuckRogers2y ago

It doesn’t. People and these tech companies are tools. And do it largely in search of ways to take more of your money. It’s not a favor.

gitfan862y ago

The Tesla update is slow probably for this reason. It is probably verifying that it can rollback at any point of failure.

liminalsunset2y ago

yjftsjthsd-h2y ago

I would naively expect it to just do A/B updates, which unless I'm forgetting something shouldn't incur a speed penalty? (Other than that the update doesn't get applied until restart)

1234letshaveatw2y ago

From a few days back- Its software has been a “key differentiator” https://electrek.co/2023/11/10/rivian-using-software-to-scal... kind of humorous in hindsight

wannacboatmovie2y ago

Rivian seems more like a "ship it and we'll fix it in the next sprint!" company.

How do other manufacturers handle updates?

post_break2y ago

Ford's approach is flawed, however. You can still update Sync with a bad update and bork it over USB. Ask me how I know.

r00fus2y ago

Pray tell, how painful was your discovery?

post_break2y ago

Thankfully I was able to repackage it again and flash an upgrade and it unborked itself.

sturza2y ago

A/B partitions

barryrandall2y ago

reneberlin2y ago

When will humans be crazy enough to update the firmware of artificial hearts OTA?

Updating cars with new features OTA, even "just" an Infotainment can possibly cost lives, because the driver might get confused and isn't putting eyes on the streets.

It should even be possible to skip an update or stop updating at all.

rekoil2y ago

Not updating cars OTA (yes, even "just" the infotainment) can potentially cost lives as well, as security holes would not get patched until the next service appointment.

qudat2y ago

What a nightmare. This is where software engineering meets "real" engineering, where a "bug" has potentially life threatening consequences.

nomel2y ago

> where a "bug" has potentially life threatening consequences.

What are you referring to? That is not relevant to this story, and would require a deep understanding of the system to make such a claim of negligence.

“The issue impacts the infotainment system. In most cases, the rest of the vehicle systems are still operational ...”

Also, you can't do an update while driving.

jawns2y ago

nomel2y ago

I understand, but the risk of life wouldn't be from the bug, it would be from conscious choice of driving without a speedo. There's a critical distinction there.

In this case, mileage/battery are still present, and I would assume safety critical warnings would still be displayed.

ct02y ago

nomel2y ago

BlueTemplar2y ago

qudat2y ago

> What are you referring to?

nunez2y ago

critical safety systems/functions appear to be unaffected by this outage.

nicholasjarnold2y ago

Is it possible, as a licensee of the Rivian vehicle system, to disable the automatic OTA updates without having expert-level knowledge or tooling?

Also, yes, I'm specifically avoiding using the word "owner" above for obvious reasons.

558734452161112y ago

bo10242y ago

Confirming that updates are not automatic, and can be ignored indefinitely. For now.

martin84122y ago

Stuff like this is why I don't want OTA updates in my cars. Let the car dealership deal with it during regular maintenance. They'll be on the hook for fixing it before handing the car back to me.

galangalalgol2y ago

Don't even need OTA. A Seattle radio station bricked a bunch of Mazdas.

https://www.autoblog.com/2022/02/09/seattle-radio-station-br...

cozzyd2y ago

Amazing. Can't wait for some car software stack to be so poorly designed that an FM transmitter can remote takeover.

galangalalgol2y ago

It quite possibly could have with a well formed digital fm payload.

evanelias2y ago

Regardless of OTA vs dealership-only updates, software bugs can have problematic effects long after the update occurred.

This is getting very irritating. I bought an EV because I thought it would require fewer maintenance visits to the dealer!

SoftTalker2y ago

Also worth considering that a manufacturer like Rivian is pretty small. Every town has a Ford dealer. There are many states, however, that don't even have a single Rivian service center.

0xffff22y ago

ben_jones2y ago

But what if your car doesn’t have the latest emojiset or social sharing functionality within the notes app?

dilyevsky2y ago

There is nearly zero regular maintenance to be done on EVs though. No oil, no belts, no fuel filter, spark plugs etc. Even the brakes will likely last entire lifetime of the car

martin84122y ago

dilyevsky2y ago

You can’t rotate/change tires at local tire shop? It’s not under warranty anyway.

> The air cabin filter and pollen filter need frequent replacement

Yeah “regularly” like every two years lol, you gon wait for that long to update your software and pay $500 to a dealer to do it?

vmladenov2y ago

wannacboatmovie2y ago

EVs should be subject to mandatory German-style inspection by law to counteract this delusion.

dilyevsky2y ago

rconti2y ago

Just EVs?

nkingsy2y ago

Hertz seemed to find Teslas cost double their ICE counterparts to maintain.

Maybe it’s auto company smoke but source: https://fortune.com/2023/10/27/tesla-elon-musk-hertz-evs-ren...

aftbit2y ago

>In short, the declining value of the Tesla cars in Hertz’s fleet—a decline directly caused by Musk’s price cuts—has hit Hertz squarely in its profits.

>In short, Hertz then needs to book a noncash accounting charge. Together with the higher repair costs this led to significant profit margin headwinds.

https://archive.ph/leFdf

dilyevsky2y ago

That’s because it is run by idiots who ran it into bankruptcy

gunapologist992y ago

This is why I don't really want my car to have any antenna (that receives/interprets code) or receive OTA updates, ever.

I'd like to please force any attackers to at least be within 50 feet of my TPMS, instead of being literally anywhere on the planet.

A car doesn't need data updates, and definitely not code updates[1]

1. source: every car built in the previous century.

sbehere2y ago

> A car doesn't need data updates, and definitely not code updates

gunapologist992y ago

> Many advanced driving assistance capabilities need access to updated map tiles

Your phone (or GPS or even a paper map) can guide you; none of the following need access to map tiles:

* forward collision warning

* automatic emergency braking

* lane departure warning

* adaptive cruise control

* blind spot detection

* stability control

> code updates to fix errors or shortcomings

pard682y ago

My insistence on only driving cars made prior to 2005 keeps making more and more sense.

(2005 is just an arbitrary date I settled on, nothing significant about it)

eschneider2y ago

This is a bit of a nightmare scenario and why when remote updating, you always test update to your own fleet first. Always.

toddmorey2y ago

It sounds like it was tested on their own fleet but they accidentally pushed the wrong bits when deploying the update more widely out to customers.

eschneider2y ago

But who knows what these guys were doing. :/

bink2y ago

carlivarOP2y ago

Maybe they should have an additional phase between test deploy and customers such as "employee personal vehicles".

kevin_nisbet2y ago

ralmidani2y ago

Move fast and break things that move fast…

Non-rhetorical question: do companies have safeguards for critical components like braking systems, or are they also prone to catastrophic failure if a software engineer pushes a bad commit?

ezfe2y ago

The moving fast components were unaffected by this issue…

ralmidani2y ago

I know, I just thought it was a decent pun.

baz002y ago

This is why I have a Dumbcar connected to a Smartphone via bluetooth.

toddmorey2y ago

Just a counterpoint: my dumb car has been undrivable way more often than my electric car.

They never deployed bad software updates but they sure have designed & deployed bad fuel pumps.

In some ways it’s all engineering and quality control.

akira25012y ago

Which is also why there is a huge non-OEM market for those types of parts. Can you even replace the rivian "infotainment" system?

baz002y ago

I can go nearly anywhere and replace the fuel pump though.

barbazoo2y ago

That's why _I_ have a Dumbcar connected to a smartphone via FM for audio only :)

thatguy09002y ago

I've tried that before, but it sounded terrible. What dongle do you use?

bonton892y ago

barbazoo2y ago

I mean, totally, it doesn't sound great but that's all I can do, my car doesn't have bluetooth :)

Lightbody2y ago

This incident does NOT give me confidence that Rivian is likely to offer a better alternative to CarPlay, despite their statements otherwise.

I suspect the EX90 will be what I land on eventually.

p_j_w2y ago

>This incident does NOT give me confidence that Rivian is likely to offer a better alternative to CarPlay,

AndrewKemendo2y ago

Whoever makes the first affordable, tight-tolerance electric car that doesn’t spy on you and doesn’t need special care will win the market.

samsquire2y ago

It reminds me of repository-per-microservice and a Java library that other microservices use and updating a dependency and having to deploy the change to every service.

It's like trying to change wheels on a car while the car is moving or refueling a jet in flight.

Unison lang is trying to solve this problem I think, by allowing multiple versions of a function to be available.

https://www.unison-lang.org/

Migrations in databases are painful too.

One solution I've thought of, which is probably overengineered, is that API call sites are abstract objects and their schema and arguments are centrally deployed. I called this "protocol manager".

If you have N deployed versions of code and you want to upgrade to X, you have to test 1..N to X versions. So nobody does that.
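To make the size of that test matrix concrete, here is a small sketch that enumerates the upgrade paths. The version strings are made up, and the second helper assumes any older version may jump directly to any newer one:

```python
from itertools import combinations


def paths_to_target(deployed, target):
    """One (from, to) pair per deployed version that is not already the target."""
    return [(version, target) for version in deployed if version != target]


def all_upgrade_paths(versions):
    """Every (older, newer) pair, for versions listed oldest to newest."""
    return list(combinations(versions, 2))


versions = ["1.0", "1.1", "1.2", "2.0"]  # hypothetical release history
print(paths_to_target(versions[:-1], "2.0"))  # 3 paths into the new release
print(len(all_upgrade_paths(versions)))       # 6 paths across the whole history
```

The paths into a single release grow linearly with the number of versions still in the field, but the full history grows quadratically, which is a large part of why the old paths stop getting tested.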

fabianlindfors2y ago

[0] https://github.com/fabianlindfors/reshape [1] https://reshapedb.com

jbott2y ago

You might be interested in learning about Erlang – it supports hot code reloads natively: https://oozou.com/blog/understanding-elixir-otp-applications...

Someone12342y ago

- Better isolation of different parts of the system (e.g. infotainment unit, instrument cluster, et al).

- Better isolation for updates (e.g. run a "beta" update, and a "stable" update side-by-side).

- Automatic error detection and rollback (e.g. if a VM keeps restarting after an update).

- Ease of offering features like rollbacks to end-users.

- Rare hypervisor updates can be held to a much higher standard relative to other VM updates.
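As a sketch of the "VM keeps restarting" check, assuming the hypervisor gets told about each guest restart: count restarts inside a sliding time window and signal a rollback once a threshold is hit. The class name and thresholds are illustrative, not taken from any real hypervisor:

```python
from collections import deque


class CrashLoopDetector:
    """Flag a rollback when a VM restarts too often within a time window."""

    def __init__(self, max_restarts=3, window_s=300.0):
        self.max_restarts = max_restarts  # assumed threshold
        self.window_s = window_s          # assumed window length in seconds
        self._restarts = deque()

    def record_restart(self, now):
        """Record a restart at time `now` (seconds); True means roll back."""
        self._restarts.append(now)
        # Drop restarts that have fallen out of the sliding window.
        while self._restarts and now - self._restarts[0] > self.window_s:
            self._restarts.popleft()
        return len(self._restarts) >= self.max_restarts


detector = CrashLoopDetector(max_restarts=3, window_s=300.0)
for t in (0.0, 10.0, 20.0):
    if detector.record_restart(t):
        print(f"crash loop at t={t}: boot the previous image")
```

Passing the timestamp in, rather than reading a clock inside the class, keeps the check easy to test.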

The only downside of hypervisor-based systems is slightly higher hardware costs. But even that is largely mitigated by modern architectures that natively support virtualization.

PS - You can also look to any containerization. I specifically brought up the XBox because it is a hardware product, just like a vehicle.

kevinventullo2y ago

My 2019 car is not connected to the internet. Instead, I use Apple CarPlay for everything.

Is there any reason not to do it this way?

aftbit2y ago

My cars are from 2019 and 2001. I don't use CarPlay or any internet features in any of them. Instead, I just use my Android's screen itself for navigation and bluetooth for phone calls and music.

lotsofpulp2y ago

kevinventullo2y ago

CarPlay is not an internet feature. It’s basically a screen mirror, and I plug my phone in via USB.

ChadMoran2y ago

Quite a few reasons but primarily CarPlay doesn't know about your vehicle's charge and road trip navigation requires external inputs to determine your charge route.

callalex2y ago

The exact API you describe already exists in CarPlay.

ChadMoran2y ago

Today I Learned... that's great to hear!

kyleee2y ago

Manufacturers should be required to expose an agreed upon spec API that provides range estimate, battery state, etc. So CarPlay and other apps can access the info in a standardized way

callalex2y ago

Apple holds up their end of the deal by supporting the EV Routing feature in maps. It’s entirely up to manufacturers to expose the information at this point.

ChadMoran2y ago

100%. I've owned Teslas since 2017 and always found the infotainment to be very good. Though I've really preferred Apple Maps routing recently. The descriptions of when to turn are much more human.

darknavi2y ago

Well for one Rivian (and Tesla) don't support CarPlay or Android Auto.

kevinventullo2y ago

Sounds more like a reason not to buy Rivian and Tesla to me

callalex2y ago

More potential for anticompetitive vendor lock-in (sugar-coated as differentiation), and more opportunities to profiteer from stalking behavior (sugar-coated as telemetry to improve user experience).

gmane2y ago

Then automakers can’t sell your information for fat stacks of cash.

pookha2y ago

..And authoritarian governments can't take complete custody of your movements. Can't shutdown your car either. Seems to be a win-win for both sides (private and public industry).

https://qz.com/1522309/how-chinas-electric-car-surveillance-...

https://apnews.com/article/4a749a4211904784826b45e812cff4ca

lotsofpulp2y ago

This, and automakers do not want to become dumb appliances (even though they are for 90% of people).

antoniuschan992y ago

Wondering why there isn’t an option for a factory reset (eg. press and hold with a paperclip for 10 seconds)

1970-01-012y ago

Lexus did the very same thing about 8 years ago:

https://www.consumerreports.org/lexus/what-to-do-if-your-lex...

sarchertech2y ago

Miku baby monitors deployed an automatic firmware update that bricked nearly every monitor in use, but not for nearly a month after the update.

It forced the company into bankruptcy because they had to replace all of them.

fsckboy2y ago

Or at least the ability to re-init/download from scratch, like a borked macbook disk. And hey, not the extra ability to do that, make it "the way it works" so you're always testing it.

wnevets2y ago

This may be crazy, but if you're writing software for hardware that costs tens of thousands of dollars, it should be impossible to brick it with an update, especially if that update is OTA.

The future is going great.

thumbsup-_-2y ago

teeray2y ago

An interesting thought experiment: what happens when these vehicles are out of warranty, and automakers accidentally send a vehicle-bricking OTA update? Isn’t that property damage?

jacquesm2y ago

This has happened to some Apple hardware, they fixed it for free in some cases but stiffed others:

https://discussions.apple.com/thread/253315438

With the mandatory mobile phone updates for a few years you're definitely going to see a lot more cases like that.

A thread about Tesla directly related to your question:

https://teslamotorsclub.com/tmc/threads/wholl-be-responsible...

karaterobot2y ago

ezfe2y ago

It’s possible to deem software ready to sell but find improvements later.

This was never a safety issue and was likely not a defect. It was, however, stupid and needed to be changed.

karaterobot2y ago

ezfe2y ago

Sure, but they didn't - and it's not because the software is complicated. The Subaru headunits are very basic (and more-so a few years ago).

When they make an improvement, I like getting the trivial things on my older vehicle. It's better for everyone involved, so long as they do it responsibly.

bfrog2y ago

Thousands of test points having to be verified was my understanding. That’s before even getting to the confirmed boot/watchdog aspect.

What a hassle, hope they like spending money on labor because it sounds like they are going to need to.

adolph2y ago

This is what I do with my Prius to get a comfortably distraction-free driving environment. Sounds like a feature not a bug.

altairprime2y ago

sturza2y ago

Instrument cluster display going black is a functional safety/QM issue. No blinker, transmission direction, speed etc confirmations.

bri3d2y ago

eigenvalue2y ago

nicolaslem2y ago

This is the kind of thing that keeps me awake at night.

elitepleb2y ago

A/B updates as implemented in Android: https://bootlin.com/pub/conferences/2022/elce/opdenacker-imp...
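
To make the A/B idea concrete, here is a toy Python model of two slots with a trial boot and automatic fallback. It is only a sketch under stated assumptions: the `Slot` and `Device` classes, their field names, and the `healthy` callback are all hypothetical and are not Android's or Rivian's actual implementation.

```python
from dataclasses import dataclass, field


@dataclass
class Slot:
    version: str
    bootable: bool = True
    successful: bool = False  # set only after the image confirms a good boot
    tries_remaining: int = 0


def _default_slots():
    # Hypothetical factory state: both slots hold a known-good image.
    return {"a": Slot("1.0", successful=True), "b": Slot("1.0", successful=True)}


@dataclass
class Device:
    slots: dict = field(default_factory=_default_slots)
    active: str = "a"

    def inactive(self) -> str:
        return "b" if self.active == "a" else "a"

    def install(self, version: str, tries: int = 3) -> None:
        """Write the new image to the inactive slot and mark it for trial boot."""
        target = self.inactive()
        self.slots[target] = Slot(version, tries_remaining=tries)
        self.active = target  # the old slot is left untouched as the fallback

    def boot(self, healthy) -> str:
        """Simulate booting; healthy(version) stands in for a real health check."""
        slot = self.slots[self.active]
        if slot.successful:
            return slot.version
        while slot.tries_remaining > 0:
            slot.tries_remaining -= 1
            if healthy(slot.version):
                slot.successful = True  # confirm the update
                return slot.version
        # Out of tries: give up on the new image and fall back to the old slot.
        slot.bootable = False
        self.active = self.inactive()
        return self.slots[self.active].version


if __name__ == "__main__":
    device = Device()
    device.install("2.0")
    print(device.boot(lambda version: False), device.active)
```

The point of the design is that the previously working slot is never written during an update, so a bad image costs a few failed boot attempts rather than a bricked device.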

hospitalJail2y ago

>Does anyone here have some practical tips to turn an embedded Linux machine into an appliance?

Lol

nunez2y ago

M3L0NM4N2y ago

I think the reason is because they're $80k trucks, not $400/month Tesla leases. Also, they're first generation and I think most of the buyers understand that.

Havoc2y ago

> the vehicle is not bricked

What a time to be alive. Software updates (almost) turning cars into paperweights lol

ct02y ago

cryptoegorophy2y ago

glonq2y ago

As somebody who has spent many years doing embedded+iot related to remote fleet firmware updates, this is the kind of thing that lurks in my nightmares.

I'd love to be a fly on the wall at Rivian engineering/operations this week!

easylion2y ago

seattle_spring2y ago

avereveard2y ago

> In most cases, the rest of the vehicle systems are still operational

Like, what do you mean "in most cases"? I can understand a broken infotainment system needing a reset, but imagine if you had to tow your truck. I'd be furious.

MisterTea2y ago

Can I please just buy a car with a motor and battery? Why does every god damn vehicle have to come littered with screens and chips all together like some tentacle monster?

fhub2y ago

I'm going to have a chuckle next time I pass the Databricks billboard on 101 in San Francisco "Rivian powered by Databricks" or something to that tune.

WirelessGigabit2y ago

What's the impact on your insurance should you get into an accident?

The speedometer screen is gone, so does that not imply the vehicle is inherently unsafe to drive?

Am4TIfIsER0ppos2y ago

Look at all these commenters saying "code signing was done wrong" when the wrong part is code signing at all.

j452y ago

As long as they are good for fixing it, this might be what being a pioneer or early adopter is about.

emmelaich2y ago

Poor title; physical repair is not required. Physical presence is required.

Someone12342y ago

So without more info we cannot know if it is accurate or not.

emmelaich2y ago

I don't think many people would consider removing a body panel to be a physical repair. I think the term is 'back to base' or similar.

Physical repair suggests, e.g., a burnt-out capacitor.

immy2y ago

That’s funny, I just saw a job posting for Rivian Infotainment team

b200002y ago

“we use leetcode to filter out hires because it works for us”

whoopsie2y ago

Ah this is why CarPlay isn’t worth adding, right?

FireBeyond2y ago

As annoying as this is, I find this laughable, too. Rivian updated users on the situation. Then, whines Electrek:

> That’s the last update we had over 10 hours after Rivian customer vehicles were fed the bad software update.

"Over 10 hours"!

shoelessone2y ago

sitzkrieg2y ago

i cant believe this sort of stuff is acceptable. what a clown industry

thrill2y ago

Inexcusable, really.

collsni2y ago

OTA on a car. What could go wrong?

janitor612y ago

rurp2y ago

This reminds me of the terrible turn signals Mini used, which look like flashing arrows pointing in the opposite direction of the turn[0].

Getting cute with basic stuff like tail lights is forgettable or annoying at best, and absolutely can be dangerous.

[0]https://jalopnik.com/congratulations-mini-you-made-the-stupi...

xyst2y ago

Looks like this car brand is circling the drain. Glad I never bought into the hype.

seattle_spring2y ago

It’s circling the drain because of one bad software update?

Sounds more like you’ve just bought into the doom and gloom that a few specific news outlets have been pushing.

xyst2y ago

Tesla. Rivian. All cut from the same cloth. A car should be simple. Yet we are stuffing all of this tech junk into it and trying to repackage it as something else to pump the numbers.

bhauer2y ago

> Tesla. Rivian. All cut from the same cloth.

I'm not aware of any Tesla OTA updates bricking the infotainment system. At least since I've been paying attention. I don't see them quite as similar as you suggest.

margalabargala2y ago

There have been plenty.

https://www.reddit.com/r/TeslaLounge/comments/112oqln/new_te...

https://teslamotorsclub.com/tmc/threads/failed-software-upda...

Two examples of many.
