Yup. I work in robotics.
I try to isolate the actual hardware interaction layer so that for testing you can mock the driver and hardware as one piece. Of course that doesn't test the driver itself. With any luck, the driver is pretty stable once it works, though. And the driver+hardware piece can have its own (physical) test bench, so that at least manual testing is, well, maybe not efficient, but at least not painful.
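To sketch what I mean (names here are made up for illustration, not from any real codebase): the driver+hardware pair sits behind one narrow interface, so a unit test swaps in a mock for both at once.

```python
from abc import ABC, abstractmethod

class MotorInterface(ABC):
    """The seam: everything behind this interface is driver + hardware."""

    @abstractmethod
    def set_velocity(self, rad_per_s: float) -> None: ...

    @abstractmethod
    def read_velocity(self) -> float: ...


class MockMotor(MotorInterface):
    """Stands in for the driver *and* the hardware in unit tests."""

    def __init__(self) -> None:
        self.commanded = 0.0

    def set_velocity(self, rad_per_s: float) -> None:
        self.commanded = rad_per_s

    def read_velocity(self) -> float:
        # Pretend the hardware tracks commands perfectly; a fancier
        # mock could add lag, noise, or fault injection here.
        return self.commanded


class Controller:
    """Application logic under test; knows nothing about the bus."""

    def __init__(self, motor: MotorInterface) -> None:
        self.motor = motor

    def stop(self) -> None:
        self.motor.set_velocity(0.0)
```

The real driver implements the same interface, and only that implementation needs the physical test bench.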
Simulators are great but not always available. Or are too much work to get going.
One configuration often used for robots is the "boneless chicken". Take a bench and bolt all the guts down to it in a configuration where they are easy to probe. Put the wheel motors someplace safe, with a synthetic load like a Prony brake. Of course you can't test the nav stack that way. (I once interviewed a firmware engineer who was coming off of the Juicero shutdown -- say what you want about Juicero, but from the sounds of it their boneless chicken was outstanding, even integrated into the CI automation pipeline. Of course, they didn't have the nav problem.)
Speaking of nav, I once saw a warehouse robot company's micro-warehouse for testing nav PRs. Not the full test warehouse, just a 500-square-foot-or-so area dedicated to testing nav PRs, integrated with the CI automation. I could tell from the accumulated tire marks on the floor that they had nav pretty much nailed.
I have done several robot-to-elevator interfaces (probably more than anyone else). In the end, final testing always required something akin to a few midnight-to-4-AM test blocks on the real elevator. And then, of course, as you point out:
> the whole system has a ton of potential interactions that are hard to write test cases for.
They often don't show up until the system is under load.