The idea behind adding special markings for computers is that you can waste a lot of time and effort trying to perfect algorithms interpreting signage optimized for human consumption, or... you can spend much less time and effort, to much better effect, setting up additional signage that's easy for computers to consume and provides information relevant to computers (which is not necessarily the same as what humans need).
Note that roads are controlled, artificial environments. There's always someone responsible for maintaining a road and the signage on it. The infrastructure to deploy additional markings dedicated to self-driving cars already exists.
Humans rely on lane lines and signs as well; when those are covered, human capability is reduced and drivers have to slow down.
For snow that generally disrupts the sign fiducial, we had a few solutions. The first: if fiducials are dense enough, dead reckoning may be sufficient until the next fiducial is observed. The second: build different layers of data with different error-correction capabilities. One system we developed could relay a small number of bits from a far distance with reasonable error-correction capability; the remaining bits were physically much smaller and readable only from up close. The idea was that if you can fully resolve a fiducial in one area, then, assuming an a priori map of fiducials, the first 16 to 24 bits of a 64-bit code are likely enough to accurately resolve location.
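The two-layer scheme can be sketched as a lookup against an a priori map. Everything below is illustrative: the map contents, the `locate` helper, and the 16/48-bit split are hypothetical stand-ins for whatever the real system used.

```python
# Sketch of the two-layer idea: the coarse layer (first 16 bits of a
# hypothetical 64-bit fiducial code) is readable from far away and,
# combined with an a priori map, usually pins down the location; the
# fine layer (remaining 48 bits) confirms it up close.

# Hypothetical a priori map: 16-bit coarse prefix -> (full code, lat, lon)
FIDUCIAL_MAP = {
    0x1A2B: (0x1A2B_C3D4_E5F6_0789, 37.7749, -122.4194),
    0x1A2C: (0x1A2C_0000_0000_0001, 37.7750, -122.4200),
}

def locate(coarse_bits, fine_bits=None):
    """Resolve a position from a partial or full fiducial read.

    coarse_bits: the 16 far-readable bits (int).
    fine_bits:   the remaining 48 bits, or None if not yet resolved.
    Returns (lat, lon), or None if the read is unknown/inconsistent.
    """
    entry = FIDUCIAL_MAP.get(coarse_bits)
    if entry is None:
        return None  # unknown prefix: fall back to dead reckoning
    full_code, lat, lon = entry
    if fine_bits is not None and (full_code & 0xFFFF_FFFF_FFFF) != fine_bits:
        return None  # fine layer contradicts the map entry
    return (lat, lon)

# Far away: only the coarse layer resolves; the map disambiguates it.
print(locate(0x1A2B))                     # -> (37.7749, -122.4194)
# Up close: the fine layer confirms the same entry.
print(locate(0x1A2B, 0xC3D4_E5F6_0789))  # -> (37.7749, -122.4194)
```

The design choice this illustrates: the coarse bits alone don't identify a fiducial globally, but combined with a stored map and rough odometry they usually do, so the vehicle gets a position fix well before the full code is resolvable.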
The vision of self-driving going much beyond the driver assists we have today is going to die a slow death as more and more people realize it just isn't worth the CAPEX and OPEX. Human brains are cheap in comparison.
They are error-detecting and error-correcting digital signage that gives you precise 6-DoF pose "for free." (And you can do even better: stick a bunch of them all over a deformable object to map back its shape and deformation, without lengthy registration and with only a single camera, unlike vidicon.) They're easy to implement in software, can be extremely fast (<10 ms, with no real limit since you can implement them in an FPGA), are fairly lightweight, and can use basically any kind of digital camera (good or bad).
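The error-correcting property can be shown with a minimal sketch. The comment doesn't name a specific code, so this uses Hamming(7,4), which corrects any single flipped bit; real fiducial systems such as AprilTag use stronger codes with larger minimum Hamming distance between valid tags.

```python
# Hamming(7,4): 4 data bits protected by 3 parity bits, laid out as
# positions 1..7 = p1 p2 d1 p4 d2 d3 d4. Any single bit flip (e.g. a
# partially occluded cell) is located by the syndrome and corrected.

def hamming74_encode(nibble):
    """Encode 4 data bits into a 7-bit codeword."""
    d = [(nibble >> i) & 1 for i in (3, 2, 1, 0)]  # d1, d2, d3, d4
    p1 = d[0] ^ d[1] ^ d[3]   # covers positions 1,3,5,7
    p2 = d[0] ^ d[2] ^ d[3]   # covers positions 2,3,6,7
    p4 = d[1] ^ d[2] ^ d[3]   # covers positions 4,5,6,7
    bits = [p1, p2, d[0], p4, d[1], d[2], d[3]]
    return sum(b << (6 - i) for i, b in enumerate(bits))

def hamming74_decode(word):
    """Correct up to one flipped bit and return the 4 data bits."""
    bits = [(word >> (6 - i)) & 1 for i in range(7)]  # positions 1..7
    s1 = bits[0] ^ bits[2] ^ bits[4] ^ bits[6]
    s2 = bits[1] ^ bits[2] ^ bits[5] ^ bits[6]
    s4 = bits[3] ^ bits[4] ^ bits[5] ^ bits[6]
    syndrome = s1 + (s2 << 1) + (s4 << 2)  # 0 = clean, else error position
    if syndrome:
        bits[syndrome - 1] ^= 1  # flip the erroneous bit back
    return (bits[2] << 3) | (bits[4] << 2) | (bits[5] << 1) | bits[6]

code = hamming74_encode(0b1011)
corrupted = code ^ (1 << 3)          # flip one bit of the codeword
print(hamming74_decode(corrupted))   # -> 11 (0b1011), error corrected
```

A nonzero syndrome with no correctable interpretation (in larger codes) also gives you the error-*detecting* half: the reader can reject a damaged or altered tag outright instead of guessing.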
It's easier to get money for a futuristic sounding project than for routine shoring up of infrastructure so it doesn't keep falling apart. (Just like many codebases)
Google, among others, is using private wi-fi access points (external, not owned or controlled by Google) as radio beacons for navigation today. Good old wardriving Google. This is one factor facilitating navigation in cities when GPS is spotty. And then you have cell towers.
Centralized, standardized navigation aids would be better and more reliable, and they wouldn't require mobile internet access for alignment with beacons, since the beacon database could be stored in the vehicle.
With machine-readable signage based on fiducials, not only can you error-correct the signage (making it highly resistant to simple alterations and ambiguous readings), you can also encrypt it and get 6-degree-of-freedom vehicle-vs-signage pose information, all from a single camera view. That makes it MUCH more resistant to abuse than machine-learning interpretation of human-readable signage... arguably more resistant to abuse even than human-readable signage is to humans.
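One way to make the payload resist alteration can be sketched with standard-library primitives. The comment says "encrypt," but what mainly defeats tampering is authentication, shown here as an HMAC-SHA256 tag truncated to fit a hypothetical fiducial payload; the key name, payload layout, and tag length are all illustrative assumptions, and key distribution (how vehicles obtain and trust the signing keys) is the genuinely hard part, not addressed here.

```python
import hmac
import hashlib

# Hypothetical shared key; a real deployment would use asymmetric
# signatures or per-region keys distributed with the map database.
ROAD_AUTHORITY_KEY = b"demo-key-not-for-production"

def sign_payload(code: int) -> bytes:
    """64-bit sign code + truncated MAC, as the fiducial would carry it."""
    data = code.to_bytes(8, "big")
    tag = hmac.new(ROAD_AUTHORITY_KEY, data, hashlib.sha256).digest()[:8]
    return data + tag

def verify_payload(payload: bytes) -> bool:
    """Reject any payload whose code doesn't match its MAC."""
    data, tag = payload[:8], payload[8:]
    expected = hmac.new(ROAD_AUTHORITY_KEY, data, hashlib.sha256).digest()[:8]
    return hmac.compare_digest(tag, expected)

payload = sign_payload(0x1A2B_C3D4_E5F6_0789)
print(verify_payload(payload))                       # -> True
tampered = bytes([payload[0] ^ 0x01]) + payload[1:]  # altered sign code
print(verify_payload(tampered))                      # -> False
```

An attacker who paints over cells of such a tag either trips the error-correcting code or produces a code/MAC mismatch; forging a new valid tag requires the signing key.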
(Also, I’ve thought of additional ways that fiducials could be resistant to such measures.)
Just because some Teslas are dumb enough to drive into barriers today doesn't mean we can't continue improving the technology.