An autonomous vehicle has an uncountable number of operating modes, and it is not feasible (perhaps not even possible) to test it in all possible conditions and states. Even if you could, doing so for (say) every single software change would take years each time.
Maybe that does mean that this is a fool's errand, and we just shouldn't be building autonomous cars, at least not until we have AGI that can think and act faster and with better judgment than a human.
I personally do think that "better record than a human driver" should be sufficient (perhaps with some significant, TBD margin; 0.1% better is probably not enough), but I accept and agree with your toplevel comment that that sort of thing won't fly in the real world. The bar is really more like: the self-driving car has to avoid making the specific kinds of mistakes and illegal/unsafe acts that a human driver would make (all while not creating new classes of mistakes that a human driver would not make), and, on top of that, be better than a human driver in situations where a crash would not be deemed that human driver's fault.