I don't believe your hand-coded features will outperform ML trained over a billion miles, even on long-tail rare events.
Sure, ML is bad at rare events, but hand-coded features also map poorly to them: it's unlikely you'll be able to imagine them all, or even handle whole categories of rare events well.
Furthermore, I believe it would be possible to train an ML-based rare-event detector that simply distinguishes "regular driving I'm familiar with" from "something funny is going on, I haven't seen this much in the training dataset", and then hands the situation off to a remote human to resolve.
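
To make the detector idea concrete, here's a toy sketch of what I mean. It is not how any real driving stack does it: the two features (speed and distance to the nearest obstacle), the z-score test, the threshold of 4, and all the names are assumptions I made up for illustration. A real system would presumably use a learned model over far richer inputs, but the shape is the same: score how unfamiliar the current situation is relative to the training data, and escalate when the score is too high.

```python
from statistics import mean, pstdev


class NoveltyDetector:
    """Toy rare-event detector: flags inputs far from the training data.

    Hypothetical illustration only. It uses per-feature z-scores as a
    stand-in for whatever learned novelty score a real system would use.
    """

    def __init__(self, threshold=4.0):
        # Assumed cutoff: how many standard deviations count as "unfamiliar".
        self.threshold = threshold
        self.means = []
        self.stds = []

    def fit(self, samples):
        # Learn what "regular driving" looks like, one feature column at a time.
        columns = list(zip(*samples))
        self.means = [mean(col) for col in columns]
        # Fall back to a tiny value so a constant feature cannot divide by zero.
        self.stds = [pstdev(col) or 1e-9 for col in columns]
        return self

    def score(self, sample):
        # Novelty score: the largest absolute z-score across all features.
        return max(
            abs(x - m) / s for x, m, s in zip(sample, self.means, self.stds)
        )

    def decide(self, sample):
        # Unfamiliar situations go to a remote human instead of the planner.
        if self.score(sample) > self.threshold:
            return "escalate_to_remote_human"
        return "drive_normally"


# Made-up training data: (speed in m/s, distance to nearest obstacle in m).
familiar_driving = [
    (13.0, 30.0),
    (14.0, 28.0),
    (12.5, 32.0),
    (13.5, 29.0),
    (14.5, 31.0),
]

detector = NoveltyDetector(threshold=4.0).fit(familiar_driving)

print(detector.decide((13.2, 29.5)))  # close to the training data
print(detector.decide((13.0, 2.0)))   # obstacle far closer than anything seen
```

The point is that the detector never has to know what the rare event is. It only has to notice that the input looks unlike what it was trained on, which is a much easier problem than enumerating rare events by hand.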