I don't think the seatbelt example is overfitting. It's a consequence of there not being a "correct" label available. In many ways, I guess the seatbelt would be the best option.
Sure, I agree the seatbelt example comes from not having enough labeled classes. Feels a little bit more like the realm of semi-supervised learning, but I'd love to learn about interesting approaches in that case too :)