1. This tells you the SP is broken; using individual keys means it no longer matters whether the SP is broken or not. Individual keys are in your control; fixing the SP is much less likely to be. And you can just make a practice of doing it for everything, and now it's one less thing to test for.
2. That still requires a bit of testing that's somewhat annoying to set up, which most vendorsec practices don't have time for. It's also only one of dozens of things you need to test for. Ignoring audiences is super common, but a more subtle problem is that you can use your own IdP to sign a valid SAML assertion _for the wrong domain_, the SP accepts it, and now you can sign in as a competitor's staff.
As you hint at, having an SP that'll just self-service accept any random metadata.xml at least gives you a fighting chance :)
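To make the audience and wrong-domain checks concrete, here's a rough sketch of what an SP would need to enforce after signature verification. Everything in it is made up for illustration: the `Assertion` and `TenantConfig` shapes, the `accept` function, and the example entity IDs and domains. It is not any real SAML library's API, and it assumes the XML signature has already been verified against that tenant's key.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Assertion:
    """Hypothetical shape: fields pulled from an already signature-verified assertion."""
    issuer: str    # entity ID of the IdP that signed it
    audience: str  # value from the AudienceRestriction
    name_id: str   # subject, assumed here to be an email address


@dataclass(frozen=True)
class TenantConfig:
    """Hypothetical per-customer SSO config held by the SP."""
    allowed_domains: frozenset  # email domains this customer's IdP may assert


# Assumed entity ID for the SP itself.
SP_ENTITY_ID = "https://sp.example.com/saml"


def accept(assertion: Assertion, tenants: dict) -> bool:
    """Return True only if the assertion is meant for this SP and this tenant."""
    tenant = tenants.get(assertion.issuer)
    if tenant is None:
        return False  # unknown IdP
    if assertion.audience != SP_ENTITY_ID:
        return False  # assertion was minted for some other SP
    _, sep, domain = assertion.name_id.rpartition("@")
    if not sep:
        return False  # not an email-shaped NameID
    # The subtle one: a valid signature from tenant A's IdP must not be
    # able to name a user in tenant B's domain.
    return domain.lower() in tenant.allowed_domains


tenants = {
    "https://idp.acme.example": TenantConfig(frozenset({"acme.example"})),
}
ok = Assertion("https://idp.acme.example", SP_ENTITY_ID, "alice@acme.example")
wrong_domain = Assertion("https://idp.acme.example", SP_ENTITY_ID, "ceo@competitor.example")
print(accept(ok, tenants), accept(wrong_domain, tenants))
```

The point is that both checks live on the SP side, which is exactly the party you don't control when you're the customer.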
The challenging part is doing it for vendorsec, when you are vetting _other apps_. The timelines that stakeholders (other people in the company who want to use the app) are willing to accept are, like, a week. Even if you somehow had a SAML testing praxis at the ready that enumerates all of the problems SAML has historically had, there's a lot more to test than just the SAML bits.
So, in summary: I don't think that number is anywhere near zero, though sure, it's not huge. The hard part is that the failures are silent and live in parties you don't control.