With audio fingerprinting, the content provider must provide a way to fingerprint its own audio and have access to fingerprints of the internet's audio/video. That means a partnership between, e.g., YouTube and a studio. I'm fairly sure this involves studios above a certain size, resources for programming and API work, and a fair bit of paperwork and robustness testing, since there are ways to mess with the technique.
With this technique, you just enter a few words and look at what comes out.
You're suggesting that the first option is easier?