Yes, deep neural networks have proven remarkably useful for machine perception, but you would still need to collect a colossal amount of audio data, fingerprint all of it, build a low-latency processing infrastructure for making inferences, and convince a hundred million people to install your software to feed you copious real-world training data that you can use to improve model performance.
That's actually the easy part. You already have the music. Distorting it by superimposing background noise is really not difficult.
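To illustrate how little code that distortion step takes, here's a rough sketch in Python. The `mix_at_snr` helper and its scaling scheme are my own illustration of the standard trick (scale the noise so the power ratio hits a target SNR), not anything Shazam has published:

```python
import numpy as np

def mix_at_snr(clean, noise, snr_db):
    """Superimpose background noise on a clean signal at a target SNR.

    clean, noise: 1-D float arrays at the same sample rate.
    snr_db: desired signal-to-noise ratio in decibels.
    """
    # Tile or trim the noise to match the clean signal's length.
    if len(noise) < len(clean):
        noise = np.tile(noise, len(clean) // len(noise) + 1)
    noise = noise[:len(clean)]

    # Scale the noise so the power ratio matches the requested SNR.
    clean_power = np.mean(clean ** 2)
    noise_power = np.mean(noise ** 2)
    scale = np.sqrt(clean_power / (noise_power * 10 ** (snr_db / 10)))
    return clean + scale * noise
```

Run that over your catalogue with recordings of cafés, crowds, and traffic at a few SNR levels and you have a distorted training set essentially for free.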
Shazam doesn't actually let you improve the answer or report an incorrect guess. They're that confident in their results, even though it sometimes completely misses the genre and style of the music.
Also, it does a lot more than find "slightly distorted" versions. It can catch a song in a noisy room where you can barely make out the music to begin with. A couple of months back it found a song for me with a very loud crowd yelling over it. It's also pretty good at distinguishing between versions of a song, even though some remixes sound very close to the original.
Another thing you might be missing is just how fast it is, even on a slow mobile connection.
I think describing the reaction as lack-of-appreciation is a bit misleading. Perhaps disbelief would be a better description.
The smart money, though, is on the simplest explanation: you don't understand the purchase, or the problem domain, or both.
In this case I think you are overestimating the progress in NN and search, and underestimating the signal processing. Have you tried this with any significant corpus?
"Whack it through an FFT and do correlation" seems like one of the obvious solutions to the toy version of the problem, but it's exactly the sort of thing that usually falls apart in practice.
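For what it's worth, the published approach (Wang's 2003 Shazam paper) doesn't correlate raw FFTs at all: it picks peaks out of the spectrogram and hashes *pairs* of peaks, because peak locations are what survive noise and compression. A toy sketch in Python, where the peak picking is deliberately crude (loudest bin per frame) and nothing like the real system:

```python
import numpy as np

def spectrogram(signal, win=1024, hop=512):
    # Magnitude spectrogram via windowed FFTs (a toy STFT).
    frames = [signal[i:i + win] * np.hanning(win)
              for i in range(0, len(signal) - win, hop)]
    return np.abs(np.fft.rfft(frames, axis=1))

def landmark_hashes(spec, fan_out=5):
    """Hash pairs of spectrogram peaks, roughly in the spirit of
    Wang's 2003 paper. Real peak picking is 2-D and much smarter;
    here we just take the loudest bin per frame."""
    peaks = [(t, int(np.argmax(frame))) for t, frame in enumerate(spec)]
    hashes = {}
    for i, (t1, f1) in enumerate(peaks):
        for t2, f2 in peaks[i + 1:i + 1 + fan_out]:
            # A hash key is (anchor freq, target freq, time delta);
            # the value records where in the track it occurred.
            hashes[(f1, f2, t2 - t1)] = t1
    return hashes
```

Matching then amounts to counting hash collisions between a snippet and the database, filtered for a consistent time offset. The hashing is what makes the lookup fast and noise-robust; plain correlation gives you neither.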
Is anyone keeping a running list of products that HN commenters have suggested could be built in an afternoon/weekend?
Ones I've seen so far: Facebook, Twitter, Dropbox, and now Shazam.
Then we'll talk.
https://www.toptal.com/algorithms/shazam-it-music-processing...
What's not straightforward is recognizing cover songs and the like. That's not just non-trivial; AFAIK it can't be done reliably at all.
Well, you could transcribe the music into actual notes (or melodic intervals), and use Smith-Waterman (or any more advanced, more recent technique) to find the song whose note sequence aligns best with the query.
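A minimal sketch of that idea in Python, assuming you've already transcribed both songs into semitone intervals (the scoring parameters here are arbitrary, not tuned):

```python
def smith_waterman(query, ref, match=2, mismatch=-1, gap=-1):
    """Best local alignment score between two interval sequences.

    query, ref: lists of melodic intervals in semitones, e.g. the
    difference between consecutive MIDI note numbers. Higher score
    means a better local match; a dropped or added note costs a gap.
    """
    rows, cols = len(query) + 1, len(ref) + 1
    H = [[0] * cols for _ in range(rows)]
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i - 1][j - 1] + (match if query[i - 1] == ref[j - 1]
                                      else mismatch)
            H[i][j] = max(0, diag, H[i - 1][j] + gap, H[i][j - 1] + gap)
            best = max(best, H[i][j])
    return best
```

Working in intervals rather than absolute notes makes the match transposition-invariant for free. The hard part in practice is the transcription step, not the alignment.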
I'm not disputing that they overpaid. However, long story short, building the technology is the easy part, and just a fraction of the brand / product value.
"What's that song?" is a different signal to buying a song. Especially when "what's that song?" isn't restricted by licensing agreements.