No idea how Shazam can be so far behind and still be synonymous with song identification in everyday speech.
I’ve assumed Shazam must simply be better. Never had a need to find out.
Now I know!
I seem to recall that Apple announced || demoed this a couple of WWDCs ago. I didn't realize it was already deployed. I assumed it was going to be part of Siri/HomePod.
You used to be able to Shazam the middle of a song, and it would start displaying the lyrics in real time. Apple even demoed it in a keynote. I haven’t seen that feature in a while.
This could have been summarized in a tweet:
"Shazam for singing: used Whisper to transcribe what I sing, then used Google to get the name of the song"
You can think of it as a sequence of hash codes, or shingles (to borrow a term from Web page similarity), computed over successive parts of the song; shingles also let you model gaps/pauses.
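To make the shingle analogy concrete, here is a minimal sketch. It is not Shazam's actual algorithm: the note names stand in for whatever quantized audio features a real system would extract, and `shingle_hashes`, `containment`, and the window size `k=3` are hypothetical choices made only for illustration.

```python
import hashlib

def shingle_hashes(tokens, k=3):
    """Hash every run of k consecutive tokens (one 'shingle' per position)."""
    hashes = []
    for i in range(len(tokens) - k + 1):
        shingle = "|".join(tokens[i:i + k])
        digest = hashlib.sha1(shingle.encode("utf-8")).hexdigest()
        hashes.append(digest[:8])  # a short prefix is enough for a toy example
    return hashes

def containment(query_hashes, reference_hashes):
    """Fraction of the query's distinct shingles that also occur in the reference."""
    query = set(query_hashes)
    if not query:
        return 0.0
    return len(query & set(reference_hashes)) / len(query)

# Toy "song" and a clip from its middle; "rest" marks a pause.
song = ["C4", "E4", "G4", "rest", "A4", "F4", "D4"]
clip = ["G4", "rest", "A4", "F4"]
score = containment(shingle_hashes(clip), shingle_hashes(song))
print(score)
```

Because each shingle covers a few consecutive tokens, a clip taken from anywhere in the song still shares shingles with it, and a pause is just another token inside the window.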
Notably, Shazam does not aim to transcribe the lyrics, so the OP's approach may claim some novelty here. In any case, this experiment shows how great large pre-trained neural language models are for rapid prototyping: you can put something together quickly, perhaps to test feasibility before attempting to develop something better and more bespoke.
On a related note, Apple may be working on a similar service; at least they filed a related patent application: https://patents.google.com/patent/US6990453B2/en
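For reference, the transcribe-then-search approach discussed in this thread can be sketched in a few lines. This is a guess at the shape of the pipeline, not the OP's code: `build_search_url` and `identify_song` are hypothetical names, and the transcriber is passed in as a plain callable so the sketch does not depend on any particular speech model.

```python
from urllib.parse import quote_plus

def build_search_url(transcript):
    """Turn transcribed lyrics into a Google search URL (exact phrase + 'lyrics')."""
    query = f'"{transcript.strip()}" lyrics'
    return "https://www.google.com/search?q=" + quote_plus(query)

def identify_song(audio_path, transcribe):
    """transcribe: any callable that maps an audio file path to text (assumption)."""
    text = transcribe(audio_path)
    return build_search_url(text)

# Stub transcriber so the sketch runs without a speech model installed.
url = identify_song("clip.wav", lambda path: " hello world ")
print(url)
```

With the open-source `openai-whisper` package installed, the callable could be something like `lambda p: whisper.load_model("base").transcribe(p)["text"]`; whether the OP did it this way is an assumption.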
Your next challenge: make it work with humming. (Hard mode: humming by someone with as terrible pitch as me.)
It also makes me wonder how well ChatGPT can directly reproduce lyrics verbatim, and how that would be yet another legal issue.
gigabrain: What Python script can I write that posts an audio file I recorded to an AI I found from a Google search, via REST, gets the text back, and then posts that text as lyrics to ChatGPT 4, which can't tell me anything after the cutoff date?
highly regarded