Our matching algorithm is based on the open-source echoprint-codegen fingerprinting method, around which we have built our own stack:
- Replaced Solr/Tokyo Tyrant with Elasticsearch
- Reimplemented the matching logic
- Crawlers search multiple sources for audio files to index (MP3s aren't stored long-term; they're only fingerprinted and then deleted)
- Indexing about 1 new track per second
- Found a method to verify unreliable ID3 tags (in progress; the current database also includes unverified tags)
- MogileFS as the primary data store for fingerprints
- Perl for everything
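For readers unfamiliar with this style of matching, here is a minimal in-memory sketch of the general two-stage idea: an inverted index from hash code to (track, time offset), a shortlist of tracks by shared codes, then a re-score that counts hits agreeing on one consistent time shift. It is not the authors' Perl/Elasticsearch implementation. The `(hash_code, time_offset)` fingerprint format, the `FingerprintIndex` class, and `top_n` are hypothetical, and in a real deployment stage 1 would be a search-engine query rather than a Python dict.

```python
from collections import Counter, defaultdict


class FingerprintIndex:
    """Toy inverted index (hypothetical): hash code -> [(track_id, offset), ...]."""

    def __init__(self):
        self.postings = defaultdict(list)

    def add_track(self, track_id, fingerprint):
        # fingerprint: list of (hash_code, time_offset) pairs (assumed format).
        for code, offset in fingerprint:
            self.postings[code].append((track_id, offset))

    def match(self, query, top_n=3):
        # Stage 1: shortlist tracks by how many hash codes they share with the query.
        shared = Counter()
        for code, _ in query:
            for track_id, _ in self.postings.get(code, ()):
                shared[track_id] += 1

        # Stage 2: a genuine match has many hits agreeing on one time shift
        # (track_offset - query_offset); random hash collisions scatter instead.
        results = []
        for track_id, _ in shared.most_common(top_n):
            deltas = Counter()
            for code, q_off in query:
                for t_id, t_off in self.postings.get(code, ()):
                    if t_id == track_id:
                        deltas[t_off - q_off] += 1
            best_delta, score = deltas.most_common(1)[0]
            results.append((track_id, score, best_delta))

        # Highest count of time-aligned hits first.
        results.sort(key=lambda r: r[1], reverse=True)
        return results


index = FingerprintIndex()
index.add_track("track_a", [(101, 0), (202, 10), (303, 20), (404, 30)])
index.add_track("track_b", [(101, 5), (505, 15), (606, 25)])
print(index.match([(101, 0), (202, 10)]))
# [('track_a', 2, 0), ('track_b', 1, 5)]
```

The second stage is what separates "shares a few common hashes" from "is actually the same recording, offset by N time units".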
We also provide a free music identification API.
Any feedback would be much appreciated!
Thanks for clearing that up, good luck with your site!
Do you plan to document/open-source your work?
Also, we pack each individual hash before storing it in Elasticsearch, which cut storage by at least 50%.
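To illustrate why packing helps, the sketch below compares hashes stored as decimal text with the same hashes packed as fixed-width binary integers using Python's `struct` module. The actual packing format used here isn't stated, so the 4-byte unsigned big-endian layout and the helper names are assumptions; how the bytes reach Elasticsearch (for example, base64 in a `binary` field) would also change the real savings.

```python
import struct


def encode_text(hashes):
    """Baseline (hypothetical): hashes as space-separated decimal text."""
    return " ".join(str(h) for h in hashes).encode("ascii")


def encode_packed(hashes):
    """Pack each hash as an unsigned 32-bit big-endian int (assumes h < 2**32)."""
    return struct.pack(f">{len(hashes)}I", *hashes)


def decode_packed(blob):
    """Inverse of encode_packed: 4 bytes per hash."""
    return list(struct.unpack(f">{len(blob) // 4}I", blob))


hashes = [1048575, 987654, 123456, 1000001]
text = encode_text(hashes)
packed = encode_packed(hashes)
assert decode_packed(packed) == hashes  # lossless round trip
print(len(text), len(packed))  # prints 29 16
```

Fixed-width binary also makes decoding trivial: the hash count is just the blob length divided by 4.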
Our fingerprint data is quite different from theirs (unreliable ID3 tags, N versions of the same track), which is why we needed some tweaks. The matching is still far from perfect...
We don't know yet whether we will open-source the whole thing at some point.
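One way to cope with unreliable tags when several indexed versions of the same track match a query is to let those versions vote on the tag. The sketch below is a hypothetical consensus scheme, not the method described in the thread: it normalizes each matched file's (artist, title), sums match scores per normalized pair, and returns the winner only if it holds at least `min_share` of the total score. The input format, the normalization rule, and `min_share` are all assumptions.

```python
import re
from collections import Counter


def normalize(tag):
    """Crude normalization: lowercase, drop punctuation, collapse whitespace."""
    tag = re.sub(r"[^\w\s]", "", tag.lower())
    return " ".join(tag.split())


def consensus_tag(matches, min_share=0.5):
    """matches: list of (artist, title, score) for fingerprint-matched files.

    Returns the normalized (artist, title) with the highest summed score,
    or None if there are no matches or the winner's share is below min_share.
    """
    votes = Counter()
    for artist, title, score in matches:
        votes[(normalize(artist), normalize(title))] += score
    if not votes:
        return None
    best, best_score = votes.most_common(1)[0]
    total = sum(votes.values())
    return best if best_score / total >= min_share else None


matches = [
    ("Some Artist", "Some Song", 40),
    ("some artist", "Some song!", 35),
    ("Unknown Artist", "Track 01", 20),
]
print(consensus_tag(matches))  # ('some artist', 'some song')
```

Weighting votes by match score means a clean, strongly matching copy counts for more than a weakly matching file with junk tags.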
Could you describe the use cases? Is it for mixtapes uploaded to YouTube by DJs, or over-the-air recognition in music festival videos?
I ask because music (single tracks) uploaded to YouTube is usually already identified, so it can be found there.
Also, the guys from Trax-air.com are doing something pretty similar to you, but with pitch/BPM bending support.
Would be interested to know if you figured a way around that!
Whether I'm watching a YouTube video or a movie, or someone else is just playing something, it usually finds the song without problems.
It's a different use case from OP's app, though, which I guess is more on-demand.
http://www.cnet.com/how-to/siri-can-now-name-that-tune-via-i...
It works fairly well.
http://www.greenbot.com/article/2873722/how-to-perform-song-...
Again, works fairly well.
On a slight tangent: I'd love a client that could identify my MP3 collection and rename and retag it (under some kind of supervision). Ideally it would do the identification in batch mode whenever it got Internet connectivity (but this is perhaps an unreasonable requirement). And to make it perfect, it would let me listen to and delete tracks.
I have a huge, unwieldy collection of MP3s and I can't bring myself to just delete gigabytes of music.
I used it to tag a massive amount of partially labelled and mostly metadata-free music files some time ago and it worked a treat.
If you're indexing all the random and free stuff out there, you're picking up a lot of material that may never have been commercially released or has not been re-released digitally. At the same time, Shazam, YouTube Content ID, Apple's iTunes Match, etc. have access to an extremely large set of references (more than you could have) with 99% accurate metadata. Content ID definitely picks up multiple songs in mixes, as well as pitch changes, with a high degree of accuracy (assuming the master sound recording has been submitted to YouTube).
A submission system would be great too, or some way for people to tag stuff themselves, à la Discogs.
Gets flagged as copyrighted music even though there is no music?
So many YouTubers would thank you for a service that did this.