
0 points · mo0 · 11y ago · 0 comments

First we rewrote Echo Nest's true score logic in Perl, then altered it slightly and added some extra checks to further exclude false positives. We also believe that what they used in the late song/identify API might have been different from what is open-sourced in echoprint-server (https://github.com/echonest/echoprint-server).
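For readers unfamiliar with the idea, here is a rough sketch of a time-offset histogram score of the kind the true score is based on, as far as I understand it. It is not the code from echoprint-server and not our Perl rewrite; the function name, the `(code, time)` input format, and the `slop` bucket size are assumptions made for illustration.

```python
from collections import Counter, defaultdict


def offset_histogram_score(query, candidate, slop=2):
    """Score a candidate track against a query fingerprint.

    query, candidate: lists of (code, time) pairs (hypothetical format).
    Codes shared by both are compared by time offset; if the query really
    comes from the candidate, most offsets fall into the same bucket.
    """
    # Index the candidate's (bucketed) times by code.
    cand_times = defaultdict(list)
    for code, t in candidate:
        cand_times[code].append(t // slop)

    # Histogram of offsets between candidate time and query time.
    diffs = Counter()
    for code, t in query:
        for ct in cand_times.get(code, ()):
            diffs[ct - t // slop] += 1

    # Sum the two fullest buckets to tolerate small timing jitter.
    return sum(count for _, count in diffs.most_common(2))


query = [(10, 0), (11, 4), (12, 8)]
candidate = [(10, 100), (11, 104), (12, 108), (10, 300)]
score = offset_histogram_score(query, candidate)
```

Codes that match by value but sit at inconsistent offsets contribute little, which is what separates this from simply counting shared codes.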

We also pack each individual hash before storing it in Elasticsearch, which saved at least 50% of storage space.
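A minimal sketch of what packing a hash can look like, assuming each hash fits in an unsigned 32-bit integer and that the packed bytes are base64-encoded so they can sit in a JSON document. The function names and the byte layout are assumptions for illustration, not our actual Perl code, and how much space this saves depends on how the unpacked form was stored.

```python
import base64
import struct


def pack_hash(code):
    """Pack one hash into 4 little-endian bytes, base64-encoded for JSON."""
    return base64.b64encode(struct.pack("<I", code)).decode("ascii")


def unpack_hash(packed):
    """Reverse pack_hash: base64 string back to the integer hash."""
    return struct.unpack("<I", base64.b64decode(packed))[0]


codes = [1048575, 734003, 12, 998877]
packed = [pack_hash(c) for c in codes]
restored = [unpack_hash(p) for p in packed]
```

The round trip is lossless, so matching can still run on the original integer values after unpacking.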

Our fingerprint data is quite different from theirs (unreliable ID3 tags, N versions of the same track), which is why we needed some tweaks. So far the matching is still far from perfect...

We don't know yet whether we will open-source the whole thing at some point.


3 comments · 2 top-level

caractacus · 11y ago · 1 in thread

When you say the matching is far from perfect, is that at your end or on the part of the echoprint / echonest code? You made tweaks because you found issues with what they were doing...?

tk42 · 11y ago

The reason for it being far from perfect is likely a combination of both. If the correct song is indexed, there is a high probability that we find the right match. However, if it's not, with a bit of bad luck a false positive can easily happen with the default solution (and with ours too). Also, when analysing a YouTube video, it can happen that in a 30-second snippet only 10 seconds are a matching song and 20 are unrelated, or that 15 seconds match one song and the other 15 match a different one, in which case 2 tracks, or multiple versions of 2 different tracks, will have relatively OK scores. Deciding what to consider a match (or whether to try different queries for the same or a slightly altered timespan before deciding) is not trivial in these cases. Our changes mostly concern when a match is considered a match: we altered thresholds and how a matching true score is looked at in relation to other fingerprints' true scores. Due to issues like these, specifying a timeframe for analysis will often produce better results.

http://static.echonest.com/echoprint_ismir.pdf
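To make the thresholding described above a bit more concrete, here is a simplified sketch of accepting a match only when the best true score is high enough and clearly ahead of the runner-up. The function name and both threshold values are made up for illustration; our real checks differ, and this version does not group multiple versions of the same track.

```python
def decide_match(scores, min_score=50, min_ratio=1.5):
    """Pick a match from true scores, or return None if it is unclear.

    scores: dict mapping track id -> true score (hypothetical input).
    min_score: absolute floor the best score must reach.
    min_ratio: how far the best score must lead the second best.
    """
    if not scores:
        return None

    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    best_id, best = ranked[0]

    # Reject weak matches outright.
    if best < min_score:
        return None

    # Reject ambiguous results, e.g. a snippet spanning two songs.
    if len(ranked) > 1:
        runner_up = ranked[1][1]
        if runner_up > 0 and best / runner_up < min_ratio:
            return None

    return best_id


clear = decide_match({"track_a": 120, "track_b": 40})
ambiguous = decide_match({"track_a": 120, "track_b": 110})
```

In the ambiguous case a caller could retry with a shorter or shifted timespan instead of giving up, which is the trade-off mentioned above.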

brianwhitman · 11y ago

song/identify supported both ENMFP and Echoprint, and AFAIK the Echoprint matching path was exactly the same as what is published on GitHub.

I know at some point we did adapt the Solr end (for example, we removed the N most frequently occurring codes) as a speed optimization.
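As an illustration only, dropping the most common codes could look roughly like the sketch below. It counts in how many tracks each code appears and strips the top N before indexing or querying. The function names, the input format, and the choice to count per track rather than per occurrence are assumptions, not a description of what ran in production.

```python
from collections import Counter


def most_common_codes(tracks, n):
    """Return the n codes that appear in the largest number of tracks.

    tracks: dict mapping track id -> list of codes (hypothetical format).
    """
    counts = Counter()
    for codes in tracks.values():
        counts.update(set(codes))  # count each code once per track
    return {code for code, _ in counts.most_common(n)}


def strip_codes(codes, stop_codes):
    """Drop the overly common codes from one fingerprint."""
    return [c for c in codes if c not in stop_codes]


tracks = {"t1": [1, 2, 3], "t2": [1, 2, 4], "t3": [1, 5, 6]}
stop_codes = most_common_codes(tracks, 2)
slimmed = strip_codes(tracks["t1"], stop_codes)
```

Codes that occur in nearly every track carry little discriminating power but inflate every candidate list, which is why removing them can speed up lookups.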

Many users of Echoprint in the wild have adapted the Python matching logic for their use case and changed the hash update rate in the codegen. A great modification to watch was Sunlight Labs' "Ad Hawk", which ID'd commercials: https://github.com/sunlightlabs/adhawk
