Yes, deep neural networks have proven remarkably useful for machine perception, but you would still need to collect a colossal amount of audio data, fingerprint all of it, build a low-latency processing infrastructure for making inferences, and convince a hundred million people to install your software to feed you copious real-world training data that you can use to improve model performance.
That's actually the easy part. You already have the music. Distorting it by superimposing background noise is really not difficult.
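To illustrate how little code that distortion step takes, here's a rough sketch in Python. The `mix_at_snr` helper and its scaling scheme are my own illustration of the standard trick (scale the noise so the power ratio hits a target SNR), not anything Shazam has published:

```python
import numpy as np

def mix_at_snr(clean, noise, snr_db):
    """Superimpose background noise on a clean signal at a target SNR.

    clean, noise: 1-D float arrays at the same sample rate.
    snr_db: desired signal-to-noise ratio in decibels.
    """
    # Tile or trim the noise to match the clean signal's length.
    if len(noise) < len(clean):
        noise = np.tile(noise, len(clean) // len(noise) + 1)
    noise = noise[:len(clean)]

    # Scale the noise so the power ratio matches the requested SNR.
    clean_power = np.mean(clean ** 2)
    noise_power = np.mean(noise ** 2)
    scale = np.sqrt(clean_power / (noise_power * 10 ** (snr_db / 10)))
    return clean + scale * noise
```

Run that over your catalogue with recordings of cafés, crowds, and traffic at a few SNR levels and you have a distorted training set essentially for free.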
Shazam doesn't actually let you improve the answer or report an incorrect guess. They're that confident in their results, even though it sometimes completely misses the genre and style of the music.
Also, it does a lot more than find "slightly distorted" versions. It can catch a song in a noisy room where you can barely make out the music to begin with. A couple of months back it found a song for me with a very loud crowd yelling over it. It's also pretty good at distinguishing between versions of a song, even though some remixes sound very close to the original.
Another thing you might be missing is just how fast it is, even on a slow mobile connection.
I think describing the reaction as lack-of-appreciation is a bit misleading. Perhaps disbelief would be a better description.
The smart money, though, is on the simplest explanation: you don't understand the purchase, or the problem domain, or both.
In this case I think you are overestimating the progress in NN and search, and underestimating the signal processing. Have you tried this with any significant corpus?
"Whack it through an FFT and do correlation" seems like one of the obvious solutions to the toy version of the problem, but it's exactly the sort of thing that usually falls apart in practice.
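For what it's worth, the published approach (Wang's 2003 Shazam paper) doesn't correlate raw FFTs at all: it picks peaks out of the spectrogram and hashes *pairs* of peaks, because peak locations are what survive noise and compression. A toy sketch in Python, where the peak picking is deliberately crude (loudest bin per frame) and nothing like the real system:

```python
import numpy as np

def spectrogram(signal, win=1024, hop=512):
    # Magnitude spectrogram via windowed FFTs (a toy STFT).
    frames = [signal[i:i + win] * np.hanning(win)
              for i in range(0, len(signal) - win, hop)]
    return np.abs(np.fft.rfft(frames, axis=1))

def landmark_hashes(spec, fan_out=5):
    """Hash pairs of spectrogram peaks, roughly in the spirit of
    Wang's 2003 paper. Real peak picking is 2-D and much smarter;
    here we just take the loudest bin per frame."""
    peaks = [(t, int(np.argmax(frame))) for t, frame in enumerate(spec)]
    hashes = {}
    for i, (t1, f1) in enumerate(peaks):
        for t2, f2 in peaks[i + 1:i + 1 + fan_out]:
            # A hash key is (anchor freq, target freq, time delta);
            # the value records where in the track it occurred.
            hashes[(f1, f2, t2 - t1)] = t1
    return hashes
```

Matching then amounts to counting hash collisions between a snippet and the database, filtered for a consistent time offset. The hashing is what makes the lookup fast and noise-robust; plain correlation gives you neither.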
Is anyone keeping a running list of products that HN commenters have suggested could be built in an afternoon/weekend?
Ones I've seen so far: Facebook, Twitter, Dropbox, and now Shazam.
Then we'll talk.
https://www.toptal.com/algorithms/shazam-it-music-processing...
What's not straightforward is recognizing cover songs and the like. That's not just non-trivial; AFAIK it can't be done reliably at all.
Well, you could transcribe the music into actual notes (or melodic intervals), and use Smith-Waterman (or any more advanced, more recent technique) to find the song whose note sequence aligns best with the query.
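A minimal sketch of that idea in Python, assuming you've already transcribed both songs into semitone intervals (the scoring parameters here are arbitrary, not tuned):

```python
def smith_waterman(query, ref, match=2, mismatch=-1, gap=-1):
    """Best local alignment score between two interval sequences.

    query, ref: lists of melodic intervals in semitones, e.g. the
    difference between consecutive MIDI note numbers. Higher score
    means a better local match; a dropped or added note costs a gap.
    """
    rows, cols = len(query) + 1, len(ref) + 1
    H = [[0] * cols for _ in range(rows)]
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i - 1][j - 1] + (match if query[i - 1] == ref[j - 1]
                                      else mismatch)
            H[i][j] = max(0, diag, H[i - 1][j] + gap, H[i][j - 1] + gap)
            best = max(best, H[i][j])
    return best
```

Working in intervals rather than absolute notes makes the match transposition-invariant for free. The hard part in practice is the transcription step, not the alignment.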
I'm not disputing that they overpaid. However, long story short, building the technology is the easy part, and just a fraction of the brand / product value.
"What's that song?" is a different signal to buying a song. Especially when "what's that song?" isn't restricted by licensing agreements.