
0 points · garrepi · 5y ago · 0 comments

I agree 1,000%.

When building out the platform, efficiency was a foremost consideration. Because of that, each comparison calculation flies.

That's a good implementation suggestion; it reminds me of the adjacency matrix Twitter uses for mutual friends.

I'm drafting up a proposal for your idea now; it would serve as a really strong introduction to the platform.

Thanks!


3 comments · 1 top-level

LolWolf · 5y ago · 2 in thread

I think the embedding idea in particular is quite good, and there are several potential approaches! One is to let xᵢ be the song vector for user i (whose kth entry is 1 if user i has song k in their library and 0 otherwise). Then you can compute the "overlap" by something like xᵢ ⋅ xⱼ (with some normalization of course!) where ⋅ is the inner product. [0]
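To make the song-vector idea concrete, here is a minimal sketch in plain Python. The helper names (`to_vector`, `cosine_overlap`), the toy libraries, and the choice of cosine normalization are all assumptions for illustration; the comment above only says "some normalization", and nothing here reflects how SameTunes actually computes it.

```python
import math

def to_vector(library, n_songs):
    """Binary song vector: entry k is 1 if song k is in the library, else 0."""
    return [1 if k in library else 0 for k in range(n_songs)]

def dot(x, y):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(x, y))

def cosine_overlap(x, y):
    """Inner product divided by the vector norms (one possible normalization)."""
    denom = math.sqrt(dot(x, x) * dot(y, y))
    return dot(x, y) / denom if denom else 0.0

# Hypothetical catalogue of 8 songs and two made-up user libraries.
n_songs = 8
x_i = to_vector({0, 1, 2, 5}, n_songs)
x_j = to_vector({1, 2, 3, 5, 7}, n_songs)

raw = dot(x_i, x_j)               # counts the shared songs (1, 2 and 5)
norm = cosine_overlap(x_i, x_j)   # shared count scaled by library sizes
```

For binary vectors the raw inner product is just the number of songs both users have, so the normalization is what keeps a user with a huge library from looking similar to everyone.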

A simple approximation of this inner product would be to generate a random (potentially sparse!) matrix S whose nonzero entries are i.i.d. Gaussian, for example, and whose number of rows is much smaller than the number of columns [1]. Then you can instead store and compute

    (Sxᵢ) ⋅ (Sxⱼ)

which gives you an approximate overlap, whose storage and computation requirements are much smaller for each user (since Sxᵢ is much smaller in number of entries than xᵢ).
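A rough sketch of that random projection, again in plain Python. The sizes (500 songs, 300 rows), the fixed seed, the dense rather than sparse S, and the 1/√k scaling (chosen so the sketched inner product matches the true one in expectation) are assumptions made for the example, not recommendations.

```python
import random

def make_projection(k, n_songs, seed=0):
    """k x n_songs matrix with i.i.d. Gaussian entries scaled by 1/sqrt(k)."""
    rng = random.Random(seed)
    scale = 1.0 / k ** 0.5
    return [[rng.gauss(0.0, 1.0) * scale for _ in range(n_songs)]
            for _ in range(k)]

def sketch(S, library):
    """Compute S x for a binary song vector x, given as the set of song ids."""
    return [sum(row[song] for song in library) for row in S]

def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

# Hypothetical sizes: 500 songs in the catalogue, sketches of length 300.
n_songs, k = 500, 300
S = make_projection(k, n_songs)

lib_i = set(range(0, 60))    # user i has songs 0..59
lib_j = set(range(30, 90))   # user j has songs 30..89

exact = len(lib_i & lib_j)                          # true overlap
approx = dot(sketch(S, lib_i), sketch(S, lib_j))    # estimate from sketches
```

Each user then only needs their length-k sketch stored, and every user must be sketched with the same S. The estimate is noisy, and the noise shrinks as k grows, which is the trade-off footnote [1] describes.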

-----

[0] Of course, there are many other similar methods! This is a particularly simple, but often fairly effective one.

[1] More specifically, the number of rows goes like O(log(n)/ε²), where n is the number of vectors being compared (here, users) and ε is the error you wish to achieve. Often, a fairly large choice of ε will actually suffice. See https://en.wikipedia.org/wiki/Johnson–Lindenstrauss_lemma

garrepi (OP) · 5y ago

Brilliant.

Through the Spotify API we are given locational and positional data on some music objects. Your proposal to structure songs as vectors opens up a lot more flexibility for storing, weighting, and manipulating that data.

The way in which the overlaps are calculated is crucial to our platform; something like feeding the dot product into a weighting matrix would leverage this structure really well.

Your proposal is very similar to how comparisons are currently computed on SameTunes. But leveraging existing mathematical theorems and restructuring the algorithm with linear algebra in mind would help a ton with some of the noise and normalisation issues we've experienced.

I really appreciate you taking the time to draft that out!

LolWolf · 5y ago

For sure! It’s a cool product (I got two people who messaged me about it this morning, before I even saw it on hn) and it’s pretty fun. Good luck with it! :)
