I think the embedding idea in specific is quite good and there are several potential approaches! One is, if you let xᵢ be the song vector for user i (whose kth entry is 1 if user i has song k in their library and 0 otherwise). Then you can compute the "overlap" by something like xᵢ ⋅ xⱼ (with some normalization of course!) where ⋅ is the inner product. [0]
A simple approximation of this inner product would be to generate a random (potentially sparse!) matrix S whose nonzero entries are i.i.d. Gaussian, for example, and whose number of rows is much smaller than the number of columns [1], then you can instead store and compute
(Sxᵢ) ⋅ (Sxⱼ)
which gives you an approximate overlap, whose storage and computation requirements are much smaller for each user (since Sxᵢ is much smaller in number of entries than xᵢ).
-----
[0] Of course, there are many other similar methods! This is a particularly simple, but often fairly effective one.
[1] More specifically, it goes like O(log(n)/ε²) where ε is the error you wish to achieve. Often, a fairly large choice of ε actually will suffice. See https://en.wikipedia.org/wiki/Johnson–Lindenstrauss_lemma