Also, I know the paper isn't claiming state-of-the-art, but their SVD results are horrendous. Standard CF would create much better artist-artist pairings with even a medium-sized dataset.
As an aside, I've run some quantitative and qualitative tests and have found the best recommendations come from a combination of user-item and item-item. I co-gave a talk at the NYC machine learning meetup recently (https://docs.google.com/presentation/d/1S5Cizi9LFQ7l0bMYtY7g...) that shows how this can work, starting at slide 20. The idea is to create a candidate list of matches using item-item similarity, and then reorder it using user-item scores. I've found the item-item stage produces "sensible" suggestions, while the re-ordering truly personalizes them. You can remove obvious recommendations by filtering out popular matches or matches the user has already interacted with (I consider this a business decision rather than something inherent in the algorithm).
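To make the two-stage idea concrete, here's a minimal sketch. It assumes you already have item embeddings (`item_vecs`) and a user embedding (`user_vec`) from some CF model; the function name and parameters are my own, not from the talk:

```python
import numpy as np

def recommend(user_vec, item_vecs, seed_item, n_candidates=50, k=10):
    """Two-stage sketch: item-item candidate generation, user-item re-ranking.

    item_vecs : (n_items, d) matrix of item embeddings (hypothetical input)
    user_vec  : (d,) user embedding (hypothetical input)
    """
    # Stage 1: candidate list via item-item cosine similarity to the seed item.
    norms = np.linalg.norm(item_vecs, axis=1) * np.linalg.norm(item_vecs[seed_item])
    sims = item_vecs @ item_vecs[seed_item] / np.maximum(norms, 1e-12)
    candidates = np.argsort(-sims)[1:n_candidates + 1]  # drop the seed itself (sim = 1)
    # Stage 2: re-rank the candidates by predicted user affinity (user-item dot product).
    scores = item_vecs[candidates] @ user_vec
    return candidates[np.argsort(-scores)[:k]]
```

Filtering out already-seen or overly popular items would slot in between the two stages, on the candidate list.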
Any idea what those 40 factors might be?
(The item2vec paper describes using pairs of items that occur in the same set, i.e. just like using n-grams, but without a fixed n, and ignoring ordering.)
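In code, "pairs of items that occur in the same set" just means every ordered pair from a basket, with no window and no ordering constraint; a one-liner sketch:

```python
from itertools import permutations

def basket_pairs(basket):
    """All ordered (target, context) pairs from one unordered basket.

    This is the item2vec-style training data: every co-occurring pair
    counts, with no fixed window size and ordering ignored.
    """
    return list(permutations(basket, 2))
```

For a basket of m items this yields m*(m-1) training pairs, versus the fixed-size window a word2vec n-gram-style context would give.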
Why didn't they use something that usually works better, like PMI?
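For reference, the PMI baseline being suggested is simple to compute from co-occurrence counts; a rough sketch over unordered baskets (my own naming, using the usual PMI(i, j) = log(p(i, j) / (p(i) p(j)))):

```python
import math
from collections import Counter
from itertools import combinations

def pmi_scores(baskets):
    """Pointwise mutual information for every item pair seen together.

    Probabilities are estimated as simple frequencies over baskets:
    PMI(i, j) = log( p(i, j) / (p(i) * p(j)) ).
    """
    item_counts, pair_counts = Counter(), Counter()
    for basket in baskets:
        items = set(basket)
        item_counts.update(items)
        pair_counts.update(frozenset(p) for p in combinations(sorted(items), 2))
    n = len(baskets)
    return {
        pair: math.log((c / n) / ((item_counts[a] / n) * (item_counts[b] / n)))
        for pair, c in pair_counts.items()
        for a, b in [tuple(pair)]
    }
```

Positive PMI means a pair co-occurs more often than chance; in practice you'd smooth the counts and prune rare items.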
The qualitative comparison suggests that item2vec may produce _more_ homogeneous / boring results, which is kinda unfortunate; the interesting question in recommendations is how to find "aspirational" recommendations (things the shopper would not have looked for on their own).
I would really love to see an analysis that did an A/B test using more traditional CF and this, and see what the revenue lift was, because "accuracy" as measured here doesn't necessarily map onto the objective that you care about in the real world.
On the other hand, I played with using collaborative filtering to improve the personalization of language models for speech recognition for shopping, and in that context this approach sounds like it might have been super useful, because it was actually fairly challenging to get broad enough coverage of the full set of items from a small number of purchases for the purposes of language modeling. Having good embeddings would have helped a lot.
For another approach to product recommendation with some lift info, try https://research.googleblog.com/2016/06/wide-deep-learning-b... http://arxiv.org/abs/1606.07792
The black-box effect of word2vec and similar models holds back some applications, like generalizing linguistic methods to bioinformatics.
You come up with a model where a numerical vector represents the attributes of each word or item. You predict the likelihood of a match between two words/items by multiplying their vectors together, and then you use numerical optimization, i.e. an iterative gradient descent algorithm starting from randomly initialized vectors, to estimate the vectors that work best.
Unless I misunderstood the question...
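The scheme described above can be sketched in a few lines. This is a toy illustration, not real word2vec: random init, dot product squashed through a sigmoid as the match probability, and plain SGD; I fake one random negative per positive pair, where real implementations do proper negative sampling:

```python
import numpy as np

def train_embeddings(pairs, n_items, dim=8, lr=0.1, epochs=50, seed=0):
    """Toy embedding trainer: dot-product score + sigmoid + gradient descent."""
    rng = np.random.default_rng(seed)
    vecs = rng.normal(scale=0.1, size=(n_items, dim))  # random initialization
    for _ in range(epochs):
        for i, j in pairs:
            k = rng.integers(n_items)  # crude random negative sample
            for a, b, label in ((i, j, 1.0), (i, k, 0.0)):
                p = 1.0 / (1.0 + np.exp(-vecs[a] @ vecs[b]))  # predicted match prob
                grad = p - label  # derivative of log loss w.r.t. the score
                ga, gb = grad * vecs[b], grad * vecs[a]
                vecs[a] -= lr * ga  # nudge both vectors toward/away from each other
                vecs[b] -= lr * gb
    return vecs
```

After training, items that co-occur in the pairs end up with high dot products, which is exactly the "vectors that work best" the optimization is after.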
Sounds like word2vec.