Also, I know the paper isn't claiming state-of-the-art, but their SVD results are horrendous. Standard CF would create much better artist-artist pairings with even a medium-sized dataset.
As an aside, I've run some quantitative and qualitative tests and have found the best recommendations come from a combination of user-item and item-item. I co-gave a talk at the NYC machine learning meetup recently (https://docs.google.com/presentation/d/1S5Cizi9LFQ7l0bMYtY7g...) that shows how this can work, starting at slide 20. The idea is to create a candidate list of matches using item-item similarity, and then reorder it using user-item scores. I've found the item-item stage produces "sensible" suggestions, while the re-ordering truly personalizes them. You can remove obvious recommendations by filtering out popular matches or matches the user has already interacted with (I consider this a business decision rather than something inherent in the algorithm).
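To make the two-stage idea concrete, here's a minimal sketch. It assumes you already have item embeddings (`item_vecs`) and a user embedding (`user_vec`) from some CF model; the function name and parameters are my own, not from the talk:

```python
import numpy as np

def recommend(user_vec, item_vecs, seed_item, n_candidates=50, k=10):
    """Two-stage sketch: item-item candidate generation, user-item re-ranking.

    item_vecs : (n_items, d) matrix of item embeddings (hypothetical input)
    user_vec  : (d,) user embedding (hypothetical input)
    """
    # Stage 1: candidate list via item-item cosine similarity to the seed item.
    norms = np.linalg.norm(item_vecs, axis=1) * np.linalg.norm(item_vecs[seed_item])
    sims = item_vecs @ item_vecs[seed_item] / np.maximum(norms, 1e-12)
    candidates = np.argsort(-sims)[1:n_candidates + 1]  # drop the seed itself (sim = 1)
    # Stage 2: re-rank the candidates by predicted user affinity (user-item dot product).
    scores = item_vecs[candidates] @ user_vec
    return candidates[np.argsort(-scores)[:k]]
```

Filtering out already-seen or overly popular items would slot in between the two stages, on the candidate list.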
Any idea what those 40 factors might be?
(The item2vec paper describes using pairs of items that occur in the same set, i.e. just like using n-grams, but without a fixed n, and ignoring ordering.)
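In code, "pairs of items that occur in the same set" just means every ordered pair from a basket, with no window and no ordering constraint; a one-liner sketch:

```python
from itertools import permutations

def basket_pairs(basket):
    """All ordered (target, context) pairs from one unordered basket.

    This is the item2vec-style training data: every co-occurring pair
    counts, with no fixed window size and ordering ignored.
    """
    return list(permutations(basket, 2))
```

For a basket of m items this yields m*(m-1) training pairs, versus the fixed-size window a word2vec n-gram-style context would give.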
Why didn't they use something that usually works better, like PMI?
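For reference, the PMI baseline being suggested is simple to compute from co-occurrence counts; a rough sketch over unordered baskets (my own naming, using the usual PMI(i, j) = log(p(i, j) / (p(i) p(j)))):

```python
import math
from collections import Counter
from itertools import combinations

def pmi_scores(baskets):
    """Pointwise mutual information for every item pair seen together.

    Probabilities are estimated as simple frequencies over baskets:
    PMI(i, j) = log( p(i, j) / (p(i) * p(j)) ).
    """
    item_counts, pair_counts = Counter(), Counter()
    for basket in baskets:
        items = set(basket)
        item_counts.update(items)
        pair_counts.update(frozenset(p) for p in combinations(sorted(items), 2))
    n = len(baskets)
    return {
        pair: math.log((c / n) / ((item_counts[a] / n) * (item_counts[b] / n)))
        for pair, c in pair_counts.items()
        for a, b in [tuple(pair)]
    }
```

Positive PMI means a pair co-occurs more often than chance; in practice you'd smooth the counts and prune rare items.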
The qualitative comparison suggests that item2vec may produce _more_ homogeneous / boring results, which is kinda unfortunate; the interesting question in recommendations is how to find "aspirational" recommendations (things the shopper would not have looked for on their own).
I would really love to see an analysis that did an A/B test using more traditional CF and this, and see what the revenue lift was, because "accuracy" as measured here doesn't necessarily map onto the objective that you care about in the real world.
On the other hand, I played with using collaborative filtering to improve the personalization of language models for speech recognition for shopping, and in that context this approach sounds like it might have been super useful, because it was actually fairly challenging to get broad enough coverage of the full set of items from a small number of purchases for the purposes of language modeling. Having good embeddings would have helped a lot.
For another approach to product recommendation with some lift info, try https://research.googleblog.com/2016/06/wide-deep-learning-b... http://arxiv.org/abs/1606.07792
The black-box effect of word2vec and similar models holds back some applications, like generalizing linguistic methods to bioinformatics.
You come up with a model where a numerical vector represents the attributes of each word or item. You predict the likelihood of a match between two words/items by multiplying their vectors together, and then you use numerical optimization, i.e. an iterative gradient descent algorithm starting from randomly initialized vectors, to estimate the vectors that work best.
Unless I misunderstood the question...
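The scheme described above can be sketched in a few lines. This is a toy illustration, not real word2vec: random init, dot product squashed through a sigmoid as the match probability, and plain SGD; I fake one random negative per positive pair, where real implementations do proper negative sampling:

```python
import numpy as np

def train_embeddings(pairs, n_items, dim=8, lr=0.1, epochs=50, seed=0):
    """Toy embedding trainer: dot-product score + sigmoid + gradient descent."""
    rng = np.random.default_rng(seed)
    vecs = rng.normal(scale=0.1, size=(n_items, dim))  # random initialization
    for _ in range(epochs):
        for i, j in pairs:
            k = rng.integers(n_items)  # crude random negative sample
            for a, b, label in ((i, j, 1.0), (i, k, 0.0)):
                p = 1.0 / (1.0 + np.exp(-vecs[a] @ vecs[b]))  # predicted match prob
                grad = p - label  # derivative of log loss w.r.t. the score
                ga, gb = grad * vecs[b], grad * vecs[a]
                vecs[a] -= lr * ga  # nudge both vectors toward/away from each other
                vecs[b] -= lr * gb
    return vecs
```

After training, items that co-occur in the pairs end up with high dot products, which is exactly the "vectors that work best" the optimization is after.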
Sounds like word2vec.