Building the Next New York Times Recommendation Engine

(open.blogs.nytimes.com)

129 points · jprob · 10y ago · 17 comments

17 comments · 6 top-level

ThomPete · 10y ago · 7 in thread

The way I see it, the primary thing to solve for any recommendation engine is to optimize for serendipity, i.e., letting you find information you didn't know you wanted.

This basically also means surfacing, for example, articles that were not written by the NYT.

The problem with newspapers is that their largely omnibus approach to what's relevant doesn't really do justice to the vast amount of insightful information out there.

That, IMO, is the whole issue with newspapers and media these days: they are building silos where none should really exist, and this is one of the primary reasons people don't consider them valuable anymore.
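The comment doesn't say how to optimize for serendipity. One common, rough proxy is to diversify the list, for example with maximal marginal relevance (MMR) re-ranking. That technique is a swapped-in illustration, not something the NYT article or the commenter describes. Below is a minimal sketch, assuming each candidate already has a relevance score and a topic vector; the article names, scores, vectors, and the `lam` trade-off weight are all hypothetical:

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def mmr_rerank(candidates, relevance, vectors, k, lam=0.7):
    """Greedy maximal marginal relevance re-ranking.

    At each step, pick the candidate maximizing
        lam * relevance - (1 - lam) * (max similarity to any already-picked item),
    so near-duplicates of something already chosen are penalized.
    """
    picked = []
    remaining = list(candidates)
    while remaining and len(picked) < k:
        def score(c):
            redundancy = max((cosine(vectors[c], vectors[p]) for p in picked), default=0.0)
            return lam * relevance[c] - (1 - lam) * redundancy

        best = max(remaining, key=score)
        picked.append(best)
        remaining.remove(best)
    return picked


# Hypothetical candidates: two near-identical football stories and one unrelated long-read.
relevance = {"football_a": 0.9, "football_b": 0.85, "fishing_longform": 0.6}
vectors = {
    "football_a": [1.0, 0.0],
    "football_b": [0.95, 0.05],
    "fishing_longform": [0.1, 0.9],
}
print(mmr_rerank(list(relevance), relevance, vectors, k=2, lam=0.5))
```

With `lam=0.5` the near-duplicate second football story is pushed out in favor of the fishing long-read; with `lam=1.0` the ranking falls back to pure relevance. Choosing `lam` is a product decision rather than something the math settles.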

volaski · 10y ago

Maybe you had no idea, but what you describe already exists in the form of native ads and recommendation widgets like Outbrain. And here's what I do when I run into them: I rarely click them. When I'm on NYT, I don't want to click out to some "recommended" website that doesn't have the same journalistic integrity as NYT (let's not get into a needless argument about whether that itself is true). My point is, I disagree with your argument that a "different source" has anything to do with serendipity. You also say silos shouldn't exist, but I don't see why. Sure, there are clearly cases where companies siloing their users' data is bad for humanity, but in this case it doesn't even make sense (what even is a "silo" in this context?). People create silos because of demand. Imagine if NYT opened up and let any random guy on the web write articles for their front page: what would they gain by "opening up" their silo? Most readers of NYT are there precisely because it's a silo that guarantees a certain degree of quality. Once they start "opening up", you'll probably be the first to say "yeah, the New York Times is done now, it's all low quality."

ThomPete · 10y ago

I don't think we are talking about the same thing.

To get a sense of where I am coming from, I'd refer you to some of my writing on the subject.

http://000fff.org/#/slaves-of-the-feed-this-is-not-the-realt...

and

http://000fff.org/#/how-to-think-like-facebook-and-twitter

It's about something slightly different from what you seem to imply; sorry if I was imprecise.

sumitviii · 10y ago

ThomPete is probably talking about categories as silos. Just as most people come to HN mostly for startups and programming but stay for things like nautil.us too. The serendipity here is refreshing.

His idea makes sense.

danso · 10y ago

I agree with the serendipity part. When I read a general-interest publication like the NYT or the New Yorker, I don't want to keep reading their coverage of one specific subject. For example, the NYT is definitely capable of writing a great football feature story, but that doesn't mean I want a recommendation list full of NYT sports news stories; it just so happened that that particular football story touched on a lot of universal themes, etc. This is similar to the annoying predictions from Facebook's News Feed algorithm: I'll like someone's baby photo to say, hey, congrats on having a baby! But that doesn't mean I want to see a flood of baby photos from that same friend as they document every minute of their newborn's life.

To your second point: how do you propose the NYT do recommendations on external articles? That would require a database of external articles, for one thing, and I'm not sure the results would be any better than what third-party link recommenders (e.g. Disqus and Taboola) generate.

ThomPete · 10y ago

Well, I am not saying it would make them any more likely to survive, but they could simply provide access to external links, the way Pulse and other newsreaders do, or the way Google News kind of did.

In some ways that is what Facebook is doing right now, and it's why they are gaining more and more ground on the news front. It's also why Twitter is kind of struggling: it only points to external things, leaving Twitter as a protocol rather than a news service.

The whole trick, IMO, is to find a way to construct a whole story, so I might read some things from the NYT but then get more in-depth coverage of some subtopics elsewhere.

But this all kind of assumes that one buys into the relevance of newspapers going forward, which I don't, but that's just me.

eshamukhamedov · 10y ago

These silos are the foundation of their business. No news organization is going to have the resources to cover every piece of information on the internet, so they must specialize. The alternative to this model is to have one organization control news syndication for a large number of independent contributors. This is the intent of Facebook Newsfeed, but there's little incentive for news outlets to publish to it. No matter how much money Facebook gives them, they can make more by building up their own "silo."

ThomPete · 10y ago

Sure, I am well aware of why they need to do it. But that need is also what is keeping them from providing real value, and it's why things like Facebook, Pulse, Twitter, etc. are popular news sources and newspapers aren't.

doppenhe · 10y ago · 2 in thread

We have built this, and anybody can use it: https://Algorithmia.com/recommends. It takes 2 lines of JS to implement and is currently serving the geekwire.com recs. You can also modify it further (see blog.Algorithmia.com).

The article is awesome, though. Good on NYT.

bydamn989 · 10y ago

Nice product you have there. Not only did it take 10 minutes to run, it also managed to return no results.

doppenhe · 10y ago

That should not be the case; sorry you had a less-than-great experience. Please send the URL you used to Diego at Algorithmia dot com so we can debug and get back to you.

ersii · 10y ago · 2 in thread

I think it'd be great if you had this kind of information in your help section later on, for anxious people like me who are very wary of a newspaper even having a recommendation engine. I was actually on my way to sign up for a subscription after reading "A Renegade Trawler, Hunted for 10,000 Miles by Vigilantes" by Ian Urbina, but held back for the moment to give it more thought.

That said, I guess I could see it helping retain users/subscribers if it's good enough. (I'd still appreciate it a lot more if this functionality could be turned off for users who request it, though.)

buckbova · 10y ago

My first reaction to your comment is that you're overreacting and that targeting news stories based on topics you enjoy is a great thing.

But after some thought on the recommendation engine, this seems more like a confirmation bias engine. Not something I'd want from a "news" source.

untog · 10y ago

I'm not really sure how it would be confirmation bias. The NYT doesn't have multiple stories on the same topic with differing conclusions that it A/B tests with.

As the article states, it'll suggest articles about Hillary Clinton if you've read articles about her previously, but it doesn't say it'll only give you positive ones. There is a chance that it'll narrow people's interests (if you only read sports, for example) but that already happens anyway.

flashman · 10y ago

I'm really impressed that NYT took the time to document this. It's always interesting to see the different recommendation models evaluated and applied to real-world situations.

I've been pursuing a collaborative filtering approach to product recommendation lately ('people who bought this also bought that'), but perhaps LDA would let me model our products based on their metadata ('people who bought products broadly like this also bought products broadly like that').
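The "people who bought this also bought that" idea can be sketched as item-to-item co-occurrence counting. This is a minimal sketch, assuming purchase histories are available as sets of product IDs; the products and baskets below are hypothetical, and real systems usually normalize the raw counts (for example with cosine or Jaccard similarity) so that popular items don't dominate every list:

```python
from collections import Counter, defaultdict
from itertools import permutations


def co_occurrence(baskets):
    """Count, for every ordered pair (a, b), how many baskets contain both a and b."""
    counts = defaultdict(Counter)
    for basket in baskets:
        for a, b in permutations(set(basket), 2):
            counts[a][b] += 1
    return counts


def also_bought(counts, item, n=3):
    """Items most often bought together with `item`; ties are broken alphabetically."""
    ranked = sorted(counts[item].items(), key=lambda kv: (-kv[1], kv[0]))
    return [other for other, _ in ranked[:n]]


# Hypothetical purchase histories, one set of product IDs per customer.
baskets = [
    {"tent", "stove", "lantern"},
    {"tent", "stove"},
    {"tent", "sleeping_bag"},
    {"stove", "fuel"},
]
counts = co_occurrence(baskets)
print(also_bought(counts, "tent"))
```

One rough way to approximate the metadata variant the comment suggests would be to map each product ID to a coarser label (such as its dominant LDA topic) before counting, so the pairs read as "broadly like this" rather than "exactly this". That mapping is an assumption, not something the comment specifies.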

muktabh · 10y ago

At our startup, ParallelDots, we make a contextual recommendation engine as a service for online publishers. We also found that tags don't really work well for recommendations on our clients' websites. We ended up solving the problem with unsupervised word embeddings and autoencoders on top of them. We still don't use it for personalization, though, just for contextually similar articles. Great to see similar problems being solved at the New York Times too. :)

bcaine · 10y ago

Fun read. Topic modeling can be fascinating to work with.

I'm curious how they measured the performance of their model, and whether they found a "best" number of topics for LDA beyond which adding more topics stopped helping much.

I'd imagine an increased number of topics would have some interesting side effects, such as producing overly narrow recommendations.
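For readers new to topic-based recommendation, here is a minimal sketch of the content-based half of the idea: represent each article by its topic proportions, build a reader profile by averaging the proportions of the articles they've read, and rank unread articles by cosine similarity. It assumes the proportions were already inferred by an LDA model; the three topics, the article names, and all numbers are made up, and this is not the NYT's actual model:

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def reader_profile(history, topics):
    """Average the topic proportions of the articles a reader has read."""
    k = len(next(iter(topics.values())))
    profile = [0.0] * k
    for article in history:
        for i, weight in enumerate(topics[article]):
            profile[i] += weight / len(history)
    return profile


def recommend(history, topics, n=2):
    """Rank unread articles by cosine similarity to the reader's topic profile."""
    profile = reader_profile(history, topics)
    unread = [a for a in topics if a not in history]
    return sorted(unread, key=lambda a: cosine(profile, topics[a]), reverse=True)[:n]


# Hypothetical 3-topic proportions (politics, sports, science), standing in for LDA output.
topics = {
    "clinton_speech": [0.8, 0.1, 0.1],
    "senate_vote": [0.7, 0.0, 0.3],
    "world_cup": [0.1, 0.9, 0.0],
    "climate_bill": [0.5, 0.0, 0.5],
    "mars_rover": [0.0, 0.1, 0.9],
}
print(recommend(["clinton_speech"], topics))
```

Deciding which number of topics works best would still need held-out data, for example checking whether the recommended articles are ones readers actually went on to read.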
