Using reinforcement learning and $4.80 of GPU time to find the best HN post

(openpipe.ai)

217 points · kcorbitt · 1y ago · 95 comments


72 comments · 25 top-level

jerjerjer · 1y ago · 11 in thread

> In this case, I included the post title, author, date, and content. All of those factors could be relevant to the chance a story gets voted up.

> Even if the model gets extremely good at predicting final_score_if_it_hits_front_page, there’s still the inherent randomness of probability_of_hitting_front_page that is fundamentally unpredictable.

In addition to date, you might want to include three fields:

- day of week (categorical)

- is weekend/holiday (boolean)

- hour or time of the day (categorical, you can have 24 of them or morning/afternoon/etc.).

The probability of a post hitting the front page is usually affected by these things so it can really help the model.
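The three fields jerjerjer suggests can all be derived from a post's timestamp. Below is a minimal sketch, assuming the timestamp is a Unix epoch in seconds and bucketing in UTC; the `HOLIDAYS` set, the four `part_of_day` buckets, and the function names are illustrative assumptions, not part of the post's pipeline.

```python
from datetime import date, datetime, timezone

DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]

# Hypothetical, deliberately incomplete holiday set, for illustration only.
HOLIDAYS = {date(2024, 1, 1), date(2024, 7, 4), date(2024, 12, 25)}


def part_of_day(hour: int) -> str:
    """Coarse bucket for the hour, as an alternative to 24 categories."""
    if hour < 6:
        return "night"
    if hour < 12:
        return "morning"
    if hour < 18:
        return "afternoon"
    return "evening"


def time_features(unix_ts: int) -> dict:
    """Derive day-of-week, weekend/holiday and hour features from a Unix timestamp.

    Everything is computed in UTC; converting to a particular local time zone
    first is a modeling choice left open here.
    """
    dt = datetime.fromtimestamp(unix_ts, tz=timezone.utc)
    return {
        "day_of_week": DAY_NAMES[dt.weekday()],  # categorical
        "is_weekend_or_holiday": dt.weekday() >= 5 or dt.date() in HOLIDAYS,  # boolean
        "hour_utc": dt.hour,  # one of 24 categories
        "part_of_day": part_of_day(dt.hour),  # coarser categories
    }


print(time_features(1704067200))  # 2024-01-01 00:00 UTC, a Monday
```

Day names come from a fixed list rather than `strftime("%A")`, which depends on the system locale.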

sitkack · 1y ago

I find that the best stories get posted by folks in EU time zones, and on weekends (more of a hacker ethos). The flamebait startup drama is M-F Pacific.

jedberg · 1y ago

I haven't run the data, but anecdotally I can tell you that those things probably don't affect hitting the front page. They do affect the total score, but that is not what is being optimized here.

It's counterintuitive, but if you post at a really popular time, you're competing with a lot of other submissions. If you post at a really slow time, you'll get fewer votes, but it will take fewer to reach the front page and you'll have less competition.

In the end, it kinda evens out. The number of votes it takes to get to the front page and the number of competing submissions are both correlated to your fields above.

floobertoober · 1y ago

I think that this assumes a uniform distribution of "interestingness" in the competing posts across all of those dimensions and I wouldn't be surprised if that isn't the case

4m1rk · 1y ago

Popular times for voting and for posting are not the same.

josefx · 1y ago

> is weekend/holiday

Somehow this reminded me of someone data-mining spiegel.de (a German news site) and using the timestamps of the posted articles to extrapolate the writers' religion (holidays) and relationships (shared vacations), among dozens of other data points, from several years of publicly available data. I think no AI was involved back then.

EffrafaxOfWug · 1y ago

For anyone interested, it was this CCC talk by David Kriesel (sadly German only).

https://media.ccc.de/v/33c3-7912-spiegelmining_reverse_engin...

maaaaattttt · 1y ago

I wonder if hour of day would benefit from being combined with HN's visitor location data to be truly relevant? I think the location is embedded in the time somehow, if the visitors' origins are stable over time. If 9am PT is a popular time and most of the visitors are in the PT time zone, then even if this 9am PT is encoded as UTC the model will pick it up (I think). Now, if over time visitors get more diverse and a big chunk is now coming from Europe, this original 9am will make less sense to the model. Adding visitor-origin stats at the time of the post would probably even help surface regional trends. But I guess this historical data isn't public.

kcorbitt (OP) · 1y ago

Yep that makes sense. Would be interesting to do a follow-up that explicitly includes these variables and see if it meaningfully improves the results.

fennecbutt · 1y ago

The data is massively interconnected too: if Apple releases a new M chip, people flood here to see if there's a thread on it, and while browsing they may be more or less likely to see other threads as a result.

rajnathani · 1y ago

I would replace author with a boolean for whether the author's account is new (the green marker HN shows on new users' posts and comments).

aaron695 · 1y ago

> might want to include three fields:

This has been studied multiple times on HN posts, but most of the write-ups seem to have link-rotted. Use the Web Archive if you're looking for insights: https://hn.algolia.com/?q=best+time+to+post

sdflhasjd · 1y ago · 9 in thread

It's interesting that service complaints are so popular on HN. I always feel a bit bad that my most popular HN contribution was me complaining about a popular service.

kelnos · 1y ago

I flag most complaint posts, unless the complaint actually brings to light or discusses something surprising or unique that can be generalized and discussed.

I generally find these posts pretty boring, and most comments on them are people recounting their own stories about how that (or a similar) service screwed them over. I suppose they can be a decent way to warn people off of a particular product (scammy, terrible customer support, whatever), but that's not what I come to HN for.

Karrot_Kream · 1y ago

A popular theory on techie parts of the web is that engagement-optimizing sites create this negativity loop, but I disagree. I think negativity is naturally something people seek no matter what the algorithm is. On an upvote-based site, outrage ranks to the top. I also think text-based platforms suffer from negative engagement much more so than multimedia platforms.

Model correlation is decent here but there's certainly more to do to use its outputs predictively.

johnfn · 1y ago

I don't really agree with this. I go and hang out with my friends, and we don't all end up getting outraged about stuff. I go for a walk in the park and no one is shouting at me; I go to a restaurant and people are sitting around normally discussing whatever. If you start quoting outrage bait that you read online, people might look at you strangely.

My point is I don't think people seek out outrage. Social media's algorithms may not explicitly reward it as transparently as `if (post.outrage > 100) post.boost()`, but outrage isn't some default rule of interaction.

miki123211 · 1y ago

As a Mastodon user, I can definitely confirm this.

Give people the way to repost / retweet / boost, and your feed suddenly turns into mostly negativity, even if your algorithm is "show posts from my followers only, newest to oldest"

Vampiero · 1y ago

If that theory were true then, what about every website on the internet pre-2010? What about 4chan?

We're just built like that.

Regarding text platforms suffering more than non-text platforms, I think it's because of the lack of social cues that are otherwise there. You can infer a lot from the way someone talks, or from their body language. You can't infer much from text, which is partly why Poe's law exists -- sarcasm doesn't translate well.

int_19h · 1y ago

This video will make you angry: https://www.youtube.com/watch?v=rE3j_RHkqJc

jerjerjer · 1y ago

Humans love having something to be righteously indignant about.

Rick76 · 1y ago

I don't like it, but it seems the internet always reacts more to inherently negative posts. That seems to be common across the entire internet, and I think that's why the internet doesn't seem as fun as it did 10 years ago.

I'm sure it's just the human psyche, but I'm trying to overcome it and make my life more positive again.

andrewmcwatters · 1y ago

I suspect a large percentage of Dan's work moderating HN is downweighting posts that incite engagement from frustration. On at least one occasion I've had the top comment in a thread, by over 100 upvotes, that purely expressed the sentiment of several readers but did not contribute to the curated voice of the community.

oli5679 · 1y ago · 5 in thread

If you withhold a small amount of data, or even retrain on a sample of your training data, then isotonic regression is good for solving many calibration problems.

https://scikit-learn.org/dev/modules/generated/sklearn.isoto...

I also agree with your intuition that if your output is censored at 0, with a large mass there, it's good to create two models: one for the likelihood of zero karma, and another for expected karma conditional on it being non-zero.
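The isotonic-regression idea can be sketched without any dependencies. scikit-learn's `IsotonicRegression` (linked above) is the practical choice; this is a minimal stdlib-only version of the pool-adjacent-violators algorithm that underlies it, assuming you have raw model predictions and actual scores for a held-out set. The function names are illustrative, and `calibrate` uses a step function where scikit-learn interpolates linearly between fitted points.

```python
import bisect


def isotonic_fit(preds, actuals):
    """Pool-adjacent-violators: fit a non-decreasing map from raw model
    predictions to actual scores on held-out data.

    Returns (prediction, calibrated_value) pairs sorted by prediction.
    """
    pairs = sorted(zip(preds, actuals))
    blocks = []  # each block is [sum_of_actuals, count]
    for _, y in pairs:
        blocks.append([y, 1])
        # Merge backwards while an earlier block's mean exceeds the later one's.
        while len(blocks) > 1 and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]:
            total, count = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
    fitted = []
    for total, count in blocks:
        fitted.extend([total / count] * count)
    return [(x, f) for (x, _), f in zip(pairs, fitted)]


def calibrate(fit, raw_pred):
    """Map a new raw prediction through the fitted step function."""
    xs = [x for x, _ in fit]
    i = bisect.bisect_right(xs, raw_pred) - 1
    return fit[max(i, 0)][1]


fit = isotonic_fit([1, 2, 3, 4, 5], [1, 3, 2, 5, 4])
print([v for _, v in fit])  # [1.0, 2.5, 2.5, 4.5, 4.5]
print(calibrate(fit, 3.7))  # 2.5
```

Because the fitted map is only constrained to be monotonic, it can stretch the compressed low and high ends of a model's predictions without changing how the model ranks posts.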

kcorbitt (OP) · 1y ago

I hadn't heard of isotonic regression before but I like it!

> it's good to create two models, one for likelihood of zero karma, and another expected karma, conditional on it being non-zero.

Another way to do this is to keep a single model but have it predict two outputs: (1) likelihood of zero karma, and (2) expected karma if non-zero. This would require writing a custom loss function which sounds intimidating but actually isn't too bad.

If I were actually putting a model like this into production at HN I'd likely try modeling the problem in that way.
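The single-model, two-output variant needs a loss that combines both heads. Below is a minimal sketch, assuming (this is not from the post) a binary cross-entropy term for "the score stayed at the floor" and a squared-error term on log score that only applies above the floor. `two_head_loss`, `point_estimate`, and the `floor` parameter are hypothetical names. On HN a story that goes nowhere sits at a score of 1, so `floor=1` may be the more natural setting.

```python
import math


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def two_head_loss(logit_floor, pred_log_score, actual_score, floor=0):
    """Loss for one example from a single model with two outputs.

    logit_floor:    head 1, logit for "the score stays at the floor"
    pred_log_score: head 2, predicted log(score) given the score is above the floor
    """
    eps = 1e-12
    p_floor = sigmoid(logit_floor)
    at_floor = actual_score <= floor
    # Head 1: binary cross-entropy on "stayed at the floor" vs "went above it".
    bce = -math.log(p_floor + eps) if at_floor else -math.log(1.0 - p_floor + eps)
    # Head 2: squared error on log score, only for examples above the floor.
    mse = 0.0 if at_floor else (pred_log_score - math.log(actual_score)) ** 2
    return bce + mse


def point_estimate(logit_floor, pred_log_score, floor=0):
    """Combine both heads into one rough number.

    exp(predicted log score) behaves more like a median than a mean, so this
    is a ranking signal rather than a calibrated expected score.
    """
    p_floor = sigmoid(logit_floor)
    return p_floor * floor + (1.0 - p_floor) * math.exp(pred_log_score)


print(two_head_loss(0.0, math.log(10), 10))  # only the BCE term remains: log(2)
print(point_estimate(0.0, math.log(10)))     # 0.5 * 0 + 0.5 * 10
```

In a real training loop these per-example losses would be averaged over a batch and differentiated by the framework; the plain-Python version only shows how the two terms fit together.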

Y_Y · 1y ago

Did you dictate this? It looks like you typo'd/brain I'd "centered" into "censored", but even allowing for phonetic mistakes (of which I make many) and predictive text flubs, I still can't understand how this happened.

oli5679 · 1y ago

I was thinking of censoring, maybe I should have said another word like floored.

The reason I think of this as censoring is that there are some classical statistical models that model a distribution with a large mass at a minimum threshold, e.g. "tobit" censored regression.

https://en.wikipedia.org/wiki/Censoring_(statistics)

CaptainFever · 1y ago

I'm not the parent commenter, but Whisper-based dictation is getting pretty awesome nowadays. It's almost as good as sci-fi.

(Fully dictated, no edits except for this)

1024core · 1y ago

I also thought that the commenter spoke "centered" and the speech recognition model output "censored".

kelnos · 1y ago · 4 in thread

I don't get the conclusion the author is trying to draw. If you look at the data presented, it seems that the model was actually pretty bad at guessing the real-world behavior of the posts listed. Out of the top ten it picked:

* 1 had a score that was reasonably close (8.4%) to what the model predicted

* 4 had scores wildly lower than the model predicted

* 2 had scores wildly higher than the model predicted

* the remaining 3 were not wildly off, but weren't really that close either (25%-42% off)

Then there's a list of 10 submissions that the model predicted would have scores ranging from 33 to 135, but they all only received a score of 1 in reality.

The graph shown paints a bit of a better picture, I guess, but it's still not all that compelling to me.

kcorbitt (OP) · 1y ago

This is a fair point. The reason why I think "correlation" is a better metric than "predicts the exact correct score" is because of how I'll be using this model in the next post.

Broadly, the main use case for this model (in the RL context) will be to take two different versions of the same post, and predict which of the two is more likely to be upvoted. So what matters isn't that it gets the exact number of upvotes correctly, but that it correctly predicts the relative difference in likely upvote count between two variants.

Now it still doesn't do a great job at that (the correlation is only 0.53 after all) but it still does a good enough job to provide some useful signal.
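The difference between "predicts the exact score" and "orders two posts correctly" can be made concrete. Below is a minimal sketch of two metrics, assuming plain lists of predicted and actual scores: Pearson correlation, and the fraction of post pairs whose order the model gets right. The thread doesn't say exactly which correlation the post computed, so treating the 0.53 figure as Pearson is an assumption here. The function names are illustrative.

```python
import math
from itertools import combinations


def pearson(xs, ys):
    """Pearson correlation between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)


def pairwise_accuracy(preds, actuals):
    """Fraction of pairs (with different actual scores) the model orders correctly."""
    correct = total = 0
    for i, j in combinations(range(len(preds)), 2):
        if actuals[i] == actuals[j]:
            continue  # ties carry no ordering information
        total += 1
        if (preds[i] - preds[j]) * (actuals[i] - actuals[j]) > 0:
            correct += 1
    return correct / total if total else math.nan


preds, actuals = [1, 2, 3], [1, 2, 100]
print(pairwise_accuracy(preds, actuals))  # 1.0: every pair ordered correctly
print(pearson(preds, actuals))            # below 1: the magnitudes are off
```

For the "which of two variants is better" use case, pairwise accuracy measures the property that matters directly. Correlation also penalizes getting the magnitudes wrong.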

espadrine · 1y ago

That makes me wonder, though, what the best loss function was. I assume you used MSE on the log score. I wonder if a sigmoid on which of two articles has the higher score would yield better results for the downstream RLHF task.
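The "sigmoid on which of two articles has the higher score" idea is the pairwise logistic (Bradley-Terry style) loss commonly used to train RLHF reward models on preference pairs. Below is a minimal sketch, assuming the reward model emits one scalar per post. Whether it would beat MSE on log score for this task is exactly the open question above, so this shows the shape of the loss, not a result.

```python
import math


def softplus(x):
    """log(1 + exp(x)), computed without overflow for large |x|."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def pairwise_loss(score_a, score_b, a_won):
    """Logistic loss on which of two posts got the higher real score.

    Models P(a beats b) = sigmoid(score_a - score_b); -log(sigmoid(d)) == softplus(-d).
    """
    diff = score_a - score_b
    return softplus(-diff) if a_won else softplus(diff)


print(pairwise_loss(0.0, 0.0, True))  # log(2): the model is undecided
print(pairwise_loss(3.0, 0.0, True))  # small: confident and correct
print(pairwise_loss(0.0, 3.0, True))  # large: confident and wrong
```

Only the difference between the two scores enters the loss, so the model is never asked to match absolute upvote counts. That lines up with the relative-comparison use case described in the comment above.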

manx · 1y ago

Scores are not a good metric to be compared. I did some data analysis and wrote about it here: https://felx.me/2021/08/29/improving-the-hacker-news-ranking...

nl · 1y ago

The score divergence is likely because if a story makes the front page then it almost certainly gets comments and each comment adds one to the score.

But the number of comments depends on the time posted more than the story itself and that information isn't in the model.

Havoc · 1y ago · 4 in thread

Nice write up.

Did you ever figure out what happened in 2016?

kcorbitt (OP) · 1y ago

Nope. I was actually planning on asking dang if he has any insights there. If he sees this thread hopefully he can chime in!

n2d4 · 1y ago

Given that Google Trends doesn't show that bump, I'd assume it has to do with how the data was collected. Maybe all stories with < X votes/comments older than 2015 are not included, or deleted from whatever index you used?

kelnos · 1y ago

In case he doesn't, you might as well email him about it. He's a very responsive guy and might find it interesting.

twoodfin · 1y ago

I think text vs. link used to be XOR, but isn’t any longer.

It’s still outside the HN mainstream to use both in the same submission, so that might be biasing the model in strange ways.

youoy · 1y ago · 3 in thread

Thanks for sharing! Very interesting.

> The correlation is actually not bad (0.53), but our model is very consistently over-estimating the score at the low end, and underestimating it at the high end. This is surprising; some variation on any given data point is expected, but such a consistent mis-estimation trend isn’t what we’d expect.

This is a consequence of the model objective. If you don't know what is really happening, a good way of reducing the overall error is to do exactly that. If you instead try to exactly predict the very highs and very lows, you will get very high errors on those whenever you are wrong, resulting in a bigger overall error.

Apart from that, I want to comment on AI alignment here. For me, the objective of "most upvotes" is not fully correlated with where I get the most value on HN. Most of the time, I would have found the most upvoted posts anyway on other platforms. It's the middle range that I really like. So be careful implementing this algorithm at scale; it could turn the website into another platform with shitty AI recommendations.
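youoy's explanation of the over/under-estimation pattern (regression toward the mean) can be reproduced with a toy simulation. Below is a minimal sketch on synthetic data, not HN data. A closed-form linear fit stands in for the reward model, and each "actual" score is assumed to be a part the model can see plus luck it cannot. Grouping by actual outcome, as a predicted-vs-actual plot does, shows the same pattern the post describes.

```python
import random


def fit_line(xs, ys):
    """Closed-form ordinary least squares for y = a + b*x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return my - b * mx, b


def mean_at(indices, values):
    return sum(values[i] for i in indices) / len(indices)


def decile_gaps(n=5000, seed=0):
    """Compare mean prediction vs mean actual in the bottom and top deciles of actuals."""
    rng = random.Random(seed)
    visible = [rng.gauss(0, 1) for _ in range(n)]       # what the model can see
    actual = [v + rng.gauss(0, 1) for v in visible]     # plus luck it cannot see
    a, b = fit_line(visible, actual)
    pred = [a + b * v for v in visible]
    # Group by the *actual* outcome, as a predicted-vs-actual plot does.
    order = sorted(range(n), key=lambda i: actual[i])
    low, high = order[: n // 10], order[-(n // 10):]
    return {
        "low_pred": mean_at(low, pred), "low_actual": mean_at(low, actual),
        "high_pred": mean_at(high, pred), "high_actual": mean_at(high, actual),
    }


gaps = decile_gaps()
print(gaps["low_pred"] > gaps["low_actual"])    # over-estimates the low end
print(gaps["high_pred"] < gaps["high_actual"])  # under-estimates the high end
```

The fit here is essentially unbiased given what it can see. The compression comes from the unpredictable part of the outcome, which is why a consistent trend like this isn't by itself evidence of a bug.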

kcorbitt (OP) · 1y ago

> For me the objective of "most up votes" is not fully correlated with where I get the most value on HN. Most of the time, the most up voted I would have found them anyway on other platforms.

Yes, this is a fantastic point. I'm curious if there's some other measurable proxy metric for "things I get the most value out of on HN"? Upvotes seems like the most natural but optimizing for it too strongly would definitely take HN down a dark path.

losteric · 1y ago

Perhaps selecting for posts with the highest quality reply engagement? If many different people were drawn to lengthy discussions, that suggests the content sparks thoughts that others then feel compelled to engage with. Or select for the emotional content of replies, awe/empathy/anger, depending on what one wants from HN?

coolcoder613 · 1y ago

Perhaps number of comments, or number of non-flamewar comments, or proportion of flamewar comments together with number of comments?

6gvONxR4sf7o · 1y ago · 3 in thread

Why use RL for this instead of plain old supervised learning?

dinobones · 1y ago

I am trying to understand this too.

In supervised learning, you train on pairs of (x, y), where x is your input (title/post text/metadata) and y is the output score.

Naively, it's a linear regression model, Y = b0 + b1x1 + b2x2 + b3x3, where b0 is your bias ("a floor for score points") and b1, b2, and b3 are the weights on the actual data of the post. You can solve this in closed form and find the b0/b1/b2/b3 that minimize the error of fitting to Y.

How do these equations change with RL? I always assumed RL was a multi-step process where actions are taken to get to a reward. If there is only 1 step/decision, to produce a "random" score, it feels much like supervised learning.
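The closed-form solve described above is the normal equations, (XᵀX)b = Xᵀy. Below is a minimal stdlib-only sketch for the three-feature model, using a small Gaussian-elimination solver. This is a toy illustration of supervised regression only. The post's reward model is a fine-tuned language model, not a linear model over three features, and `ols` and `solve` are illustrative names.

```python
def solve(A, v):
    """Gaussian elimination with partial pivoting for a small square system A x = v."""
    n = len(A)
    M = [row[:] + [v[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def ols(rows, ys):
    """Fit y = b0 + b1*x1 + ... + bk*xk via the normal equations (X^T X) b = X^T y."""
    X = [[1.0] + list(r) for r in rows]  # leading 1 gives the intercept b0
    k = len(X[0])
    XtX = [[sum(x[i] * x[j] for x in X) for j in range(k)] for i in range(k)]
    Xty = [sum(x[i] * y for x, y in zip(X, ys)) for i in range(k)]
    return solve(XtX, Xty)


rows = [(1, 0, 0), (0, 1, 0), (0, 0, 1), (1, 1, 1), (2, 1, 0), (0, 2, 3)]
ys = [2 + 1 * x1 - 3 * x2 + 0.5 * x3 for x1, x2, x3 in rows]
print(ols(rows, ys))  # approximately [2, 1, -3, 0.5]
```

There is no sequence of actions or delayed reward here, just one prediction per labeled example. That's why several replies describe this step as plain regression.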

jampekka · 1y ago

The post is not doing RL. It's just regression as you thought.

jampekka · 1y ago

It is just plain old supervised learning. A regression from the post features to vote count. The RL discussion in TFA is a bit confusing.

Such a model can be used as the "reward model" for the "reinforcement learning from human feedback" (RLHF) method.

pclmulqdq · 1y ago · 2 in thread

There is a timing factor that you need to consider, too. Anecdotally, Sunday morning is the best time to get onto the front page, while Tuesday or Wednesday morning gets you the most views.

kcorbitt (OP) · 1y ago

Yep, that's why I included the post date in the information available to the model; in theory (if it's smart enough) it should be able to take that into account. That said I didn't include time-of-day; it would be interesting to see whether adding that information would be able to make the model more accurate!

If the reward model is indeed smart enough to be able to take that into account you could actually use it to plan the optimal time of day to post a specific story! You could just use the reward model to compute a predicted score for 8 different versions of your content, holding the post title/text constant across them all and just changing the date. Based on the differences in scores, you can determine which posting time the RM thinks is most likely to make your post successful!
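The "score the same post at 8 different times" procedure is a simple argmax over candidates. Below is a minimal sketch. `toy_reward_model` is a hypothetical stand-in that just encodes pclmulqdq's Sunday-morning anecdote; the real reward model, its input format, and the candidate times are all assumptions here.

```python
from datetime import datetime, timedelta


def toy_reward_model(title: str, text: str, posted_at: datetime) -> float:
    """Hypothetical stand-in for the real reward model: favors Sunday mornings."""
    score = 10.0
    if posted_at.weekday() == 6:  # Sunday
        score += 1.0
    if 6 <= posted_at.hour < 12:  # morning
        score += 0.5
    return score


def best_posting_time(score_fn, title, text, candidates):
    """Hold title/text fixed, vary only the timestamp, and return the best time."""
    scores = {t: score_fn(title, text, t) for t in candidates}
    return max(scores, key=scores.get), scores


start = datetime(2024, 6, 1, 9)  # a Saturday, 09:00 (arbitrary example)
candidates = [start + timedelta(hours=6 * k) for k in range(8)]
best, scores = best_posting_time(toy_reward_model, "Show HN: My weekend project", "", candidates)
print(best)  # 2024-06-02 09:00:00, the only Sunday-morning candidate
```

Passing the scorer in as `score_fn` keeps the search independent of the model. The result is only as good as the model's grasp of timing, which the post doesn't establish, especially since time of day wasn't among its inputs.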

pixl97 · 1y ago

>you could actually use it to plan the optimal time of day to post a specific story!

You see this on Reddit pretty commonly.

Someone posts original content at an off time and gets a small/moderate amount of upvotes. Then some time later (could be hours, days, or weeks) a bot/karma account will post the content at an optimal time to farm upvotes.

1024core · 1y ago · 2 in thread

Am I right that the reward model is also similar to an LLM (the difference being that it predicts a score instead of the next token)?

kcorbitt (OP) · 1y ago

Yes! The architecture is almost identical. The only difference is in the final layer. In an LLM used for text generation, the final layer has a separate output for every potential token the model could produce, and we decide which token to generate by choosing the one with the highest likelihood at each generation step (at least that's what the simplest sampling methods do). In an LLM used as a reward model, we only have one output in the final layer, and we interpret its value as the predicted reward.

Everything else in the model before that final layer is exactly identical, architecture-wise.
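The "only the final layer differs" point can be shown with toy numbers. Below is a minimal sketch with random weights and made-up sizes. The same hidden vector goes through either a vocabulary-sized head (one logit per token, greedy pick) or a one-row head (a single scalar reward). Reading the reward from the last token's hidden state is an assumption, though a common convention. The thread doesn't say how the post's model pools tokens.

```python
import random


def linear(weights, hidden):
    """Multiply a (rows x d) weight matrix by a d-dimensional hidden vector."""
    return [sum(w * h for w, h in zip(row, hidden)) for row in weights]


d_model, vocab_size = 8, 50  # toy sizes, purely illustrative
rng = random.Random(0)

# Assumed: final hidden state of the last token, produced by the shared transformer body.
hidden = [rng.gauss(0, 1) for _ in range(d_model)]

lm_head = [[rng.gauss(0, 0.1) for _ in range(d_model)] for _ in range(vocab_size)]
reward_head = [[rng.gauss(0, 0.1) for _ in range(d_model)]]  # a single output row

logits = linear(lm_head, hidden)  # text generation: one score per vocabulary token
next_token = max(range(vocab_size), key=logits.__getitem__)  # greedy decoding
reward = linear(reward_head, hidden)[0]  # reward model: one scalar

print(len(logits), next_token, reward)
```

Everything that produces `hidden` is shared between the two setups. Swapping `lm_head` for `reward_head` is the entire architectural change described above.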

1024core · 1y ago

But a typical LLM has a feedback loop: it looks at the last token it generated and then decides, given the N tokens before that, which token to output next.

In the case of a reward model, are you streaming in the list of tokens; if so, what is the output after each token? Or are you feeding in all of the tokens in one shot, with the predicted reward as the output?

eugenekolo · 1y ago · 2 in thread

What does the model say about this post?

kcorbitt (OP) · 1y ago

Haha great question. Since it's only trained on on-platform HN content and not external links, this post is a little bit out of distribution for it unfortunately. I'm thinking about scraping a corpus of external links and running the same analysis though, in which case I'd definitely run it on this story because I'm also curious about that. :)

Rick76 · 1y ago

I would be very interested in the results of that as well

kcorbitt (OP) · 1y ago · 1 in thread

Hey all, this project was a labor of love I worked on in my spare time over the last couple of weeks. Happy to answer any questions!

Eisenstein · 1y ago

I think it is interesting, but I can't help but feel that things like this result in the homogenizing and blandefying of content. It is like training a model to predict what movies will be successful at the box office -- the result will be the same kinds of movies over and over. No one knows what the breakthrough success is until it shows up, and no model can predict those. Essentially this is teaching people how to make HN full of nothing but complaints and indie success stories.

What is your take on this?

suyash · 1y ago · 1 in thread

Very interesting project, would love to read a more technical write up on how the model was architected and trained, any pointers?

kcorbitt (OP) · 1y ago

I link to it from the post, but all the code is open source! You can find the specific training script here: https://github.com/OpenPipe/best-hn/blob/main/stories_train_...

And all the graphs for the blog are from this notebook: https://github.com/OpenPipe/best-hn/blob/main/blog-figures.i...

Lots of other good stuff in that repo, although it's only organized to a "working researcher" standard I'm afraid.

swyx · 1y ago

> This query took 17 seconds to load the dataset into RAM and then aggregating by type was almost instant. It is absolutely incredible to me that I can load every HN post and comment ever into RAM in a few seconds on my (admittedly beefy) dev laptop, and analyze them at will. What an age of abundance!

https://motherduck.com/blog/big-data-is-dead/

Arctic_fly · 1y ago

> But in 2015 there is a stark discontinuity, where the number of stories (with text) shoots up by >10x, and the average score drops by 5x! Is this some kind of eternal September?

Based on the later analysis in the post (which I agree with), the total score of a story is disproportionately tied to whether it hits the front page, and of course how long it stays there. Regardless of the quality of the average post starting in 2015, the sheer quantity would make it impossible for all but a few to stay on the front page for very long. Hacker News got more popular, so each story got less prime time.

manx · 1y ago

Very interesting! Identifying great new content is a big unsolved problem for HN IMHO. Unfortunately, scores are not a good metric to predict, because they are not comparable (see https://felx.me/2021/08/29/improving-the-hacker-news-ranking...). A better metric might be "upvoterate", defined as how much more or less likely users are to upvote a story compared to the average story. More about that here: https://github.com/social-protocols/quality-news?tab=readme-...

Nevermark · 1y ago

> It’s super important that your training inputs includes all the information your model will need to make predictions. In this case, I included the post title, author, date, and content. All of those factors could be relevant to the chance a story gets voted up.

You would do better to leave out dates and authors.

Do you really want the model to home in on dates & authors? If you just trained on those, would it create anything useful?

It can’t for dates, since it isn’t getting any future date examples to prepare for future dates. I suppose you could argue that month & day matter. But surely that would be a much lower quality discriminator than forcing the model to stay focused on title & content.

Similarly with author. You can find out which authors produce content with the most upvotes with a simple calculation.

But again, is that the discriminator you want the model to use? Or the title & content? Because it will use the easiest discriminator it can.
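The "simple calculation" Nevermark mentions is a group-by average, and it's a useful baseline. If a model's gains mostly match this table, it has probably learned the author shortcut rather than anything about title and content. Below is a minimal sketch, assuming posts arrive as `(author, score)` pairs. The `min_posts` guard against one-hit authors is an added assumption, and the names are illustrative.

```python
from collections import defaultdict


def mean_score_by_author(posts, min_posts=1):
    """posts: iterable of (author, score). Returns [(author, mean_score)], best first."""
    totals = defaultdict(lambda: [0, 0])  # author -> [score_sum, post_count]
    for author, score in posts:
        totals[author][0] += score
        totals[author][1] += 1
    means = {a: s / c for a, (s, c) in totals.items() if c >= min_posts}
    return sorted(means.items(), key=lambda kv: kv[1], reverse=True)


toy_posts = [("alice", 10), ("bob", 3), ("alice", 30), ("carol", 5)]  # made-up data
print(mean_score_by_author(toy_posts))               # alice first, with a mean of 20.0
print(mean_score_by_author(toy_posts, min_posts=2))  # only authors with 2+ posts
```

The same kind of table, keyed on day of week instead of author, would check the date shortcut.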

gavin_gee · 1y ago

Take note HN, this is what great content marketing looks like.

hnburnsy · 1y ago

My suggestion would be to try to correlate posting time with getting noticed, to find the best time to post on HN. A good post won't catch fire if it doesn't overcome the initial low visibility. I've posted items that were later posted by others and gained traction.

Maybe the reputation of the poster is also a factor?

metalman · 1y ago

Now do it again, and this time see where your post on ranking posts ranks. Personally, I find lauding the dead, and the dead past, to be somehow objectionable. Though I suppose that is the business of our so-called AI: mining the dead past, hoping to come up with something better than Frankenstein's zombie corpse. It is an insurmountable limitation, and a dangerous one as well, I think. The past is that ultimately perfect thing, in its absolute immutability and totality, as it is all there; to pick and choose from such a thing is brazen indeed. I can't help but imagine a picture of your $4.80 actually being consumed in a bed of fluidised coal, which in fact it was.

hn_throwaway_99 · 1y ago

> And in follow-up posts in this series, we’ll use that reward model along with reinforcement learning to create a model that can write high-value HN stories!

Well, thanks HN, you were good while it lasted...

octocop · 1y ago

Even the AIs don't read the content before up/down voting.

floobertoober · 1y ago

Maybe it would help to use a Box-Cox transform on the score distribution?
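For reference, the Box-Cox transform is (y^λ − 1)/λ for λ ≠ 0 and log(y) for λ = 0. So a log-score target is the λ = 0 special case. HN scores start at 1, which satisfies the positivity requirement. Below is a minimal sketch of the transform and its inverse with a fixed λ. In practice `scipy.stats.boxcox` can also choose λ by maximum likelihood.

```python
import math


def box_cox(y, lam):
    """Box-Cox transform of a positive value y."""
    if y <= 0:
        raise ValueError("Box-Cox requires positive values")
    if lam == 0:
        return math.log(y)
    return (y ** lam - 1) / lam


def inv_box_cox(z, lam):
    """Map a transformed value (e.g. a model prediction) back to the score scale."""
    if lam == 0:
        return math.exp(z)
    return (lam * z + 1) ** (1 / lam)


scores = [1, 2, 5, 40, 600]  # toy, heavily skewed scores
print([round(box_cox(s, 0.3), 3) for s in scores])
print(inv_box_cox(box_cox(37, 0.3), 0.3))  # recovers 37, up to rounding
```

Intermediate values of λ compress the long right tail less aggressively than a plain log does.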

chx · 1y ago

> . That’s not much time for a model that (hopefully) understands all of HN!

this is dangerous talk.

it doesn't understand anything at all.

Reminder: We are more prone to anthropomorphizing LLMs than to humanizing suffering humans.

ChrisArchitect · 1y ago

The first problem with the submissions that supposedly 'would do well on HN' is that, other than the Ask HN, they misuse the submission form by putting the link in a text post instead of sharing it directly as a link post. And sketchy new/inactive accounts. C'mon. Not gonna keep reading a grifty post after that opening.

ivanovm · 1y ago

this is very cool, have you tried DPO?
