
0 points · jampekka · 1y ago · 0 comments

The post is not doing RL. It's just regression as you thought.


2 comments · 1 top-level

billmalarky · 1y ago · 1 in thread

This post is using regression to build a reward model. The reward model will then be used (in a future post) to build the overall RL system.

Here's the relevant text from the article:

>In this post we’ll discuss how to build a reward model that can predict the upvote count that a specific HN story will get. And in follow-up posts in this series, we’ll use that reward model along with reinforcement learning to create a model that can write high-value HN stories!

jampekka (OP) · 1y ago

The title is misleading. The $4.80 is spent on supervised learning to find the best post.

The post is interesting and I'll be sure to check out the next parts too. It's just that people, as evidenced by this thread, clearly misunderstood, or were misled about, what was done.
