RLHF: Reinforcement Learning from Human Feedback | Better HN
(huyenchip.com)
4 points
madisonmay
2y ago
1 comment
heliophobicdude
2y ago
This is a very well-written article. It isn't covered in the article, but can we still call models like Alpaca RLHF? What do we call models that are fine-tuned on demonstrations created by other chatbots?