Unpacking the HF in RLHF

(maestroai.substack.com)

4 points · jbcranshaw · 3y ago · 3 comments

3 comments · 1 top-level

jbcranshaw (OP) · 3y ago · 2 in thread

Some observations on a few ways people actually gather feedback from humans in practice to improve LLMs. I'm sure I've missed some, so let me know.

PaulHoule · 3y ago

I think tying the feedback to a task is the way to go. My son is a fan of Craiyon, which shows you a few images it generated and encourages you to favorite the best ones. I don't know exactly what they do with that feedback, but I'm sure you could RLHF an image generator too.
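A minimal sketch of how "favorite the best ones" feedback could be turned into RLHF-style training signal, assuming each favorited image is treated as preferred over each non-favorited image from the same batch. Those (preferred, rejected) pairs are the kind of comparison commonly used to fit a reward model with a Bradley-Terry loss. The function names and data layout are hypothetical, and this is not a description of what Craiyon actually does with its favorites.

```python
import math
from itertools import product


def favorites_to_pairs(batch_ids, favorited):
    """Hypothetical helper: turn one batch of shown items plus the user's
    favorites into (preferred, rejected) pairs, where every favorited item
    is assumed to beat every non-favorited item in the same batch."""
    fav = [i for i in batch_ids if i in favorited]
    rest = [i for i in batch_ids if i not in favorited]
    return list(product(fav, rest))


def bradley_terry_loss(reward_preferred, reward_rejected):
    """Negative log-likelihood that the preferred item wins under a
    Bradley-Terry model, i.e. -log(sigmoid(r_preferred - r_rejected)).
    Split into two branches so exp() never overflows."""
    diff = reward_preferred - reward_rejected
    if diff >= 0:
        return math.log1p(math.exp(-diff))
    return -diff + math.log1p(math.exp(diff))


if __name__ == "__main__":
    shown = ["img0", "img1", "img2", "img3"]
    pairs = favorites_to_pairs(shown, {"img1", "img3"})
    print(pairs)  # [('img1', 'img0'), ('img1', 'img2'), ('img3', 'img0'), ('img3', 'img2')]
    print(bradley_terry_loss(0.0, 0.0))  # log(2), about 0.693
```

If a user favorites nothing, or favorites everything, the batch yields no pairs, which seems reasonable: it says nothing about relative preference within that batch.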

See my comment on this thread

https://news.ycombinator.com/item?id=35069965

I'd be glad to chat more (see my profile). People think a big company training one big model has a scalability advantage, but the problems of pleasing everybody, particularly advertisers, are terrible, whereas a personal model that pleases one person might be easy with recent tech.

jbcranshaw (OP) · 3y ago

Hey, thanks for the reply. Would love to chat more.
