
mach1ne 3y ago

> It also isn't generating "the most likely response" - that's what original GPT-3 did, GPT-3.5 and up don't work that way.

What changed?


astrange 3y ago

It answers questions in a voice that isn't yours.

The "most likely response" to text you wrote is: more text you wrote. Anytime the model provides an output you yourself wouldn't write, it isn't "the most likely response".
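A toy sketch of that point, using a small hypothetical bigram table in place of a real language model. Greedy decoding picks the single most likely next word at each step, so the "most likely response" simply keeps extending the text you typed. The table and probabilities are invented for illustration; a real model scores tokens over a full vocabulary.

```python
# Hypothetical bigram "model": maps the previous word to next-word probabilities.
BIGRAMS = {
    "cats": {"because": 0.6, ".": 0.4},
    "because": {"I": 0.9, "they": 0.1},
    "I": {"like": 0.8, "am": 0.2},
    "like": {"them": 0.7, "cats": 0.3},
    "them": {".": 1.0},
}


def greedy_continue(prompt: str, max_new_tokens: int = 10) -> str:
    """Append the most likely next word until '.' or no known continuation."""
    words = prompt.split()
    for _ in range(max_new_tokens):
        options = BIGRAMS.get(words[-1])
        if not options:
            break  # the toy model has nothing to say after this word
        next_word = max(options, key=options.get)  # greedy: highest probability
        words.append(next_word)
        if next_word == ".":
            break
    return " ".join(words)


print(greedy_continue("Tell me about cats"))
# prints: Tell me about cats because I like them .
```

The output continues the prompt in the user's own voice rather than answering it.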

afiori 3y ago

I believe that ChatGPT works by inserting something like an ANSWER_TOKEN. That is, a bare prompt like "Tell me about cats" would probably produce "Tell me about cats because I like them a lot", but the interface wraps your prompt like "QUESTION_TOKEN: Tell me about cats ANSWER_TOKEN:"
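A minimal sketch of the wrapping described above. The QUESTION_TOKEN/ANSWER_TOKEN markers are the placeholder names from the comment, not ChatGPT's actual format, and `wrap_prompt`/`extract_answer` are hypothetical helpers. The idea: after the answer marker, the natural continuation is an answer, and cutting the output at the next question marker stops the model from writing the user's next turn.

```python
# Placeholder markers (hypothetical; real chat formats differ).
QUESTION_TOKEN = "QUESTION_TOKEN:"
ANSWER_TOKEN = "ANSWER_TOKEN:"


def wrap_prompt(user_text: str) -> str:
    """Wrap raw user text so a completion model's continuation is an answer."""
    return f"{QUESTION_TOKEN} {user_text.strip()} {ANSWER_TOKEN}"


def extract_answer(completion: str, stop: str = QUESTION_TOKEN) -> str:
    """Keep only the generated text before the model starts a new question."""
    return completion.split(stop, 1)[0].strip()


wrapped = wrap_prompt("Tell me about cats")
print(wrapped)  # prints: QUESTION_TOKEN: Tell me about cats ANSWER_TOKEN:
```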

astrange 3y ago

It might, but I've used text-davinci-003 before this (https://platform.openai.com/playground) and it really just works with whatever you give it.

mort96 3y ago

text-davinci-003 has no trouble working as a chatbot: https://i.imgur.com/lCUcdm9.png (note that the poem lines it gave me should've been green; I don't know why they lost their highlight color)


afiori 3y ago

Meaning that it tends to continue your question?

meow_mix 3y ago

Reinforcement learning with human feedback (RLHF). What you're describing is the alignment problem.

mistymountains 3y ago

That’s just a supervised fine-tuning method to skew outputs favorably. I’m actually working with it on biologics modeling, using laboratory feedback. The underlying inference structure is not changed.
