
0 points · Xelynega · 1y ago · 0 comments

I don't understand what you are saying.

How can the RLHF phase eliminate bias if it uses a process (human input) that has the same biases as the pre-training (human input)?


Texts in the wild used during pre-training contain lots of biases, such as racial and sexual biases, which are picked up by the model.

During RLHF, the human evaluators are aware of such biases and are instructed to down-vote model responses that exhibit them.
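
To make the down-voting step concrete, here is a toy sketch of how pairwise preferences can become a reward signal. It is not any lab's actual pipeline: it assumes a Bradley-Terry style preference loss (a common choice for reward models), and the single "bias score" feature, the data, and every name in it are hypothetical.

```python
import math

# Hypothetical toy data: each response is reduced to one made-up feature,
# a "bias score" in [0, 1]. Each pair is (preferred, rejected), as labeled
# by evaluators who were instructed to down-vote biased responses.
preference_pairs = [(0.1, 0.9), (0.0, 0.7), (0.2, 0.8), (0.3, 0.6)]


def reward(w, bias_score):
    # Linear reward model with a single learned weight (an assumption).
    return w * bias_score


def train_reward_weight(pairs, lr=0.5, steps=200):
    w = 0.0
    for _ in range(steps):
        grad = 0.0
        for chosen, rejected in pairs:
            # Bradley-Terry: P(chosen beats rejected) = sigmoid(r_chosen - r_rejected)
            diff = reward(w, chosen) - reward(w, rejected)
            p = 1.0 / (1.0 + math.exp(-diff))
            # Gradient of -log(p) with respect to w.
            grad += -(1.0 - p) * (chosen - rejected)
        # Plain gradient descent on the average loss.
        w -= lr * grad / len(pairs)
    return w


w = train_reward_weight(preference_pairs)
print("learned weight on bias score:", w)
```

The learned weight comes out negative, so a policy tuned against this reward is pushed away from high-bias responses. Note that the reward model only learns what the labels encode: it penalizes the biases the evaluators actually recognize and down-vote.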
